Regular Expression Syntax
	package java.util.regex;
	
	
	
This page was last updated on 9 April 2009, and much of the content is drawn from Sun's Java 2 Platform SE 5.0 documentation. See Java's Pattern class for more details on regular expressions and their usage.
		Brief Background 
 
		
A regular expression is a character string in which some characters are given special meaning for pattern matching. Regular expressions have been in use since the early days of computing, and provide a powerful and efficient way to parse, interpret, search, and replace text within an application.
		Supported Syntax 
Within a regular expression, the following characters have special meaning:
	
- Boundary Operators
		
^     matches at the beginning of a line (without MULTILINE mode, only at the start of the input)
$     matches at the end of a line (without MULTILINE mode, only at the end of the input or before its final line terminator)
\A    matches the start of the entire string
\b    matches a word boundary
\B    matches a non-word boundary
\G    matches the end of the previous match
\Z    matches the end of the entire string, except for the final terminator, if any
\z    matches the end of the entire string
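
As an illustration of the boundary operators above, the following sketch shows \b rejecting matches inside longer words and ^ changing behaviour under Pattern.MULTILINE. The class name BoundaryDemo and the helper findAll are hypothetical names chosen for this example; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects every match of the regex in the input, in order of appearance.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "cat concat\ncatalog";
        // The word boundaries keep "cat" from matching inside "concat" or "catalog".
        System.out.println(findAll("\\bcat\\b", 0, text));            // [cat]
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(findAll("^cat", 0, text));                 // [cat]
        // With MULTILINE, ^ also matches right after the line terminator.
        System.out.println(findAll("^cat", Pattern.MULTILINE, text)); // [cat, cat]
    }
}
```

Note that each backslash is doubled because the regular expression is written inside a Java string literal.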
 - One-Character Operators
		
.         matches any single character (line terminators are matched only when DOTALL mode is enabled)
\\        matches a backslash character
\0n       matches the character with octal value 0n (0 <= n <= 7)
\0nn      matches the character with octal value 0nn (0 <= n <= 7)
\0mnn     matches the character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\a        matches an alert (bell) character ('\u0007')
\cx       matches the control character corresponding to x
\d        matches any decimal digit: [0-9]
\D        matches any non-digit: [^0-9]
\e        matches an escape character ('\u001B')
\f        matches a form-feed character ('\u000C')
\n        matches a newline (line feed) character ('\u000A')
\r        matches a return character ('\u000D')
\s        matches any whitespace character: [ \t\n\x0B\f\r]
\S        matches any non-whitespace character: [^\s]
\t        matches a horizontal tab character ('\u0009')
\w        matches any word character (alphanumeric or underscore): [a-zA-Z_0-9]
\W        matches any non-word character: [^\w]
\x        matches the character x, if x is a non-alphabetic character; a backslash before an alphabetic character that is not one of the escape sequences listed above is an error
\xhh      matches the character with hexadecimal value 0xhh
\uhhhh    matches the character with hexadecimal value 0xhhhh
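
The escapes above can be tried out with a short sketch like the one below. EscapeDemo, matches and matchesDotAll are hypothetical names used only for this example; the regex strings double each backslash because they are Java string literals.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True when the whole input matches the regex under the default flags.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Same check with DOTALL enabled, so '.' also matches line terminators.
    static boolean matchesDotAll(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d\\d:\\d\\d", "09:30"));      // true
        System.out.println(matches("\\w+\\s\\w+", "hello world"));  // true
        // Hex 41, Unicode 0042 and octal 103 spell out "ABC".
        System.out.println(matches("\\x41\\u0042\\0103", "ABC"));   // true
        // A backslash before a non-alphabetic character makes it literal.
        System.out.println(matches("1\\+1", "1+1"));                // true
        // By default '.' does not match a line feed.
        System.out.println(matches("a.c", "a\nc"));                 // false
        System.out.println(matchesDotAll("a.c", "a\nc"));           // true
    }
}
```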
 - Character Class Operator
		
[abc]            matches any character in the set a, b or c
[^abc]           matches any character not in the set a, b or c
[a-zA-Z]         matches any character in the range a through z or A through Z (range)
[a-d[m-p]]       matches any character in the range a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     matches d, e, or f (intersection)
[a-z&&[^bc]]     matches any character in the range a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    matches any character in the range a through z, and not m through p: [a-lq-z] (subtraction)
A leading or trailing dash will be interpreted literally.
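
The union, intersection and subtraction forms above can be checked with a small sketch such as this one. CharClassDemo and its helper matching are hypothetical names for this example; the helper simply concatenates every character of the input that the class matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns the characters of the input that the character class matches.
    static String matching(String charClass, String input) {
        StringBuilder kept = new StringBuilder();
        Matcher m = Pattern.compile(charClass).matcher(input);
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String input = "abcdefmnopxyz-";
        System.out.println(matching("[a-d[m-p]]", input));    // abcdmnop  (union)
        System.out.println(matching("[a-z&&[def]]", input));  // def       (intersection)
        System.out.println(matching("[a-z&&[^bc]]", input));  // adefmnopxyz (subtraction)
        System.out.println(matching("[a-z&&[^m-p]]", input)); // abcdefxyz (subtraction)
        // A trailing dash is taken literally.
        System.out.println(matching("[a-c-]", input));        // abc-
    }
}
```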
 - POSIX character classes (US-ASCII only)
		
\p{Lower}     matches a lower-case alphabetic character: [a-z]
\p{Upper}     matches an upper-case alphabetic character: [A-Z]
\p{ASCII}     matches all ASCII: [\x00-\x7F]
\p{Alpha}     matches an alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}     matches a decimal digit: [0-9]
\p{Alnum}     matches an alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}     matches punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}     matches a visible character: [\p{Alnum}\p{Punct}]
\p{Print}     matches a printable character: [\p{Graph}\x20]
\p{Blank}     matches a space or a tab: [ \t]
\p{Cntrl}     matches a control character: [\x00-\x1F\x7F]
\p{XDigit}    matches a hexadecimal digit: [0-9a-fA-F]
\p{Space}     matches a whitespace character: [ \t\n\x0B\f\r]
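
A minimal sketch of the POSIX classes in use follows. PosixClassDemo, isHex and stripPunct are hypothetical names for this example. The last check assumes the default flags, under which these classes cover US-ASCII only.

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // True when the input consists only of hexadecimal digits.
    static boolean isHex(String s) {
        return Pattern.matches("\\p{XDigit}+", s);
    }

    // Removes every ASCII punctuation character from the input.
    static String stripPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(isHex("CAFE01"));               // true
        System.out.println(isHex("xyz"));                  // false
        System.out.println(stripPunct("Hello, world!"));   // Hello world
        // A lower-case e with acute accent is outside US-ASCII, so it does not match.
        System.out.println(Pattern.matches("\\p{Lower}", "\u00e9")); // false
    }
}
```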
 - java.lang.Character classes (simple java character type)
		
\p{javaLowerCase}     Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}     Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}    Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}      Equivalent to java.lang.Character.isMirrored()
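
To show how these classes differ from the US-ASCII POSIX classes, the sketch below counts upper-case letters in a string containing an accented capital. JavaCharClassDemo and count are hypothetical names for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaCharClassDemo {
    // Counts how many times the regex matches in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The text starts with a capital E with acute accent (U+00C9).
        String text = "\u00C9cole Normale";
        // Character.isUpperCase() accepts both the accented E and the N.
        System.out.println(count("\\p{javaUpperCase}", text)); // 2
        // The POSIX class only knows A-Z, so it finds the N alone.
        System.out.println(count("\\p{Upper}", text));         // 1
        // Splitting on runs of Java whitespace.
        System.out.println(Arrays.toString("a\tb c".split("\\p{javaWhitespace}+"))); // [a, b, c]
    }
}
```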
 - Classes for Unicode blocks and categories
		
\p{InGreek}            A character in the Greek block (simple block)
\p{Lu}                 An uppercase letter (simple category)
\p{Sc}                 A currency symbol
\P{InGreek}            Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]     Any letter except an uppercase letter (subtraction)
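
The block and category classes above can be exercised with a sketch like the following. UnicodeClassDemo and count are hypothetical names for this example; the non-ASCII characters are written as Unicode escapes so the source stays plain ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // Counts how many times the regex matches in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Greek alpha, Greek beta, then a Latin c.
        String greek = "\u03b1\u03b2c";
        System.out.println(count("\\p{InGreek}", greek)); // 2
        System.out.println(count("\\P{InGreek}", greek)); // 1
        System.out.println(count("\\p{Lu}", "Hello World")); // 2
        // Dollar sign and euro sign are both currency symbols.
        System.out.println(count("\\p{Sc}", "$5 or \u20ac4")); // 2
        // Letters that are not uppercase: e, l, l, o.
        System.out.println(count("[\\p{L}&&[^\\p{Lu}]]", "Hello")); // 4
    }
}
```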
 - Greedy quantifiers
		
These quantifiers continue to match as much as possible, even when stopping would allow the overall match to succeed.
X?        X, once or not at all
X*        X, zero or more times
X+        X, one or more times
X{n}      X, exactly n times
X{n,}     X, at least n times
X{n,m}    X, at least n but not more than m times
 - Reluctant quantifiers
		
These quantifiers stop matching as soon as doing so allows the overall match to succeed.
X??        X, once or not at all
X*?        X, zero or more times
X+?        X, one or more times
X{n}?      X, exactly n times
X{n,}?     X, at least n times
X{n,m}?    X, at least n but not more than m times
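
The difference between greedy and reluctant quantifiers is easiest to see side by side. In this sketch, GreedyVsReluctantDemo and firstMatch are hypothetical names; the helper returns the first match found, or null if there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: .+ runs to the last '>' it can reach.
        System.out.println(firstMatch("<.+>", html));        // <b>bold</b>
        // Reluctant: .+? stops at the first '>' that completes the match.
        System.out.println(firstMatch("<.+?>", html));       // <b>
        // The same contrast with a bounded quantifier.
        System.out.println(firstMatch("a{2,3}", "aaaa"));    // aaa
        System.out.println(firstMatch("a{2,3}?", "aaaa"));   // aa
    }
}
```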
 - Possessive quantifiers
		
X?+        X, once or not at all
X*+        X, zero or more times
X++        X, one or more times
X{n}+      X, exactly n times
X{n,}+     X, at least n times
X{n,m}+    X, at least n but not more than m times
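
Possessive quantifiers match as much as possible and never give characters back, even if that makes the overall match fail. The sketch below contrasts them with their greedy counterparts; PossessiveDemo and wholeMatch are hypothetical names for this example.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // True when the whole input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy .* backs off three characters so that "foo" can match.
        System.out.println(wholeMatch(".*foo", "xfoo"));      // true
        // Possessive .*+ consumes everything and refuses to back off.
        System.out.println(wholeMatch(".*+foo", "xfoo"));     // false
        // Same effect with digits: nothing is left for the final \d.
        System.out.println(wholeMatch("\\d+\\d", "12345"));   // true
        System.out.println(wholeMatch("\\d++\\d", "12345"));  // false
    }
}
```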
 - Logical operators
		
XY      X followed by Y
X|Y     Either X or Y
(X)     X, as a capturing group
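
A short sketch combining concatenation, alternation and a capturing group: the pattern gr(a|e)y accepts both spellings and captures which vowel was used. LogicalOperatorDemo and vowelOf are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogicalOperatorDemo {
    // "gr", then either "a" or "e" (captured as group 1), then "y".
    private static final Pattern GREY = Pattern.compile("gr(a|e)y");

    // Returns the captured vowel, or null when the word is not gray/grey.
    static String vowelOf(String word) {
        Matcher m = GREY.matcher(word);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(vowelOf("gray")); // a
        System.out.println(vowelOf("grey")); // e
        System.out.println(vowelOf("groy")); // null
    }
}
```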
 - Back references
		
\n    Whatever the nth capturing group matched
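
One common use of a back reference is spotting an accidentally doubled word. In the sketch below, \1 must repeat exactly what group 1 captured. BackReferenceDemo and firstDoubledWord are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackReferenceDemo {
    // A word, some whitespace, then the same word again (\1), as whole words.
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first word that appears twice in a row, or null if none does.
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("this is is a test")); // is
        System.out.println(firstDoubledWord("no repeats here"));   // null
    }
}
```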
 - Quotation
		
\      Nothing, but quotes the following character
\Q     Nothing, but quotes all characters until \E
\E     Nothing, but ends quoting started by \Q
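
Quoting matters whenever the text to match contains characters with special meaning, such as the plus sign. The sketch below shows the unquoted pattern failing and three ways to quote it; QuoteDemo and matchesLiterally are hypothetical names, while Pattern.quote is the standard library method that quotes a whole string.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Matches the input against the literal text, with no special characters.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unquoted, "1+" means "one or more 1s", so the literal text fails.
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));          // false
        // Quoting a single character with a backslash.
        System.out.println(Pattern.matches("1\\+1=2", "1+1=2"));        // true
        // Quoting a whole run of characters between \Q and \E.
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2"));    // true
        // Letting the library build the quoted pattern.
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));         // true
    }
}
```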
 - Special constructs (non-capturing)
		
(?:X)                  X, as a non-capturing group
(?idmsux-idmsux)       Nothing, but turns match flags on - off
(?idmsux-idmsux:X)     X, as a non-capturing group with the given flags on - off
(?=X)                  X, via zero-width positive lookahead
(?!X)                  X, via zero-width negative lookahead
(?<=X)                 X, via zero-width positive lookbehind
(?<!X)                 X, via zero-width negative lookbehind
(?>X)                  X, as an independent, non-capturing group
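
The following sketch touches each kind of special construct: an inline flag, the four lookaround forms, a non-capturing group, and an independent group. SpecialConstructDemo and firstMatch are hypothetical names for this example, and the sample strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialConstructDemo {
    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // (?i) turns on case-insensitive matching.
        System.out.println(firstMatch("(?i)hello", "Say HeLLo"));           // HeLLo
        // Lookahead: digits followed by " dollars", which is not consumed.
        System.out.println(firstMatch("\\d+(?= dollars)", "100 dollars"));  // 100
        // Negative lookahead: a q that is not followed by u.
        System.out.println(firstMatch("q(?!u)", "quit"));                   // null
        // Lookbehind: digits preceded by a dollar sign.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost: $42"));        // 42
        // Negative lookbehind: a number not preceded by a dollar sign.
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", "$42 and 17"));    // 17
        // (?:ab) groups without capturing, so only (c) is counted.
        Matcher m = Pattern.compile("(?:ab)+(c)").matcher("ababc");
        System.out.println(m.matches() + " " + m.groupCount());             // true 1
        // (?>a+) keeps every a it took, leaving none for the trailing a.
        System.out.println(Pattern.matches("(?>a+)a", "aaa"));              // false
        System.out.println(Pattern.matches("a+a", "aaa"));                  // true
    }
}
```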