package java.util.regex;
Brief Background
A regular expression is a character string in which some characters
are given special meaning with regard to pattern matching. Regular
expressions have been in use since the early days of computing and
provide a powerful and efficient way to parse, interpret, search, and
replace text within an application.
Supported Syntax
Within a regular expression, the following characters and constructs
have special meaning:
^     matches at the beginning of a line (only at the start of the input unless the m flag is on)
$     matches at the end of a line (only at the end of the input unless the m flag is on)
\A    matches the start of the entire string
\b    matches a word boundary
\B    matches a non-word boundary
\G    matches the end of the previous match
\Z    matches the end of the entire string, except for the final terminator, if any
\z    matches the end of the entire string
.     matches any single character (line terminators are matched only when the s flag is on)
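The boundary matchers above can be tried out with a short program. This is only a sketch: the class name BoundaryDemo, the helper findAll, and the sample text are illustrative assumptions, not part of the package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "cat concat\ncat";

        // \b restricts the match to whole words, so the "cat" inside "concat" is skipped.
        System.out.println(findAll(Pattern.compile("\\bcat\\b"), text));

        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(findAll(Pattern.compile("^cat"), text));

        // With MULTILINE, ^ also matches right after each line terminator.
        System.out.println(findAll(Pattern.compile("^cat", Pattern.MULTILINE), text));

        // \A matches only at the start of the input, regardless of MULTILINE.
        System.out.println(findAll(Pattern.compile("\\Acat", Pattern.MULTILINE), text));
    }
}
```

The same helper can be reused to compare $, \Z and \z on inputs that end with a line terminator.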
\\        matches a backslash character
\0n       matches the character with octal value 0n (0 <= n <= 7)
\0nn      matches the character with octal value 0nn (0 <= n <= 7)
\0mnn     matches the character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\a        matches an alert (bell) character ('\u0007')
\cx       matches the control character corresponding to x
\d        matches any decimal digit: [0-9]
\D        matches any non-digit: [^0-9]
\e        matches an escape character ('\u001B')
\f        matches a form-feed character ('\u000C')
\n        matches a newline (line feed) character ('\u000A')
\r        matches a return character ('\u000D')
\s        matches any whitespace character: [ \t\n\x0B\f\r]
\S        matches any non-whitespace character: [^\s]
\t        matches a horizontal tab character ('\u0009')
\w        matches any word (alphanumeric) character: [a-zA-Z_0-9]
\W        matches any non-word (alphanumeric) character: [^\w]
\x        matches the character x, if x is not one of the above listed escape sequences
\xhh      matches the character with hexadecimal value 0xhh
\uhhhh    matches the character with hexadecimal value 0xhhhh
[abc]            matches any character in the set a, b or c
[^abc]           matches any character not in the set a, b or c
[a-zA-Z]         matches any character in the range a through z or A through Z (range)
[a-d[m-p]]       matches any character in the range a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     matches any of the characters d, e, or f (intersection)
[a-z&&[^bc]]     matches any character in the range a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    matches any character in the range a through z, and not m through p: [a-lq-z] (subtraction)
A leading or trailing dash inside a character class is interpreted literally.
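The union, intersection and subtraction forms are easier to follow with concrete inputs. The following is a minimal sketch; the class name CharClassDemo and the helper allIn are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True if the whole input consists only of characters from the given class.
    static boolean allIn(String charClass, String input) {
        return Pattern.matches(charClass + "+", input);
    }

    public static void main(String[] args) {
        // Union: a-d or m-p.
        System.out.println(allIn("[a-d[m-p]]", "adm"));   // true
        System.out.println(allIn("[a-d[m-p]]", "e"));     // false

        // Intersection: only d, e and f remain.
        System.out.println(allIn("[a-z&&[def]]", "fed")); // true
        System.out.println(allIn("[a-z&&[def]]", "a"));   // false

        // Subtraction: a-z without b and c.
        System.out.println(allIn("[a-z&&[^bc]]", "ad"));  // true
        System.out.println(allIn("[a-z&&[^bc]]", "b"));   // false

        // A trailing dash is taken literally, so "-" is part of the class.
        System.out.println(allIn("[a-z-]", "a-z"));       // true
    }
}
```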
\p{Lower}     matches a lower-case alphabetic character: [a-z]
\p{Upper}     matches an upper-case alphabetic character: [A-Z]
\p{ASCII}     matches all ASCII: [\x00-\x7F]
\p{Alpha}     matches an alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}     matches a decimal digit: [0-9]
\p{Alnum}     matches an alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}     matches punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}     matches a visible character: [\p{Alnum}\p{Punct}]
\p{Print}     matches a printable character: [\p{Graph}\x20]
\p{Blank}     matches a space or a tab: [ \t]
\p{Cntrl}     matches a control character: [\x00-\x1F\x7F]
\p{XDigit}    matches a hexadecimal digit: [0-9a-fA-F]
\p{Space}     matches a whitespace character: [ \t\n\x0B\f\r]
\p{javaLowerCase}      Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}      Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}     Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}       Equivalent to java.lang.Character.isMirrored()
\p{InGreek}            A character in the Greek block (simple block)
\p{Lu}                 An uppercase letter (simple category)
\p{Sc}                 A currency symbol
\P{InGreek}            Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]     Any letter except an uppercase letter (subtraction)
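A few of the \p{...} classes above, checked against single characters. This is a sketch that assumes the behaviour of a standard JVM; the class name PropertyClassDemo and the helper matchesClass are illustrative, and the non-ASCII characters are written as Unicode escapes to stay independent of the source file encoding.

```java
import java.util.regex.Pattern;

public class PropertyClassDemo {
    // True if the single-character input matches the given class expression.
    static boolean matchesClass(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        // POSIX-style classes.
        System.out.println(matchesClass("\\p{Lower}", "q"));
        System.out.println(matchesClass("\\p{XDigit}", "F"));
        System.out.println(matchesClass("\\p{Punct}", "!"));

        // java.lang.Character-based class: U+00E9 is a lower-case letter.
        System.out.println(matchesClass("\\p{javaLowerCase}", "\u00e9"));

        // Unicode block and its negation: U+03B1 is Greek small letter alpha.
        System.out.println(matchesClass("\\p{InGreek}", "\u03b1"));
        System.out.println(matchesClass("\\P{InGreek}", "a"));

        // Unicode category: U+20AC is the euro sign, a currency symbol.
        System.out.println(matchesClass("\\p{Sc}", "\u20ac"));

        // Subtraction: any letter except an uppercase one.
        System.out.println(matchesClass("[\\p{L}&&[^\\p{Lu}]]", "a"));
        System.out.println(matchesClass("[\\p{L}&&[^\\p{Lu}]]", "A"));
    }
}
```

Every call prints true except the last one, because "A" is an uppercase letter and is removed by the subtraction.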
X?         X, once or not at all (greedy)
X*         X, zero or more times (greedy)
X+         X, one or more times (greedy)
X{n}       X, exactly n times (greedy)
X{n,}      X, at least n times (greedy)
X{n,m}     X, at least n but not more than m times (greedy)
X??        X, once or not at all (reluctant)
X*?        X, zero or more times (reluctant)
X+?        X, one or more times (reluctant)
X{n}?      X, exactly n times (reluctant)
X{n,}?     X, at least n times (reluctant)
X{n,m}?    X, at least n but not more than m times (reluctant)
X?+        X, once or not at all (possessive)
X*+        X, zero or more times (possessive)
X++        X, one or more times (possessive)
X{n}+      X, exactly n times (possessive)
X{n,}+     X, at least n times (possessive)
X{n,m}+    X, at least n but not more than m times (possessive)
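The three families of quantifiers accept the same repetition counts but differ in how much input they take and whether they give any of it back. A small sketch of the difference follows; the class name QuantifierDemo, the helper firstMatch, and the sample input are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";

        // X+ takes as much as possible, then backs off until the final '>' fits.
        System.out.println(firstMatch("<.+>", input));   // <a><b>

        // X+? takes as little as possible and stops at the first '>'.
        System.out.println(firstMatch("<.+?>", input));  // <a>

        // X++ takes as much as possible and never gives anything back,
        // so no '>' is left for the rest of the pattern to match.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```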
XY                    X followed by Y
X|Y                   Either X or Y
(X)                   X, as a capturing group
\n                    Whatever the nth capturing group matched (back reference; n is a group number)
\                     Nothing, but quotes the following character
\Q                    Nothing, but quotes all characters until \E
\E                    Nothing, but ends quoting started by \Q
(?:X)                 X, as a non-capturing group
(?idmsux-idmsux)      Nothing, but turns match flags on - off
(?idmsux-idmsux:X)    X, as a non-capturing group with the given flags on - off
(?=X)                 X, via zero-width positive lookahead
(?!X)                 X, via zero-width negative lookahead
(?<=X)                X, via zero-width positive lookbehind
(?<!X)                X, via zero-width negative lookbehind
(?>X)                 X, as an independent, non-capturing group
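The grouping, quoting, flag and lookaround constructs listed above can be illustrated with a few short patterns. This is a sketch only: the class name GroupDemo, the helper firstMatch, and all sample inputs are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Back reference: \1 must repeat whatever group 1 captured.
        System.out.println(firstMatch("(\\w+) \\1", "hello hello world")); // hello hello

        // Lookahead and lookbehind test the context without consuming it.
        System.out.println(firstMatch("\\d+(?= USD)", "100 USD, 200 EUR")); // 100
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost: $42"));        // 42

        // \Q...\E quotes the '+', which would otherwise be a quantifier.
        System.out.println(firstMatch("\\Q1+1\\E=2", "1+1=2"));             // 1+1=2
        System.out.println(firstMatch("1+1=2", "1+1=2"));                   // null

        // An embedded flag: (?i) turns on case-insensitive matching.
        System.out.println(firstMatch("(?i)java", "JAVA"));                 // JAVA

        // An independent group keeps its first choice ("a") and is not
        // re-entered on failure, unlike a plain non-capturing group.
        System.out.println(firstMatch("(?>a|ab)c", "abc"));                 // null
        System.out.println(firstMatch("(?:a|ab)c", "abc"));                 // abc
    }
}
```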