package java.util.regex;
Brief Background
A regular expression consists of a
character string where some characters are given special meaning with
regard to pattern matching. Regular expressions have been in use since
the early days of computing, and provide a powerful and efficient way
to parse, interpret, search, and replace text within an application.
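As a minimal sketch of the search, validate, and replace workflow described above, the standard Pattern and Matcher classes can be combined as follows. The class RegexBasics and its helper methods are hypothetical names chosen for illustration. Note that every regex backslash is doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexBasics {
    // Search: collect every substring of input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Validate: true only if the entire input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Replace: substitute every match with the replacement string.
    static String replaceAll(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1b22c333"));        // [1, 22, 333]
        System.out.println(matchesWhole("[a-z]+", "hello"));     // true
        System.out.println(replaceAll("\\s+", "a  b\tc", " "));  // a b c
    }
}
```

Compiling a Pattern once and reusing it is usually preferable when the same expression is applied many times; Pattern.matches is a convenience for one-off checks.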
Supported Syntax
Within a regular expression, the
following characters have special meaning:
^
matches at the beginning of the input, or at the beginning of any line in MULTILINE mode
$
matches at the end of the input (or just before a final line terminator), or at the end of any line in MULTILINE mode
\A
matches the start of the entire string
\b
matches a word boundary
\B
matches a non-word boundary
\G
matches the end of the previous match
\Z
matches at the end of the entire string, or just before its final line terminator, if any
\z
matches the end of the entire string
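The boundary matchers above are easiest to tell apart by counting matches. This sketch (the class AnchorDemo and its helpers are hypothetical) contrasts ^ with and without MULTILINE, \Z with \z, a \b word boundary, and \G chaining successive matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorDemo {
    // Count how many times regex (compiled with flags) is found in input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Collect all matches of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree\n";
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(count("^\\w+", 0, text));                   // 1
        // With MULTILINE, ^ also matches after each line terminator.
        System.out.println(count("^\\w+", Pattern.MULTILINE, text));   // 3
        // \Z allows a final line terminator after the match; \z does not.
        System.out.println(count("three\\Z", 0, text));                // 1
        System.out.println(count("three\\z", 0, text));                // 0
        // \b requires a word boundary, so the "cat" inside "concat" is skipped.
        System.out.println(count("\\bcat\\b", 0, "cat concat cat"));   // 2
        // \G makes each match start exactly where the previous one ended.
        System.out.println(findAll("\\G\\d", "123a45"));               // [1, 2, 3]
    }
}
```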
.
matches any single character except a line terminator (in DOTALL mode, line terminators are matched too)
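A short sketch of how the DOTALL flag changes the behavior of "." (the class DotDemo and its helper are hypothetical):

```java
import java.util.regex.Pattern;

final class DotDemo {
    // True if the whole input matches regex compiled with the given flags.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // By default '.' does not match a line terminator such as '\n'.
        System.out.println(fullMatch("a.b", 0, "a\nb"));               // false
        // DOTALL lets '.' match line terminators as well.
        System.out.println(fullMatch("a.b", Pattern.DOTALL, "a\nb"));  // true
        // Ordinary characters are matched in either mode.
        System.out.println(fullMatch("a.b", 0, "a-b"));                // true
    }
}
```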
\\
matches a backslash character
\0n
matches the character with octal value 0n (0 <= n <= 7)
\0nn
matches the character with octal value 0nn (0 <= n <= 7)
\0mnn
matches the character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\a
matches an alert (bell) character ('\u0007')
\cx
matches the control character corresponding to x
\d
matches any decimal digit: [0-9]
\D
matches any non-digit: [^0-9]
\e
matches an escape character ('\u001B')
\f
matches a form-feed character ('\u000C')
\n
matches a newline (line feed) character ('\u000A')
\r
matches a carriage-return character ('\u000D')
\s
matches any whitespace character: [ \t\n\x0B\f\r]
\S
matches any non-whitespace character: [^\s]
\t
matches a horizontal tab character ('\u0009')
\w
matches any word character (letter, digit, or underscore): [a-zA-Z_0-9]
\W
matches any non-word character: [^\w]
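\x
matches the character x literally, if x is a non-alphabetic character that is not one of the escape sequences listed above (a backslash before an unlisted alphabetic character is an error).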
\xhh
matches the character with hexadecimal value 0xhh
\uhhhh
matches the character with hexadecimal value 0xhhhh
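The escapes above can be checked against sample strings. In this sketch (the class EscapeDemo and its helper are hypothetical) the letter 'A' (decimal 65, octal 101, hexadecimal 41) is written as an octal, a hexadecimal, and a Unicode escape. Because the pattern is a Java string literal, each regex backslash is typed twice.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Whole-input match helper.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "\\d" in Java source reaches the regex engine as \d.
        System.out.println(full("\\d\\d", "42"));          // true
        System.out.println(full("\\w+\\s\\w+", "a_1 b"));  // true
        // Octal, hexadecimal and Unicode escapes for 'A'.
        System.out.println(full("\\0101", "A"));           // true
        System.out.println(full("\\x41", "A"));            // true
        System.out.println(full("\\u0041", "A"));          // true
        // \D, \S and \W are the complements of \d, \s and \w.
        System.out.println(full("\\D\\S\\W", "x!-"));      // true
    }
}
```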
[abc]
matches any character in the set a, b or c
[^abc]
matches any character not in the set a, b or c
[a-zA-Z]
matches any character in the range a through z or A through Z (range)
[a-d[m-p]]
matches any character in the range a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]
matches d, e, or f: [def] (intersection)
[a-z&&[^bc]]
matches any character in the range a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]
matches any character in the range a through z, except for m through p: [a-lq-z] (subtraction)
Within a character class, a leading or trailing dash is interpreted literally.
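One way to see what a union, intersection, or subtraction class accepts is to test it against every lower-case ASCII letter. The class ClassDemo and its accepted helper are hypothetical names for this sketch.

```java
import java.util.regex.Pattern;

final class ClassDemo {
    // Return every letter 'a'..'z' that the character class accepts, in order.
    static String accepted(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepted("[a-d[m-p]]"));     // abcdmnop (union)
        System.out.println(accepted("[a-z&&[def]]"));   // def (intersection)
        System.out.println(accepted("[a-z&&[^bc]]"));   // adefghijklmnopqrstuvwxyz (subtraction)
        System.out.println(accepted("[a-z&&[^m-p]]"));  // abcdefghijklqrstuvwxyz (subtraction)
        // A dash at the start or end of a class is a literal '-'.
        System.out.println(Pattern.matches("[-az]", "-"));  // true
        System.out.println(Pattern.matches("[az-]", "-"));  // true
    }
}
```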
\p{Lower}
matches a lower-case alphabetic character: [a-z]
\p{Upper}
matches an upper-case alphabetic character: [A-Z]
\p{ASCII}
matches any ASCII character: [\x00-\x7F]
\p{Alpha}
matches an alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}
matches a decimal digit: [0-9]
\p{Alnum}
matches an alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}
matches punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}
matches a visible character: [\p{Alnum}\p{Punct}]
\p{Print}
matches a printable character: [\p{Graph}\x20]
\p{Blank}
matches a space or a tab: [ \t]
\p{Cntrl}
matches a control character: [\x00-\x1F\x7F]
\p{XDigit}
matches a hexadecimal digit: [0-9a-fA-F]
\p{Space}
matches a whitespace character: [ \t\n\x0B\f\r]
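A sketch exercising a few of the POSIX-style classes above on ASCII input (the class PosixDemo and its helper are hypothetical). Note that '_' falls under \p{Punct} rather than \p{Alnum}.

```java
import java.util.regex.Pattern;

final class PosixDemo {
    // Whole-input match helper.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("\\p{XDigit}+", "DeadBeef01"));     // true
        System.out.println(full("\\p{Upper}\\p{Lower}+", "Java"));  // true
        System.out.println(full("\\p{Punct}+", "!?;_"));            // true
        // '_' is punctuation here, so it is not alphanumeric.
        System.out.println(full("\\p{Alnum}+", "abc_123"));         // false
        System.out.println(full("\\p{Blank}\\p{Space}", "\t\n"));   // true
        System.out.println(full("\\p{Cntrl}", "\t"));               // true
    }
}
```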
\p{javaLowerCase}
Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}
Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}
Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}
Equivalent to java.lang.Character.isMirrored()
\p{InGreek}
A character in the Greek block (simple block)
\p{Lu}
An uppercase letter (simple category)
\p{Sc}
A currency symbol
\P{InGreek}
Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]
Any letter except an uppercase letter (subtraction)
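The Unicode block, category, and java.lang.Character-based classes can be tried on Greek letters. This sketch (the class UnicodeDemo and its helper are hypothetical) builds the characters from their code points to keep the source file pure ASCII.

```java
import java.util.regex.Pattern;

final class UnicodeDemo {
    // Whole-input match helper.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String alpha = String.valueOf((char) 0x03B1); // Greek small letter alpha
        String sigma = String.valueOf((char) 0x03A3); // Greek capital letter sigma
        System.out.println(full("\\p{InGreek}", alpha));          // true
        System.out.println(full("\\P{InGreek}", "a"));            // true
        System.out.println(full("\\p{Lu}", sigma));               // true
        System.out.println(full("\\p{Lu}", alpha));               // false
        System.out.println(full("[\\p{L}&&[^\\p{Lu}]]", alpha));  // true
        System.out.println(full("[\\p{L}&&[^\\p{Lu}]]", sigma));  // false
        System.out.println(full("\\p{Sc}", "$"));                 // true
        // The java* classes delegate to the corresponding Character methods.
        System.out.println(full("\\p{javaLowerCase}", alpha));    // true
        System.out.println(full("\\p{javaWhitespace}", "\t"));    // true
    }
}
```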
X?
X, once or not at all (greedy)
X*
X, zero or more times (greedy)
X+
X, one or more times (greedy)
X{n}
X, exactly n times (greedy)
X{n,}
X, at least n times (greedy)
X{n,m}
X, at least n but not more than m times (greedy)
X??
X, once or not at all (reluctant)
X*?
X, zero or more times (reluctant)
X+?
X, one or more times (reluctant)
X{n}?
X, exactly n times (reluctant)
X{n,}?
X, at least n times (reluctant)
X{n,m}?
X, at least n but not more than m times (reluctant)
X?+
X, once or not at all (possessive)
X*+
X, zero or more times (possessive)
X++
X, one or more times (possessive)
X{n}+
X, exactly n times (possessive)
X{n,}+
X, at least n times (possessive)
X{n,m}+
X, at least n but not more than m times (possessive)
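The three quantifier families above differ only in how much input they consume and whether they give it back. In this sketch (the class QuantifierDemo and its helper are hypothetical), a greedy .+ takes as much as possible and backtracks, a reluctant .+? takes as little as possible, and a possessive .++ never backtracks, so it can cause an overall match to fail.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Return the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tags = "<a><b>";
        System.out.println(firstMatch("<.+>", tags));        // <a><b>  (greedy)
        System.out.println(firstMatch("<.+?>", tags));       // <a>     (reluctant)
        System.out.println(firstMatch("<.++>", tags));       // null    (possessive: '>' was consumed)
        System.out.println(firstMatch("x{2,3}", "xxxx"));    // xxx
        System.out.println(firstMatch("x{2,3}?", "xxxx"));   // xx
    }
}
```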
XY
X followed by Y
X|Y
Either X or Y
(X)
X, as a capturing group
\n
Whatever the nth capturing group matched (back reference, where n is a group number)
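A sketch of capturing groups, a back reference, and alternation (the class GroupDemo and its methods are hypothetical). Groups are numbered from 1 by their opening parenthesis, and \1 must repeat exactly the text group 1 matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // Capturing groups: pull the three numeric fields out of a yyyy-mm-dd string.
    static String[] splitDate(String date) {
        Matcher m = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(date);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    // Back reference: find the first word that is immediately repeated.
    static String doubledWord(String text) {
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", splitDate("2024-05-17")));  // 2024/05/17
        System.out.println(doubledWord("this is is a test"));           // is
        // Alternation: X|Y accepts either branch.
        System.out.println(Pattern.matches("cat|dog", "dog"));          // true
    }
}
```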
\
Nothing, but quotes the following character
\Q
Nothing, but quotes all characters until \E
\E
Nothing, but ends quoting started by \Q
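The quoting constructs above turn metacharacters into literal text. This sketch (the class QuoteDemo and its helper are hypothetical) compares an unquoted pattern, a \Q...\E block, a single backslash escape, and Pattern.quote, which returns a pattern string that matches its argument literally.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Whole-input match helper.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unquoted, '.' and '+' are metacharacters.
        System.out.println(full("1.5+2", "1x5552"));              // true
        // Between \Q and \E everything is literal.
        System.out.println(full("\\Q1.5+2\\E", "1x5552"));        // false
        System.out.println(full("\\Q1.5+2\\E", "1.5+2"));         // true
        // A single backslash quotes only the next character.
        System.out.println(full("a\\*b", "a*b"));                 // true
        // Pattern.quote builds a literal pattern from arbitrary text.
        System.out.println(full(Pattern.quote("a*b"), "a*b"));    // true
    }
}
```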
(?:X)
X, as a non-capturing group
(?idmsux-idmsux)
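Nothing, but turns the given match flags on (letters before the dash) or off (letters after it): i = CASE_INSENSITIVE, d = UNIX_LINES, m = MULTILINE, s = DOTALL, u = UNICODE_CASE, x = COMMENTS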
(?idmsux-idmsux:X)
X, as a non-capturing group with the given flags turned on or off
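A sketch of embedded flags (the class FlagDemo and its helper are hypothetical): (?i) switches case-insensitivity on for the rest of the pattern, (?-i) switches it off again, (?i:X) limits it to one group, and (?s) is the inline form of DOTALL.

```java
import java.util.regex.Pattern;

final class FlagDemo {
    // Whole-input match helper.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("(?i)hello", "HeLLo"));      // true
        // The flag applies only inside the (?i:...) group.
        System.out.println(full("(?i:ab)c", "ABc"));         // true
        System.out.println(full("(?i:ab)c", "ABC"));         // false
        // (?-i) turns case-insensitivity off for what follows.
        System.out.println(full("(?i)a(?-i)b", "Ab"));       // true
        System.out.println(full("(?i)a(?-i)b", "AB"));       // false
        System.out.println(full("(?s)a.b", "a\nb"));         // true
    }
}
```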
(?=X)
X, via zero-width positive lookahead
(?!X)
X, via zero-width negative lookahead
(?<=X)
X, via zero-width positive lookbehind
(?<!X)
X, via zero-width negative lookbehind
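Lookaround assertions test the surrounding text without consuming it, so it does not appear in the reported match. This sketch (the class LookaroundDemo and its helper are hypothetical) shows all four forms on small price strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {
    // Collect all matches of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String amounts = "10 dollars, 20 euros";
        // Positive lookahead: digits only if " dollars" follows.
        System.out.println(findAll("\\d+(?= dollars)", amounts));          // [10]
        // Negative lookahead: whole numbers not followed by " dollars".
        System.out.println(findAll("\\b\\d+\\b(?! dollars)", amounts));    // [20]
        String prices = "cost $15 or 20";
        // Positive lookbehind: digits preceded by '$' (the '$' is not included).
        System.out.println(findAll("(?<=\\$)\\d+", prices));               // [15]
        // Negative lookbehind: numbers not preceded by '$'.
        System.out.println(findAll("(?<!\\$)\\b\\d+", prices));           // [20]
    }
}
```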
(?>X)
X, as an independent, non-capturing group
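An independent (atomic) group commits to the first way it matches and discards its other alternatives, so later failures cannot backtrack into it. The sketch below (the class AtomicDemo and its helper are hypothetical) contrasts it with an ordinary non-capturing group.

```java
import java.util.regex.Pattern;

final class AtomicDemo {
    // Whole-input match helper.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Ordinary group: "bc" leaves no 'c', so the engine backtracks and tries "b".
        System.out.println(full("a(?:bc|b)c", "abc"));   // true
        // Atomic group: once "bc" has matched, the alternative "b" is never tried.
        System.out.println(full("a(?>bc|b)c", "abc"));   // false
        System.out.println(full("a(?>bc|b)c", "abcc"));  // true
    }
}
```

Atomic groups, like possessive quantifiers, are mainly useful for preventing excessive backtracking when a failed match cannot succeed by giving characters back.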