Advanced Regular Expressions
Useful patterns
Pattern | Description | REGEX | Sample match | Sample non-match |
---|---|---|---|---|
\d | Digit. Matches any digit. Equivalent to [0-9]. | \d\d\d | 123 | 1-3 |
\D | Non-digit. Matches any character that is not a digit. | \d\D\d | 1-3 | 123 |
\w | Word. Matches any alphanumeric character or underscore. Equivalent to [a-zA-Z0-9_] for ASCII text. | \w\w\w | a_A | a-A |
\W | Not word. Matches any character that is not a word character (that is, not alphanumeric and not an underscore). | \W\W\W | +-$ | +_@ |
\s | Whitespace. Matches any whitespace character (space, tab, line breaks). | \d\s\w | 1 a | 1ab |
\S | Not whitespace. Matches any character that is not a whitespace character (space, tab, line breaks). | \w\w\w\w\S\d | Test#1 | test 1 |
\b | Word boundary. Can be used to match a complete word. A word boundary is the position between a word character and a non-word character, or between a word character and the start or end of the string. | \bis\b | is; | This, island: |
{} | Curly braces {…}. Repeats the preceding character (or group of characters) the number of times given inside the braces. {n} means exactly n times, {min,} means min times or more, and {min,max} means at least min and at most max times. | abc{2} | abcc | abc |
.* | Matches any character (except line terminators), zero or more times. | .* | abbb, (empty string) | |
.+ | Matches any character (except line terminators), one or more times. | .+ | a, abbcc | (empty string) |
^ | Anchor ^. The start of the line. Matches the position just before the first character of the string. | ^The\s\w+ | The contest | One contest |
$ | Anchor $. The end of the line. Matches the position just after the last character of the string. | \d{4}\sACSL$ | 2020 ACSL | 2020 STAR |
\ | Escape a special character. To use a metacharacter as a literal in a regex, escape it with a backslash, for example \. \* \+ \[ | \w\w\w\. | cat. | lion |
() | Groups. Regular expressions can not only match text but also extract information for further processing. This is done by defining groups of characters and capturing them with parentheses (). | ^(file.+)\.docx$ | file_graphs.docx, file_lisp.docx | data.docx |
\number | Backreference. Symbols in a regular expression can be grouped with parentheses so that they act as a single unit. \n, where n is a group number, matches again at the current position the same text that the n-th parenthesized group captured. | | | |
\1 | Contents of Group 1. | r(\w)g\1x | regex (Group 1 is e) | regxx |
\2 | Contents of Group 2. | (\d\d)\+(\d\d)=\2\+\1 | 20+21=21+20 (Group 1 is 20, Group 2 is 21) | 20+21=20+21 |
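
The sample columns in the table can be checked mechanically. The following is a minimal sketch in Python, assuming the standard `re` module, whose syntax agrees with the table for these patterns. The helper `row_holds` and the two row lists are illustrative names, not part of the original material. Rows without anchors are tested against the whole sample with `re.fullmatch`, while rows that use \b, ^ or $ are tested with `re.search`, since those patterns already state where a match may begin or end.

```python
import re

# (pattern, sample match, sample non-match), copied from the table rows.
# These rows are checked with fullmatch(): the whole sample must match.
WHOLE_STRING_ROWS = [
    (r"\d\d\d", "123", "1-3"),
    (r"\d\D\d", "1-3", "123"),
    (r"\w\w\w", "a_A", "a-A"),
    (r"\W\W\W", "+-$", "+_@"),
    (r"\d\s\w", "1 a", "1ab"),
    (r"\w\w\w\w\S\d", "Test#1", "test 1"),
    (r"abc{2}", "abcc", "abc"),
    (r".+", "abbcc", ""),
    (r"\w\w\w\.", "cat.", "lion"),
    (r"r(\w)g\1x", "regex", "regxx"),
    (r"(\d\d)\+(\d\d)=\2\+\1", "20+21=21+20", "20+21=20+21"),
]

# Rows that rely on \b, ^ or $ are checked with search(): the pattern
# itself says where in the sample a match is allowed to start or end.
SEARCH_ROWS = [
    (r"\bis\b", "is;", "This"),
    (r"\bis\b", "is;", "island:"),
    (r"^The\s\w+", "The contest", "One contest"),
    (r"\d{4}\sACSL$", "2020 ACSL", "2020 STAR"),
    (r"^(file.+)\.docx$", "file_graphs.docx", "data.docx"),
]


def row_holds(pattern, good, bad, matcher):
    """Return True if `good` matches and `bad` does not, using `matcher`."""
    return matcher(pattern, good) is not None and matcher(pattern, bad) is None


all_rows_hold = (
    all(row_holds(p, g, b, re.fullmatch) for p, g, b in WHOLE_STRING_ROWS)
    and all(row_holds(p, g, b, re.search) for p, g, b in SEARCH_ROWS)
)

# .* accepts the empty string, which .+ rejects.
star_matches_empty = re.fullmatch(r".*", "") is not None

# Parentheses capture part of the match for later use.
captured_name = re.search(r"^(file.+)\.docx$", "file_lisp.docx").group(1)

# Each backreference must repeat exactly what its group captured.
swapped = re.fullmatch(r"(\d\d)\+(\d\d)=\2\+\1", "20+21=21+20").groups()

print(all_rows_hold, star_matches_empty, captured_name, swapped)
```

Run as a script, this should print `True True file_lisp ('20', '21')`. Other regex engines may differ in details such as whether \w and \d cover non-ASCII characters, so treat the check as specific to Python.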