# Regular Expressions #

A Regular Expression (`regexp` for short) is a sequence of characters that specifies a search pattern. Such patterns are used in the Filter Transformations to search for similar string values.

Every character in a Regular Expression is either a metacharacter, which has a special meaning, or a regular character, which is matched literally. For example, the `regexp` `.*gr[ae]y.*` contains metacharacters (`.`, `*`, `[]`) and literal characters (`g`, `r`, `a`, `e`, `y`). This `regexp` matches only strings that contain `gray` or `grey` (e.g. `small gray boxes`, `grey doors` or `stingrays`).
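As a minimal sketch of how such a pattern behaves, the snippet below uses Python's `re` module. It assumes the whole string has to match the pattern (`re.fullmatch`); how the Filter Transformations apply a pattern internally is not stated here, and the helper name `matches` is made up for illustration.

```python
import re

# The example pattern from the paragraph above.
PATTERN = re.compile(r".*gr[ae]y.*")

def matches(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches PATTERN."""
    return PATTERN.fullmatch(text) is not None

for text in ["small gray boxes", "grey doors", "stingrays", "green doors"]:
    print(text, matches(text))
```

Only `green doors` is rejected, because it contains neither `gray` nor `grey`.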

## Metacharacters #

The sections below describe each metacharacter with examples. For an in-depth tutorial on Regular Expressions, see the Python `re` library documentation.

If you want to test your `regexp`, visit this free online tool.

### Dot #

`.` (Dot) matches any character.

regexp = `p.t`

matches: `pat`, `pbt`, `pAt`, `p4t`, `p$t`, etc.

### Asterisk #

`*` (Asterisk) causes the preceding `regexp` or character to match 0 or more repetitions of that `regexp` or character.

regexp = `mo*re`

matches: `mre`, `more`, `moore`, `mooore`, etc.

regexp = `.*gray`

matches: `gray`, `stingray`, `$7!Ngray`, `dark gray`, etc.

### Caret #

`^` (Caret) matches the start of the string.

regexp = `^gray.*`

matches: `gray` or `gray door`

does not match: `stingray` or `small gray box`

### Dollar #

`$` (Dollar) matches the end of the string.

regexp = `gray$`

matches: `gray`, `stingray` or `dark gray`

does not match: `small gray box` or `gray door`
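The two anchors (`^` and `$`) can be tried out with Python's `re` module. This is only a sketch: it uses `re.search`, which looks for the pattern anywhere in the string, so the anchors alone decide where a match may start or end. The sample list is made up for illustration.

```python
import re

samples = ["gray", "gray door", "stingray", "small gray box"]

# ^ pins the match to the start of the string, $ to the end.
starts_with_gray = [s for s in samples if re.search(r"^gray", s)]
ends_with_gray = [s for s in samples if re.search(r"gray$", s)]

print(starts_with_gray)  # ['gray', 'gray door']
print(ends_with_gray)    # ['gray', 'stingray']
```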

### Plus #

`+` (Plus) causes the preceding `regexp` or character to match at least 1 repetition of that `regexp` or character.

regexp = `mo+re`

matches: `more`, `moore`, `mooore`, etc.

does not match: `mre`

regexp = `.+gray`

matches: `stingray`, `$7!Ngray`, `dark gray`, etc.

does not match: `gray`
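The difference between `*` (0 or more) and `+` (at least 1) is easiest to see side by side. The sketch below assumes whole-string matching with Python's `re.fullmatch`; the word list is illustrative only.

```python
import re

words = ["mre", "more", "moore", "mooore"]

# * accepts zero o's, + requires at least one.
star = [w for w in words if re.fullmatch(r"mo*re", w)]
plus = [w for w in words if re.fullmatch(r"mo+re", w)]

print(star)  # ['mre', 'more', 'moore', 'mooore']
print(plus)  # ['more', 'moore', 'mooore']
```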

### Question mark #

`?` (Question mark) causes the preceding `regexp` or character to match 0 or 1 repetition of that `regexp` or character.

regexp = `mo?re`

matches: `mre`, `more`

does not match: `moore`, `mooore`, etc.

regexp = `.?gray`

matches: `gray`, `Ngray`, ` gray`, `_gray`, `4gray`, `%gray`, etc.

does not match: `stingray`, `small gray box`, `dark gray`, etc.

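When placed directly after another quantifier (`*?`, `+?`, `??`), the `?` metacharacter makes that quantifier lazy: the preceding `regexp` repeats as few times as possible instead of as many times as possible.

regexp = `gr.*y`

matches: `grey`, `gray`, `groovy`, `gr^y`, etc. In `gray stingray` or `grey stingray` the match runs to the last `y` and covers the whole string.

regexp = `gr.*?y`

matches: `grey`, `gray`, `groovy`, `gr^y`, etc.

In `gray stingray` or `grey stingray` the match stops at the first `y`, so only `gray` or `grey` is matched, not the whole string.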
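Whether a quantifier is greedy or lazy shows up in how much text a match consumes. The following sketch uses Python's `re.search` (an assumption about the matching mode, chosen because it exposes the matched substring) on one of the strings above.

```python
import re

text = "gray stingray"

# Greedy: .* grabs as much as possible, so the match ends at the last "y".
greedy = re.search(r"gr.*y", text).group()

# Lazy: .*? grabs as little as possible, so the match ends at the first "y".
lazy = re.search(r"gr.*?y", text).group()

print(greedy)  # gray stingray
print(lazy)    # gray
```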

### Braces #

`{m}` (Braces or Curly brackets) causes the preceding `regexp` or character to match exactly `m` repetitions.

regexp = `.{2}vy`

matches: any 4 character long string ending with `vy` (e.g. `navy`, `levy`, `envy`, `wavy`, `bevy`, `cavy`, `davy`, `jivy`, `tivy`, `12vy`, etc.)

does not match: strings ending with `vy` that are longer or shorter than 4 characters (e.g. `groovy`, `ivy`, `heavy`, `gravy`, `anchovy`, `scurvy`, etc.)

You can also specify a second number inside the Braces (`{m,n}`), causing the preceding `regexp` or character to match from `m` to `n` repetitions.

regexp = `.{1,3}vy`

matches: any 3 to 5 character long string ending with `vy` (e.g. `navy`, `ivy`, `gravy`, etc.)

does not match: strings ending with `vy` that are longer than 5 characters (e.g. `groovy`, `anchovy`, `scurvy`, etc.)
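Both forms of the Braces can be compared on the same word list. This is a sketch using Python's `re.fullmatch`, which assumes the entire string has to match; the word list is made up from the examples above.

```python
import re

words = ["ivy", "navy", "gravy", "groovy", "anchovy"]

# {2}: exactly two characters before "vy".
exactly_two = [w for w in words if re.fullmatch(r".{2}vy", w)]

# {1,3}: between one and three characters before "vy".
one_to_three = [w for w in words if re.fullmatch(r".{1,3}vy", w)]

print(exactly_two)   # ['navy']
print(one_to_three)  # ['ivy', 'navy', 'gravy']
```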

### Square brackets #

`[]` (Square brackets) are used to indicate a set of characters. In a set:

• characters can be listed individually,

regexp = `[chf]at`

matches: `cat`, `hat`, `fat`

does not match: `pat`, `mat`, `bat`, `sat`, `rat`, etc.

• ranges of characters can be indicated by giving two characters and separating them with a `-`, e.g. `[a-z]`,

regexp = `[b-f]at`

matches: `bat`, `cat`, `dat`, `eat`, `fat`

does not match: `aat`, `mat`, `sat`, `rat`, etc.

• special characters lose their special meaning inside sets and are treated as literal characters,

regexp = `[.*{2}]`

matches: `.`, `*`, `{`, `}` and `2`

• to match a literal `]` inside a set, precede it with a backslash `\`, or place it at the beginning of the set,

regexp = `[()[\]{}]` or `[]()[{}]`

matches: `[`, `]`, `{`, `}`, `(` and `)`

• a Caret `^` placed as the first character of a set negates it, so the set matches any character that is not listed.

regexp = `[^chf]at`

matches: `pat`, `mat`, `bat`, `sat`, `rat`, etc.

does not match: `cat`, `hat`, `fat`
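The listed, ranged and negated forms of a set can be checked against one word list. The sketch below assumes whole-string matching via Python's `re.fullmatch`; the word list is illustrative.

```python
import re

words = ["cat", "hat", "fat", "bat", "eat", "pat", "rat"]

listed = [w for w in words if re.fullmatch(r"[chf]at", w)]    # c, h or f
ranged = [w for w in words if re.fullmatch(r"[b-f]at", w)]    # b through f
negated = [w for w in words if re.fullmatch(r"[^chf]at", w)]  # anything but c, h, f

print(listed)   # ['cat', 'hat', 'fat']
print(ranged)   # ['cat', 'fat', 'bat', 'eat']
print(negated)  # ['bat', 'eat', 'pat', 'rat']
```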

### Pipe #

`|` (Pipe) creates a `regexp` that matches either the expression to the left of the `|` or the expression to its right.

regexp = `gray|grey`

matches: `gray` or `grey`

### Brackets #

`()` (Brackets, also called parentheses) match whatever `regexp` is inside them and mark the start and end of a group. The contents of a group can be retrieved after a match has been performed, and can be matched again later in the string with the `\number` special sequence. A question mark `?` at the beginning of a group creates an extension notation; the first character after the `?` determines the meaning and further syntax of the construct. The most common usages are shown below.

Ignore character case `(?i)`

regexp = `(?i)Newton`

matches: `Newton`, `newton`, `NeWtOn`

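Positive lookahead `(?=...)`

regexp = `Isaac (?=Newton)`

matches: the `Isaac ` part of `Isaac Newton` (the lookahead checks that `Newton` follows, but `Newton` is not included in the match)

does not match: the whole string `Isaac Newton`, nor `Isaac Hanson`, `Isaac Asimov`, `Isaac `

Negative lookahead `(?!...)`

regexp = `Isaac (?!Newton)`

matches: `Isaac ` wherever it is not immediately followed by `Newton` (the text after `Isaac ` is not included in the match)

does not match: `Isaac Newton`, nor the whole strings `Isaac Hanson` and `Isaac Asimov`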

More advanced usage of the `()` metacharacter can be found here.
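The extension notations above can be explored with Python's `re` module. This is a sketch, not a description of how the Filter Transformations evaluate patterns: it uses `re.search` for the lookaheads so that the matched substring can be inspected, and all variable names are illustrative.

```python
import re

# (?i): ignore character case for the whole pattern.
case_insensitive = re.fullmatch(r"(?i)Newton", "NeWtOn") is not None

# (?=...): positive lookahead. "Newton" must follow, but is not consumed.
followed = re.search(r"Isaac (?=Newton)", "Isaac Newton")
not_followed = re.search(r"Isaac (?=Newton)", "Isaac Asimov")

# (?!...): negative lookahead. "Newton" must NOT follow.
rejected = re.search(r"Isaac (?!Newton)", "Isaac Newton")
accepted = re.search(r"Isaac (?!Newton)", "Isaac Asimov")

print(case_insensitive)        # True
print(repr(followed.group()))  # 'Isaac '
print(not_followed)            # None
print(rejected)                # None
print(repr(accepted.group()))  # 'Isaac '
```

In both lookahead cases the matched text is only `Isaac ` (with the trailing space); the surname is inspected but never becomes part of the match.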

### Backslash #

`\` (Backslash) either escapes special characters (permitting you to match characters like `*`, `?`, etc.), or signals a special sequence.

regexp = `2\*2=4`

matches: `2*2=4`

does not match: `22*2=44`

regexp = `2*2=4`

matches: `22=4`, `2=4`, `2222222=4`

does not match: `2*2=4`, `22*2=44`
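The effect of escaping can be verified with a short Python sketch. It assumes whole-string matching (`re.fullmatch`) and also shows `re.escape`, a standard-library helper that adds the backslashes for you when a literal string has to be used as a pattern.

```python
import re

text = "2*2=4"

# Escaped asterisk: matched as a literal "*".
escaped_ok = re.fullmatch(r"2\*2=4", text) is not None

# Unescaped asterisk: "2*" means "zero or more 2s", so the literal "*" is not matched.
unescaped_ok = re.fullmatch(r"2*2=4", text) is not None

# re.escape builds a pattern that matches the given text literally.
auto_escaped_ok = re.fullmatch(re.escape(text), text) is not None

print(escaped_ok)       # True
print(unescaped_ok)     # False
print(auto_escaped_ok)  # True
```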

## Special Sequences #

You can use the Backslash to create a special sequence.

### Backslash Number #

`\number` (Backslash Number) matches the contents of the group of the same number. Groups are numbered starting from 1. For example, `(.+) \1` matches `the the` or `55 55`, but not `thethe` (note the space after the group). This special sequence can only be used to match one of the first 99 groups.
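A backreference can be tried out as follows. The sketch assumes whole-string matching with Python's `re.fullmatch` for the first part, and uses `re.search` in the second part to find a doubled word inside a longer sentence; the sentence is made up for illustration.

```python
import re

pattern = re.compile(r"(.+) \1")

# \1 must repeat exactly what group 1 captured, after a single space.
results = {text: pattern.fullmatch(text) is not None
           for text in ["the the", "55 55", "thethe"]}
print(results)  # {'the the': True, '55 55': True, 'thethe': False}

# A common use: spotting an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
print(doubled.group(1))  # is
```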

### Backslash capital A #

`\A` (Backslash capital A) matches the start of the string (similar to Caret).

regexp = `\Agray.*`

matches: `gray` or `gray door`

does not match: `stingray` or `small gray box`

### Backslash small b #

`\b` (Backslash small b) matches the empty string, but only at the beginning or end of a word.

regexp = `\bgray\b`

matches: `gray` as a whole word, e.g. between other words or inside brackets (`small gray box`, `(gray)`)

does not match: `stingray`, `grayish`, `3gray3`

### Backslash capital B #

`\B` (Backslash capital B) matches the empty string, but only when it is not at the beginning or end of a word.

regexp = `\Bray\B`

matches: `ray` that is inside a word (like in `portraying`, `hairsprays`, `arrays`, etc.)

does not match: `ray` at the beginning or end of a word (like in `stingray`, `disarray`, `ray`, etc.)
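Word boundaries are easiest to understand by searching inside a few sample strings. The sketch below uses Python's `re.search` (an assumption about the matching mode, since `\b` and `\B` are about positions inside a longer text); the sample lists are illustrative.

```python
import re

texts = ["small gray box", "(gray)", "stingray", "grayish", "3gray3"]

# \b...\b: "gray" must stand alone as a word (digits count as word characters).
whole_word = [t for t in texts if re.search(r"\bgray\b", t)]
print(whole_word)  # ['small gray box', '(gray)']

words = ["portraying", "arrays", "stingray", "disarray", "ray"]

# \B...\B: "ray" must have word characters on both sides.
inner = [w for w in words if re.search(r"\Bray\B", w)]
print(inner)  # ['portraying', 'arrays']
```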

### Backslash small d #

`\d` (Backslash small d) matches any character that is a decimal digit (similar to `[0-9]`).

regexp = `\d+.*`

matches: any string that starts with a digit (e.g. `2 rays`, `3 lemons`, `99 problems`, etc.)

### Backslash capital D #

`\D` (Backslash capital D) matches any character that is not a decimal digit (similar to `[^0-9]`).

regexp = `\D*`

matches: any string that does not contain digits.

does not match: `2 rays`, `3 lemons`, `99 problems`, etc.
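The digit sequences can be checked on a handful of strings. This sketch assumes whole-string matching with Python's `re.fullmatch` and uses `\d+.*` (one or more leading digits) for the "starts with a digit" case; the sample strings are made up.

```python
import re

samples = ["2 rays", "99 problems", "no digits here", "room 101"]

# One or more digits at the start, then anything.
starts_with_digit = [s for s in samples if re.fullmatch(r"\d+.*", s)]

# Only non-digit characters from start to end.
no_digits = [s for s in samples if re.fullmatch(r"\D*", s)]

print(starts_with_digit)  # ['2 rays', '99 problems']
print(no_digits)          # ['no digits here']
```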

### Backslash small s #

`\s` (Backslash small s) matches Unicode whitespace characters.

regexp = `\s*`

matches: strings made up only of whitespace, such as a space, a tab (`\t`) or a newline (`\n`)

### Backslash capital S #

`\S` (Backslash capital S) matches only characters that are not Unicode whitespace characters.

regexp = `\S*`

matches: `gray`, `stingray`, `grayish`

does not match: `small gray door`

### Backslash small w #

`\w` (Backslash small w) matches only word characters: alphanumeric characters and the underscore `_`.

regexp = `\w*`

matches: `gray`, `stingray`, `grayish`

does not match: `small gray door`

### Backslash capital W #

`\W` (Backslash capital W) matches characters that are not word characters (anything other than alphanumeric characters and the underscore).

regexp = `\W*`

matches: ` `, `!?`, `$%&`, etc.

does not match: `gray`, `stingray`, `small gray door`
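The four whitespace and word-character sequences can be compared on the same inputs. The sketch below assumes whole-string matching via Python's `re.fullmatch`; the helper name `fully` and the sample strings are made up for illustration.

```python
import re

samples = ["gray", "small gray door", "snake_case", " \t\n", "!?"]

def fully(pattern: str) -> list:
    """Hypothetical helper: samples that match the pattern from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

only_space = fully(r"\s*")     # whitespace only
no_space = fully(r"\S*")       # no whitespace at all
only_word = fully(r"\w*")      # letters, digits and underscore only
only_non_word = fully(r"\W*")  # no letters, digits or underscore

print(only_space)     # [' \t\n']
print(no_space)       # ['gray', 'snake_case', '!?']
print(only_word)      # ['gray', 'snake_case']
print(only_non_word)  # [' \t\n', '!?']
```

`small gray door` appears in none of the lists because it mixes word characters with whitespace.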

### Backslash capital Z #

`\Z` (Backslash capital Z) matches the end of the string.

regexp = `gray\Z`

matches: `gray`, `stingray` or `dark gray`

does not match: `small gray box` or `gray door`
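One difference between `\Z` and `$` is worth knowing, at least in Python's `re` module (it is an assumption that the Filter Transformations behave the same way): `$` also matches just before a newline at the very end of the string, while `\Z` matches only at the absolute end. A minimal sketch:

```python
import re

text = "dark gray\n"  # note the trailing newline

# $ tolerates a single trailing newline.
dollar = re.search(r"gray$", text) is not None

# \Z does not: the string must end right after "gray".
capital_z = re.search(r"gray\Z", text) is not None

print(dollar)     # True
print(capital_z)  # False
```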