# Regular Expressions

> *Regular expressions* are powerful tools for pattern matching and text processing. A regular expression is a pattern, written with a special set of characters, that is searched for in a string (`str`). The `re` module needs to be imported before use.

This page provides syntax for regular expressions in Python. Each section includes an example to demonstrate the described methods.

### Functions <a href="#functions" id="functions"></a>

<table><thead><tr><th width="348">Action</th><th>Function</th></tr></thead><tbody><tr><td>Check if regex matches anywhere in a string</td><td><code>re.search("pattern", string, flags=0)</code></td></tr><tr><td>Match regex at the start of a string and capture groups</td><td><code>re.match("pattern", string, flags=0)</code></td></tr><tr><td>Specify alternative regex</td><td><code>pattern1|pattern2</code></td></tr></tbody></table>
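
The difference between the two functions is easiest to see side by side. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

text = "order 66 shipped"  # illustrative sample string

# re.search scans the whole string and returns the first match (or None).
found = re.search(r"[0-9]+", text)
print(found.group())

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"[0-9]+", text)
print(at_start)  # None, because the string starts with "order"

# The | operator matches either alternative.
status = re.search(r"shipped|pending", text)
print(status.group())
```

Both functions return a match object on success and `None` on failure, so the result can be used directly in an `if` statement.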

### Character Class <a href="#character_class" id="character_class"></a>

A *character class* specifies a list of characters to match (`[...]`, where `...` represents the list) or not to match (`[^...]`).

<table data-header-hidden><thead><tr><th width="378"></th><th></th></tr></thead><tbody><tr><td>Character Class</td><td><code>...</code></td></tr><tr><td>Any lowercase vowel</td><td><code>[aeiou]</code></td></tr><tr><td>Any digit</td><td><code>[0-9]</code></td></tr><tr><td>Any lowercase letter</td><td><code>[a-z]</code></td></tr><tr><td>Any uppercase letter</td><td><code>[A-Z]</code></td></tr><tr><td>Any digit, lowercase letter, or uppercase letter</td><td><code>[a-zA-Z0-9]</code></td></tr><tr><td>Anything except a lowercase vowel</td><td><code>[^aeiou]</code></td></tr><tr><td>Anything except a digit</td><td><code>[^0-9]</code></td></tr><tr><td>Anything except a space</td><td><code>[^ ]</code></td></tr><tr><td>Any character except a newline</td><td><code>.</code></td></tr><tr><td>Any word character (equivalent to <code>[a-zA-Z0-9_]</code> for ASCII text)</td><td><code>\w</code></td></tr><tr><td>Any non-word character (equivalent to <code>[^a-zA-Z0-9_]</code> for ASCII text)</td><td><code>\W</code></td></tr><tr><td>A digit character (equivalent to <code>[0-9]</code> for ASCII text)</td><td><code>\d</code></td></tr><tr><td>Any non-digit character (equivalent to <code>[^0-9]</code> for ASCII text)</td><td><code>\D</code></td></tr><tr><td>Any whitespace character (equivalent to <code>[ \t\n\r\f\v]</code> for ASCII text)</td><td><code>\s</code></td></tr><tr><td>Any non-whitespace character (equivalent to <code>[^ \t\n\r\f\v]</code> for ASCII text)</td><td><code>\S</code></td></tr></tbody></table>
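
A few of these classes can be tried out with `re.findall`, which returns every non-overlapping match as a list. This is a small sketch; the sample string is invented for illustration.

```python
import re

sample = "Call 555-0199, Room B_12!"  # illustrative sample string

# \d matches one digit at a time.
digits = re.findall(r"\d", sample)
print(digits)

# \w+ matches runs of letters, digits, and underscores.
words = re.findall(r"\w+", sample)
print(words)  # ['Call', '555', '0199', 'Room', 'B_12']

# [aeiou] matches any single lowercase vowel.
vowels = re.findall(r"[aeiou]", sample)
print(vowels)  # ['a', 'o', 'o']

# A negated class: anything that is neither a word character nor whitespace.
punctuation = re.findall(r"[^\w\s]", sample)
print(punctuation)  # ['-', ',', '!']
```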

### Anchors <a href="#anchors" id="anchors"></a>

*Anchors* are special characters that can be used to match a pattern at a specified position.

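| Anchor                                                    | Special Character |
| --------------------------------------------------------- | ----------------- |
| Beginning of string (or of each line with `re.MULTILINE`) | `^`               |
| End of string (or of each line with `re.MULTILINE`)       | `$`               |
| Beginning of string                                       | `\A`              |
| End of string                                             | `\Z`              |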
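
In Python, `^` and `$` match at line boundaries only when the `re.MULTILINE` flag is set, while `\A` and `\Z` always refer to the whole string. The sketch below illustrates this with a made-up two-line string.

```python
import re

text = "first line\nsecond line"  # illustrative two-line string

# Without flags, ^ matches only at the start of the string.
default_starts = re.findall(r"^\w+", text)
print(default_starts)  # ['first']

# With re.MULTILINE, ^ also matches right after each newline.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(line_starts)  # ['first', 'second']

# \A and \Z ignore re.MULTILINE and anchor to the whole string.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)
print(string_start, string_end)  # ['first'] ['line']
```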

### Repetition and Quantifier Characters <a href="#repetition_and_quantifier_characters" id="repetition_and_quantifier_characters"></a>

*Repetition or quantifier characters* specify the number of times to match a particular character or set of characters.

| Repetition                     | Character |
| ------------------------------ | --------- |
| Zero or more times             | `*`       |
| One or more times              | `+`       |
| Zero or one time               | `?`       |
| Exactly n times                | `{n}`     |
| n or more times                | `{n,}`    |
| m or fewer times               | `{,m}`    |
| At least n and at most m times | `{n,m}`   |
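
As a quick illustration of the quantifiers in the table, the sketch below validates a hypothetical ZIP code format (five digits with an optional four-digit extension) and shows how `{n,m}` bounds a repetition. The pattern and sample values are assumptions chosen for the example.

```python
import re

# Hypothetical format: 5 digits, optionally followed by "-" and 4 digits.
zip_code = re.compile(r"[0-9]{5}(-[0-9]{4})?")

# fullmatch requires the entire string to match the pattern.
print(bool(zip_code.fullmatch("02215")))       # True
print(bool(zip_code.fullmatch("02215-1300")))  # True
print(bool(zip_code.fullmatch("2215")))        # False

# {2,3} matches two or three consecutive "a" characters, preferring three.
runs = re.findall(r"a{2,3}", "a aa aaa aaaa")
print(runs)  # ['aa', 'aaa', 'aaa']
```

Quantifiers are greedy by default, which is why the run of four `a` characters yields a single three-character match.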

Input:

```python
# regex.py
import re

number1 = "(555)123-4567"
number2 = "123-45-6789"

# check if matches
if re.search(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number1):
    print("match!")

if re.search(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number2):
    print("match!")
else:
    print("no match!")

# capture matches
# use parentheses to "capture" different parts of a regular
# expression for later use; the first set of parentheses corresponds
# to group 1, the second to group 2, etc.

number_details = re.match(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)

if number_details is not None:
    area_code = number_details.group(1)
    phone_number = number_details.group(2)

    print(f"area code: {area_code}")
    print(f"phone number: {phone_number}")
```

Output:

```text
match!
no match!
area code: 555
phone number: 123-4567
```

## Resources <a href="#documentation" id="documentation"></a>

* [Regular Expressions 101](https://regex101.com/)
* [Regular Expressions Library](http://www.regexlib.com/)
* [Regular Expressions Cheat Sheet](http://www.regexlib.com/CheatSheet.aspx)
* W3 Schools: [Python RegEx](https://www.w3schools.com/python/python_regex.asp)
