# Regular Expressions

*Regular expressions* are powerful tools for pattern matching and text processing. A regular expression is a pattern built from a special set of characters that is searched for in a string (`str`). The `re` module needs to be imported before use.

This page provides syntax for regular expressions in Python. Each section includes an example to demonstrate the described methods.

### Functions <a href="#functions" id="functions"></a>

<table><thead><tr><th width="348">Action</th><th>Function</th></tr></thead><tbody><tr><td>Check if regex matches anywhere in a string</td><td><code>re.search("pattern", string, flags=0)</code></td></tr><tr><td>Capture regex matches at the start of a string</td><td><code>re.match("pattern", string, flags=0)</code></td></tr><tr><td>Specify alternative regex</td><td><code>pattern1|pattern2</code></td></tr></tbody></table>
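
The difference between the functions above is easiest to see side by side. The following is a minimal sketch; the sample string and the variable names are made up for illustration.

```python
import re

text = "call 555-1234 today"

# re.search scans the whole string and returns the first match
found = re.search(r"[0-9]{3}-[0-9]{4}", text)

# re.match only succeeds if the pattern matches at the start of the string
at_start = re.match(r"[0-9]{3}-[0-9]{4}", text)

# "|" matches either alternative
either = re.search(r"today|tomorrow", text)

print(found.group())   # 555-1234
print(at_start)        # None
print(either.group())  # today
```

Both functions return a match object on success and `None` otherwise, so the result can be used directly in an `if` statement.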

### Character Class <a href="#character_class" id="character_class"></a>

A *character class* specifies a list of characters to match (`[...]`, where `...` represents the list) or not match (`[^...]`).

<table data-header-hidden><thead><tr><th width="378"></th><th></th></tr></thead><tbody><tr><td>Character Class</td><td><code>...</code></td></tr><tr><td>Any lowercase vowel</td><td><code>[aeiou]</code></td></tr><tr><td>Any digit</td><td><code>[0-9]</code></td></tr><tr><td>Any lowercase letter</td><td><code>[a-z]</code></td></tr><tr><td>Any uppercase letter</td><td><code>[A-Z]</code></td></tr><tr><td>Any digit, lowercase letter, or uppercase letter</td><td><code>[a-zA-Z0-9]</code></td></tr><tr><td>Anything except a lowercase vowel</td><td><code>[^aeiou]</code></td></tr><tr><td>Anything except a digit</td><td><code>[^0-9]</code></td></tr><tr><td>Anything except a space</td><td><code>[^ ]</code></td></tr><tr><td>Any character except a newline</td><td><code>.</code></td></tr><tr><td>Any word character (equivalent to <code>[a-zA-Z0-9_]</code>)</td><td><code>\w</code></td></tr><tr><td>Any non-word character (equivalent to <code>[^a-zA-Z0-9_]</code>)</td><td><code>\W</code></td></tr><tr><td>A digit character (equivalent to <code>[0-9]</code>)</td><td><code>\d</code></td></tr><tr><td>Any non-digit character (equivalent to <code>[^0-9]</code>)</td><td><code>\D</code></td></tr><tr><td>Any whitespace character (equivalent to <code>[ \t\n\r\f\v]</code>)</td><td><code>\s</code></td></tr><tr><td>Any non-whitespace character (equivalent to <code>[^ \t\n\r\f\v]</code>)</td><td><code>\S</code></td></tr></tbody></table>
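
As a rough illustration of the classes in the table, the sketch below runs a few of them through `re.findall` on a made-up sample string. Note that for `str` patterns the shorthand classes (`\w`, `\d`, `\s`) also match non-ASCII Unicode characters unless the `re.ASCII` flag is set, so the bracketed equivalents apply to ASCII text.

```python
import re

sample = "Room 42, Floor_3!"  # illustrative input, not from a real dataset

vowels = re.findall(r"[aeiou]", sample)               # lowercase vowels
digits = re.findall(r"\d", sample)                    # digit characters
words = re.findall(r"\w+", sample)                    # runs of word characters
non_words = re.findall(r"\W", sample)                 # non-word characters
punctuation = re.findall(r"[^a-zA-Z0-9_ ]", sample)   # negated class

print(vowels)       # ['o', 'o', 'o', 'o']
print(digits)       # ['4', '2', '3']
print(words)        # ['Room', '42', 'Floor_3']
print(non_words)    # [' ', ',', ' ', '!']
print(punctuation)  # [',', '!']
```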

### Anchors <a href="#anchors" id="anchors"></a>

*Anchors* are special characters that can be used to match a pattern at a specified position.

| Anchor              | Special Character |
| ------------------- | ----------------- |
| Beginning of line   | `^`               |
| End of line         | `$`               |
| Beginning of string | `\A`              |
| End of string       | `\Z`              |
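
One detail worth showing in code: in Python, `^` and `$` match at line boundaries only when the `re.MULTILINE` flag is passed; without it, `^` matches only at the start of the string. The sketch below uses a made-up three-line string to compare the anchors.

```python
import re

log = "error: disk full\nwarning: low memory\nerror: timeout"  # illustrative input

# Without re.MULTILINE, ^ matches only at the start of the string
first_only = re.findall(r"^error", log)

# With re.MULTILINE, ^ matches at the start of every line
every_line = re.findall(r"^error", log, flags=re.MULTILINE)

# \A always means the start of the string, regardless of flags
string_start = re.findall(r"\Aerror", log, flags=re.MULTILINE)

# $ with re.MULTILINE matches at the end of every line
line_ends = re.findall(r"\w+$", log, flags=re.MULTILINE)

# \Z matches only at the end of the string
string_end = re.findall(r"\w+\Z", log, flags=re.MULTILINE)

print(first_only)    # ['error']
print(every_line)    # ['error', 'error']
print(string_start)  # ['error']
print(line_ends)     # ['full', 'memory', 'timeout']
print(string_end)    # ['timeout']
```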

### Repetition and Quantifier Characters <a href="#repetition_and_quantifier_characters" id="repetition_and_quantifier_characters"></a>

*Repetition or quantifier characters* specify the number of times to match a particular character or set of characters.

| Repetition                     | Character |
| ------------------------------ | --------- |
| Zero or more times             | `*`       |
| One or more times              | `+`       |
| Zero or one time               | `?`       |
| Exactly n times                | `{n}`     |
| n or more times                | `{n,}`    |
| m or fewer times               | `{,m}`    |
| At least n and at most m times | `{n,m}`   |
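
The quantifiers above can be checked quickly with `re.fullmatch`, which succeeds only if the entire string matches. This is a small sketch; `fully_matches` is a hypothetical helper written for this example, not part of the `re` module.

```python
import re

def fully_matches(pattern, text):
    # Hypothetical helper: True only if the entire string matches the pattern
    return re.fullmatch(pattern, text) is not None

print(fully_matches(r"ab*", "a"))            # True: zero "b" characters is allowed
print(fully_matches(r"ab+", "a"))            # False: at least one "b" is required
print(fully_matches(r"colou?r", "color"))    # True: the "u" is optional
print(fully_matches(r"[0-9]{3}", "123"))     # True: exactly three digits
print(fully_matches(r"[0-9]{2,}", "1"))      # False: fewer than two digits
print(fully_matches(r"[0-9]{,2}", "123"))    # False: more than two digits
print(fully_matches(r"[0-9]{2,4}", "1234"))  # True: between two and four digits
```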

Input:

```python
# regex.py
import re

number1 = "(555)123-4567"
number2 = "123-45-6789"

# check if matches
if re.search(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number1):
    print("match!")

if re.search(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number2):
    print("match!")
else:
    print("no match!")

# capture matches
# use parentheses to "capture" different parts of a regular
# expression for later use; the first set of parentheses corresponds
# to group 1, the second to group 2, etc.

number_details = re.match(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)

if number_details is not None:
    area_code = number_details.group(1)
    phone_number = number_details.group(2)

    print(f"area code: {area_code}")
    print(f"phone number: {phone_number}")
```

Output:

```
match!
no match!
area code: 555
phone number: 123-4567
```

## Resources <a href="#documentation" id="documentation"></a>

* [Regular Expressions 101](https://regex101.com/)
* [Regular Expressions Library](http://www.regexlib.com/)
* [Regular Expressions Cheat Sheet](http://www.regexlib.com/CheatSheet.aspx)
* W3 Schools: [Python RegEx](https://www.w3schools.com/python/python_regex.asp)


