Regular Expressions

Regular expressions are powerful tools for pattern matching and text processing. A regular expression is a pattern, written with a special set of characters, that is searched for in a string. In Python, the re module needs to be imported before use (import re).

This page provides syntax for regular expressions in Python. Each section includes an example to demonstrate the described methods.

Functions

Action                                                 Function
Check if regex matches a string                        re.search("pattern", string, flags=0)
Capture regex matches (only at the start of a string)  re.match("pattern", string, flags=0)
Specify alternative regex                              pattern1|pattern2
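The functions above can be sketched as follows. This is a minimal illustration that assumes Python's standard re module; the sample string and variable names are made up for the example. It shows that re.search scans the whole string, that re.match only tries the start of the string, and that | selects between alternatives.

```python
import re

text = "order 66 shipped"  # hypothetical sample string

# re.search scans the whole string and returns a match object or None
found = re.search(r"[0-9]+", text)
search_result = found.group(0) if found else None

# re.match only succeeds if the pattern matches at the start of the string
match_at_start = re.match(r"[0-9]+", text)  # None: text starts with "order"
match_word = re.match(r"[a-z]+", text)

# | separates alternative patterns
alt = re.search(r"shipped|cancelled", text)

print(search_result)        # 66
print(match_at_start)       # None
print(match_word.group(0))  # order
print(alt.group(0))         # shipped
```

Both functions return None when nothing matches, so the result is usually checked before calling group().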

Character Class

A character class specifies a list of characters to match ([...], where ... represents the list) or not to match ([^...]).

Character Class                                              Pattern
Any lowercase vowel                                          [aeiou]
Any digit                                                    [0-9]
Any lowercase letter                                         [a-z]
Any uppercase letter                                         [A-Z]
Any digit, lowercase letter, or uppercase letter             [a-zA-Z0-9]
Anything except a lowercase vowel                            [^aeiou]
Anything except a digit                                      [^0-9]
Anything except a space                                      [^ ]
Any character except a newline                               .
Any word character (equivalent to [a-zA-Z0-9_])              \w
Any non-word character (equivalent to [^a-zA-Z0-9_])         \W
Any digit character (equivalent to [0-9])                    \d
Any non-digit character (equivalent to [^0-9])               \D
Any whitespace character (equivalent to [ \t\n\r\f\v])       \s
Any non-whitespace character (equivalent to [^ \t\n\r\f\v])  \S
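A few of the character classes above can be tried out with re.findall, which returns every non-overlapping match as a list. This is a small sketch using Python's re module; the sample string is invented for illustration.

```python
import re

sample = "Room 42, Floor_3!"  # hypothetical sample string

vowels = re.findall(r"[aeiou]", sample)   # lowercase vowels
digits = re.findall(r"\d", sample)        # digit characters
non_word = re.findall(r"\W", sample)      # anything outside [a-zA-Z0-9_]
no_space = re.findall(r"[^ ]+", sample)   # runs of non-space characters

print(vowels)    # ['o', 'o', 'o', 'o']
print(digits)    # ['4', '2', '3']
print(non_word)  # [' ', ',', ' ', '!']
print(no_space)  # ['Room', '42,', 'Floor_3!']
```

Note that the underscore in Floor_3 counts as a word character, so \W does not match it.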

Anchors

Anchors are special characters that can be used to match a pattern at a specified position.

Anchor               Special Character
Beginning of line    ^
End of line          $
Beginning of string  \A
End of string        \Z
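The difference between line anchors and string anchors shows up on multi-line input. The sketch below assumes Python's re module, where ^ and $ match at the boundaries of every line only when the re.MULTILINE flag is passed; the two-line sample string is made up for the example.

```python
import re

log = "error: disk full\nwarning: low memory"  # hypothetical two-line string

# Without flags, ^ only matches at the very start of the string
first_only = re.findall(r"^\w+", log)

# With re.MULTILINE, ^ and $ also match at the start and end of each line
line_starts = re.findall(r"^\w+", log, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", log, flags=re.MULTILINE)

# \A and \Z always refer to the whole string, regardless of flags
string_start = re.findall(r"\A\w+", log, flags=re.MULTILINE)
string_end = re.findall(r"\w+\Z", log, flags=re.MULTILINE)

print(first_only)    # ['error']
print(line_starts)   # ['error', 'warning']
print(line_ends)     # ['full', 'memory']
print(string_start)  # ['error']
print(string_end)    # ['memory']
```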

Repetition and Quantifier Characters

Repetition or quantifier characters specify the number of times to match a particular character or set of characters.

Repetition                      Character
Zero or more times              *
One or more times               +
Zero or one time                ?
Exactly n times                 {n}
n or more times                 {n,}
m or fewer times                {,m}
At least n and at most m times  {n,m}
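Each quantifier above can be checked against a short string. The sketch below uses re.fullmatch from Python's re module so that the whole string has to match; the helper name matches is hypothetical and exists only for this example.

```python
import re

def matches(pattern, s):
    """Hypothetical helper: True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

zero_or_more = matches(r"ab*", "a")        # * allows zero b's
one_or_more = matches(r"ab+", "a")         # + needs at least one b
optional = matches(r"colou?r", "color")    # ? makes the u optional
exactly = matches(r"[0-9]{3}", "555")      # exactly three digits
at_least = matches(r"[0-9]{2,}", "5")      # needs two or more digits
at_most = matches(r"[0-9]{,2}", "555")     # allows at most two digits
between = matches(r"[0-9]{2,4}", "555")    # two to four digits

print(zero_or_more, one_or_more, optional)  # True False True
print(exactly, at_least, at_most, between)  # True False False True
```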

Input (Julia):

# regex.jl
number1 = "(555)123-4567"
number2 = "123-45-6789"

# check if matches
if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number1)
   println("match!")
end

if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number2)
  println("match!")
else
  println("no match!")
end

# capture matches
# Use parentheses to "capture" different parts of a regular
# expression for later use. The first set of parentheses corresponds
# to index 1, the second to index 2, etc.

number_details = match(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)

if number_details !== nothing
   area_code = number_details[1]
   phone_number = number_details[2]

   println("area code: $area_code")
   println("phone number: $phone_number")
end

Output:

match!
no match!
area code: 555
phone number: 123-4567
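For comparison, here is a rough Python counterpart of the example above, written with the re functions listed under Functions. It is a sketch rather than a line-by-line translation: re.search is used both for the match check and for the capture, and the groups are read with group(1) and group(2).

```python
import re

number1 = "(555)123-4567"
number2 = "123-45-6789"

# Parentheses capture the area code (group 1) and the rest (group 2)
pattern = r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})"

# check if matches
for number in (number1, number2):
    print("match!" if re.search(pattern, number) else "no match!")

# capture matches
number_details = re.search(pattern, number1)
if number_details is not None:
    area_code = number_details.group(1)
    phone_number = number_details.group(2)

    print(f"area code: {area_code}")
    print(f"phone number: {phone_number}")
```

Run as a script, this prints the same four lines as the output shown above.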

