Regular Expressions
Regular expressions are powerful tools for pattern matching and text processing. A regular expression is a pattern, written with a special set of characters, that is searched for in a string. In Python, the re module needs to be imported before use.
This page provides syntax for regular expressions in Python. Each section includes an example to demonstrate the described methods.
Functions
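| Action | Function |
| --- | --- |
| Check if regex matches a string | re.search("pattern", string, flags=0) |
| Capture regex matches | re.match("pattern", string, flags=0) |

Both functions return a match object, or None when there is no match. re.search scans the whole string, while re.match only matches at the start of the string. Captured groups are read from the match object with .group(n).
To specify alternative patterns, separate them with |, as in pattern1|pattern2.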
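The functions listed above can be tried out with a short sketch like the following. The sample string and variable names are made up for illustration; only re.search, re.match, and the match object's group method come from the standard re module.

```python
import re

text = "order 66 shipped"  # hypothetical sample input

# re.search scans the whole string and returns a match object or None.
found = re.search(r"[0-9]+", text)
print(found is not None)   # True

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"[0-9]+", text)
print(at_start is None)    # True, because the text starts with "order"

# Parentheses capture part of a match; group(1) is the first capture.
captured = re.search(r"order ([0-9]+)", text)
order_id = captured.group(1) if captured else None
print(order_id)            # 66

# The | operator matches either alternative.
status = re.search(r"shipped|delivered", text)
print(status.group(0))     # shipped
```

Checking the result against None before calling group avoids an AttributeError when the pattern does not match.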
Character Class
A character class specifies a list of characters to match ([...] where ... represents the list) or not match ([^...]).

| Character Class | Pattern |
| --- | --- |
| Any lowercase vowel | [aeiou] |
| Any digit | [0-9] |
| Any lowercase letter | [a-z] |
| Any uppercase letter | [A-Z] |
| Any digit, lowercase letter, or uppercase letter | [a-zA-Z0-9] |
| Anything except a lowercase vowel | [^aeiou] |
| Anything except a digit | [^0-9] |
| Anything except a space | [^ ] |
| Any character except a newline | . |
| Any word character (equivalent to [a-zA-Z0-9_] for ASCII text) | \w |
| Any non-word character (equivalent to [^a-zA-Z0-9_] for ASCII text) | \W |
| A digit character (equivalent to [0-9] for ASCII text) | \d |
| Any non-digit character (equivalent to [^0-9] for ASCII text) | \D |
| Any whitespace character (equivalent to [ \t\n\r\f\v] for ASCII text) | \s |
| Any non-whitespace character (equivalent to [^ \t\n\r\f\v] for ASCII text) | \S |
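As a rough illustration of the character classes above, the sketch below runs a few of them through re.findall. The sample string is hypothetical and chosen only so that each class has something to match.

```python
import re

sample = "Room 42, Floor_3!"  # hypothetical sample input

vowels = re.findall(r"[aeiou]", sample)    # lowercase vowels only
digits = re.findall(r"\d", sample)         # digit characters
words = re.findall(r"\w+", sample)         # runs of word characters
non_words = re.findall(r"\W", sample)      # everything \w does not match
no_spaces = re.findall(r"[^ ]+", sample)   # runs of non-space characters

print(vowels)     # ['o', 'o', 'o', 'o']
print(digits)     # ['4', '2', '3']
print(words)      # ['Room', '42', 'Floor_3']
print(non_words)  # [' ', ',', ' ', '!']
print(no_spaces)  # ['Room', '42,', 'Floor_3!']
```

Note that the underscore counts as a word character, so "Floor_3" is returned as a single match by \w+.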
Anchors
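Anchors are special characters that can be used to match a pattern at a specified position.

| Anchor | Special Character |
| --- | --- |
| Beginning of line (requires the re.MULTILINE flag; otherwise beginning of string) | ^ |
| End of line (requires the re.MULTILINE flag; otherwise end of string or just before a trailing newline) | $ |
| Beginning of string | \A |
| End of string | \Z |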
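The difference between line anchors and string anchors is easiest to see on a multi-line string. The following is a minimal sketch with a made-up two-line input; it assumes Python's standard behaviour that ^ and $ only match at line boundaries when re.MULTILINE is set.

```python
import re

text = "first line\nsecond line"  # hypothetical two-line input

# Without flags, ^ matches only at the start of the string.
start_default = re.findall(r"^\w+", text)
print(start_default)   # ['first']

# With re.MULTILINE, ^ and $ also match at each line boundary.
start_multi = re.findall(r"^\w+", text, flags=re.MULTILINE)
end_multi = re.findall(r"\w+$", text, flags=re.MULTILINE)
print(start_multi)     # ['first', 'second']
print(end_multi)       # ['line', 'line']

# \A and \Z always refer to the whole string, even with re.MULTILINE.
start_string = re.findall(r"\A\w+", text, flags=re.MULTILINE)
end_string = re.findall(r"\w+\Z", text, flags=re.MULTILINE)
print(start_string)    # ['first']
print(end_string)      # ['line']
```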
Repetition and Quantifier Characters
Repetition or quantifier characters specify the number of times to match a particular character or set of characters.
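As a hedged illustration, the sketch below exercises the common quantifiers *, +, ?, {n}, and {n,m} from standard regex syntax. The helper name fully_matches and the test strings are hypothetical; re.fullmatch is used so that the quantifier has to account for the entire string.

```python
import re

def fully_matches(pattern: str, candidate: str) -> bool:
    """Return True if the whole candidate string matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None

# * means zero or more, + means one or more, ? means zero or one.
star_ok = fully_matches(r"ab*", "a")             # True: zero b's allowed
plus_ok = fully_matches(r"ab+", "a")             # False: at least one b needed
optional_ok = fully_matches(r"colou?r", "color")  # True: the u is optional

# {n} means exactly n times, {n,m} means between n and m times.
exact_ok = fully_matches(r"[0-9]{3}", "555")       # True
exact_bad = fully_matches(r"[0-9]{3}", "5555")     # False: four digits
range_ok = fully_matches(r"[0-9]{2,4}", "5555")    # True

print(star_ok, plus_ok, optional_ok)
print(exact_ok, exact_bad, range_ok)
```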
# regex.jl
number1 = "(555)123-4567"
number2 = "123-45-6789"

# check if matches
if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number1)
    println("match!")
end
if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number2)
    println("match!")
else
    println("no match!")
end

# capture matches
# use parentheses to "capture" different parts of a regular
# expression for later use. The first set of parentheses corresponds
# to index 1, the second to index 2, etc.
number_details = match(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)
if number_details !== nothing
    area_code = number_details[1]
    phone_number = number_details[2]
    println("area code: $area_code")
    println("phone number: $phone_number")
end
match!
no match!
area code: 555
phone number: 123-4567
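The example above is written in Julia (regex.jl). Since the rest of this page describes Python's re module, a rough Python equivalent might look like the sketch below. The variable phone_pattern is introduced here for illustration and is not part of the original example.

```python
import re

number1 = "(555)123-4567"
number2 = "123-45-6789"

# Same pattern as the Julia example: (ddd)ddd-dddd
phone_pattern = r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}"

# check if matches
for number in (number1, number2):
    if re.search(phone_pattern, number):
        print("match!")
    else:
        print("no match!")

# capture matches: group(1) is the first set of parentheses,
# group(2) the second, and so on.
number_details = re.search(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)
if number_details is not None:
    area_code = number_details.group(1)
    phone_number = number_details.group(2)
    print(f"area code: {area_code}")
    print(f"phone number: {phone_number}")
```

Run as written, this prints the same four lines as the Julia version.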