Regular Expressions
Regular expressions (regex) are powerful tools for pattern matching and text processing. A regex is a pattern, written with a special set of characters, that is searched for in a string str.
This page provides the syntax for regular expressions in Julia. Each section includes an example to demonstrate the described methods.
Functions
Check if regex matches a string
occursin(r"pattern", str)
Capture regex matches
match(r"pattern", str)
Specify alternative regex
pattern1|pattern2
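As a minimal sketch of the functions above: occursin returns a Bool, while match returns a RegexMatch object (or nothing when there is no match). The sample string and variable names are only illustrative.

```julia
str = "The quick brown fox"

# occursin answers yes/no
has_fox = occursin(r"fox", str)

# match returns a RegexMatch; .match holds the matched text,
# and integer indices return captured groups
m = match(r"q(\w+)", str)
whole = m.match        # "quick"
rest = m[1]            # "uick"

# alternation: either side of | may match
animal = match(r"cat|fox", str).match   # "fox"

# no match yields nothing rather than an error
no_match = match(r"dog", str)
```

Because match may return nothing, check the result before indexing into it.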
Character Class
A character class specifies a list of characters to match ([...], where ... represents the list) or not to match ([^...]).
Character Class
...
Any lowercase vowel
[aeiou]
Any digit
[0-9]
Any lowercase letter
[a-z]
Any uppercase letter
[A-Z]
Any digit, lowercase letter, or uppercase letter
[a-zA-Z0-9]
Anything except a lowercase vowel
[^aeiou]
Anything except a digit
[^0-9]
Anything except a space
[^ ]
Any character
.
Any word character (equivalent to [a-zA-Z0-9_])
\w
Any non-word character (equivalent to [^a-zA-Z0-9_])
\W
Any digit character (equivalent to [0-9])
\d
Any non-digit character (equivalent to [^0-9])
\D
Any whitespace character (equivalent to [ \t\r\n\f\v])
\s
Any non-whitespace character (equivalent to [^ \t\r\n\f\v])
\S
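The character classes above can be tried out with eachmatch, which iterates over all non-overlapping matches, and with replace. This is a small sketch on a made-up sentence; the variable names are illustrative only.

```julia
text = "Julia 1.9 was released in 2023!"

# collect every single digit
digits = [m.match for m in eachmatch(r"[0-9]", text)]

# count lowercase vowels
n_vowels = length(collect(eachmatch(r"[aeiou]", text)))

# runs of word characters
words = [m.match for m in eachmatch(r"\w+", text)]

# drop everything that is not a letter, digit, or space
cleaned = replace(text, r"[^a-zA-Z0-9 ]" => "")

# replace each whitespace character with an underscore
underscored = replace(text, r"\s" => "_")
```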
Anchors
Anchors are special characters that match a pattern at a specified position.
Beginning of line
^
End of line
$
Beginning of string
\A
End of string (or just before a final newline)
\Z
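A short sketch of how the anchors behave on a two-line string. Note that in Julia, ^ and $ match at the start and end of the whole string by default; they match at every line boundary only when the m (multiline) flag is appended to the regex literal. The sample text is illustrative.

```julia
text = "first line\nsecond line"

# without the m flag, ^ only matches at the start of the string
start_default = occursin(r"^second", text)

# with the m flag, ^ also matches right after a newline
start_multiline = occursin(r"^second"m, text)

# \A always means start of string, even with the m flag
start_of_string = occursin(r"\Asecond"m, text)

# $ without the m flag matches at the end of the string
end_default = occursin(r"first line$", text)

# with the m flag, $ also matches just before a newline
end_multiline = occursin(r"first line$"m, text)

# \Z matches at the end of the string
end_of_string = occursin(r"second line\Z", text)
```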
Repetition and Quantifier Characters
Repetition or quantifier characters specify the number of times to match a particular character or set of characters.
Zero or more times
*
One or more times
+
Zero or one time
?
Exactly n times
{n}
n or more times
{n,}
m or fewer times
{0,m}
At least n and at most m times
{n,m}
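The quantifiers above can be checked one at a time. In this sketch each pattern is anchored with ^ and $ so that the whole string must match; the test strings are made up for illustration.

```julia
# * allows zero occurrences, + requires at least one
star_ok = occursin(r"^ab*c$", "ac")
plus_ok = occursin(r"^ab+c$", "ac")

# ? makes the preceding character optional
optional_ok = occursin(r"^colou?r$", "color")

# {n}: exactly n times
exact_ok = occursin(r"^[0-9]{3}$", "123")
exact_too_long = occursin(r"^[0-9]{3}$", "1234")

# {n,}: n or more times
at_least_ok = occursin(r"^[0-9]{2,}$", "1")

# {n,m}: between n and m times
range_ok = occursin(r"^[0-9]{2,4}$", "123")
range_too_long = occursin(r"^[0-9]{2,4}$", "12345")
```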
Input:
# regex.jl
number1 = "(555)123-4567"
number2 = "123-45-6789"

# check if matches
if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number1)
    println("match!")
end
if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number2)
    println("match!")
else
    println("no match!")
end

# capture matches
# use parentheses to "capture" different parts of a regular
# expression for later use; the first set of parentheses corresponds
# to index 1, the second to index 2, etc.
number_details = match(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)
# match returns nothing when the pattern is not found
if number_details !== nothing
    area_code = number_details[1]
    phone_number = number_details[2]
    println("area code: $area_code")
    println("phone number: $phone_number")
end
Output:
match!
no match!
area code: 555
phone number: 123-4567
Resources
Julia Documentation: Manual - Strings (see Regular Expressions)
Think Julia: Chapter 8 - Strings