
6.3 Strings, Characters, and Regular Expressions in Julia


Characters and Strings

  • Char is a single character

  • String is a sequence of zero or more characters (index values start at 1)

Some operations that can be performed on strings

get word length                                 length(word)
extract nth character from word                 word[n]
extract substring (nth to mth character)        word[n:m]
search for letter in word                       findfirst(isequal(letter), word)
search for subword in word                      occursin(subword, word)
remove trailing newline from word (e.g., \n)    chomp(word)
remove last character from word                 chop(word)

Use the typeof() function to determine the type of a value
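As a small illustration of typeof(), the sketch below checks the types of a character, a string, and the results of a few string operations. The variable names are chosen only for this example.

```julia
letter = 'b'          # single quotes create a Char
word = "good-bye"     # double quotes create a String

println(typeof(letter))       # Char
println(typeof(word))         # String
println(typeof(word[1]))      # Char: indexing with an integer yields a character
println(typeof(word[1:4]))    # String: indexing with a range yields a string
println(typeof(chop(word)))   # SubString{String}: chop returns a view into word
```

Note that chop (and chomp) return a SubString rather than a new String; it behaves like a string in most contexts, and String(chop(word)) converts it if a String is required.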


# chars_and_strings.jl

letter = 'b'
word = "good-bye"
subword = "good"

word_length = length(word)
word_first_char = word[1]
word_subword = word[6:8]

println("Length of word: $word_length")
println("First character: $word_first_char")
println("Last three characters: $word_subword")

println("$letter is in $word: $(findfirst(isequal(letter), word))")
println("$subword is in $word: $(occursin(subword, word))")
println("chop off the last character: $(chop(word))")


Length of word: 8
First character: g
Last three characters: bye
b is in good-bye: 6
good is in good-bye: true
chop off the last character: good-by
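Two details of the functions above are easy to miss: findfirst returns nothing when the character is absent, and chomp removes only a trailing newline while chop always removes the last character. A minimal sketch, with the variable line introduced only for illustration:

```julia
word = "good-bye"

# findfirst returns the index of the first match, or nothing if there is none
println(findfirst(isequal('b'), word))    # 6
println(findfirst(isequal('z'), word))    # nothing

# chomp removes a trailing newline if present; chop removes the last character
line = "good-bye\n"
println(chomp(line) == word)              # true
println(chomp(word) == word)              # true (no newline to remove)
println(chop(word))                       # good-by
```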

Regular Expressions (regex)

Regular expressions are powerful tools for pattern matching and text processing. A regular expression is represented as a pattern, built from a special set of characters, that is searched for in a string str.


Check if regex matches a string    occursin(r"pattern", str)
Capture regex matches              match(r"pattern", str)
Specify alternative regex          pattern1|pattern2
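The three operations above might be used as in the following sketch. The sample string and the day names are made up for illustration.

```julia
str = "The meeting is on Tuesday at 10am"

# occursin returns true or false
println(occursin(r"Tuesday", str))            # true
println(occursin(r"Monday", str))             # false

# alternation matches either pattern
println(occursin(r"Monday|Tuesday", str))     # true

# match returns a RegexMatch object, or nothing if there is no match
m = match(r"Monday|Tuesday", str)
println(m.match)                              # Tuesday
println(match(r"Friday", str) === nothing)    # true
```

Because match can return nothing, it is usually worth checking the result before reading fields such as m.match.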

Character Class

A character class specifies a list of characters to match ([...], where ... represents the list) or not to match ([^...]).

Character class                                           Pattern
Any lowercase vowel                                       [aeiou]
Any digit                                                 [0-9]
Any lowercase letter                                      [a-z]
Any uppercase letter                                      [A-Z]
Any digit, lowercase letter, or uppercase letter          [a-zA-Z0-9]
Anything except a lowercase vowel                         [^aeiou]
Anything except a digit                                   [^0-9]
Anything except a space                                   [^ ]
Any character                                             .
Any word character (equivalent to [a-zA-Z0-9_])           \w
Any non-word character (equivalent to [^a-zA-Z0-9_])      \W
A digit character (equivalent to [0-9])                   \d
Any non-digit character (equivalent to [^0-9])            \D
Any whitespace character (equivalent to [ \t\r\n\f])      \s
Any non-whitespace character (equivalent to [^ \t\r\n\f]) \S
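A few of the character classes above, applied to short ASCII strings. This is only a sketch; the sample strings are invented for the example.

```julia
# bracketed classes
println(occursin(r"[aeiou]", "rhythm"))          # false: no lowercase vowel
println(occursin(r"[^0-9]", "12345"))            # false: every character is a digit
println(match(r"[A-Z][a-z]", "goodBye").match)   # By

# shorthand classes
println(occursin(r"\d", "room 101"))             # true
println(match(r"\w\w\w", "a-b cde").match)       # cde
println(match(r"\s\S", "good bye").match)        # " b" (a space followed by b)
```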


Anchors

Anchors are special characters that can be used to match a pattern at a specified position

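Anchor                 Special character
Beginning of line      ^
End of line            $
Beginning of string    \A
End of string          \Z

In Julia, ^ and $ match at line boundaries only when the regex carries the m flag (e.g., r"^abc"m); without it they anchor to the string as a whole. \Z also matches just before a final newline; \z matches only at the very end of the string.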
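The difference between line anchors and string anchors shows up on multi-line input. The following sketch uses a two-line string made up for this example and Julia's m (multiline) regex flag.

```julia
text = "hello\nworld\n"

# ^ and $ without the m flag anchor to the whole string
println(occursin(r"^hello", text))     # true
println(occursin(r"^world", text))     # false
println(occursin(r"hello$", text))     # false

# with the m flag they also match at the start and end of each line
println(occursin(r"^world"m, text))    # true
println(occursin(r"hello$"m, text))    # true

# \A and \Z always refer to the string, regardless of the m flag
println(occursin(r"\Ahello", text))    # true
println(occursin(r"\Aworld"m, text))   # false
println(occursin(r"world\Z", text))    # true: \Z allows one final newline
println(occursin(r"world\z", text))    # false: \z is the very end
```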

Repetition and Quantifier Characters

Repetition or quantifier characters specify the number of times to match a particular character or set of characters

Zero or more times                    *
One or more times                     +
Zero or one time                      ?
Exactly n times                       {n}
n or more times                       {n,}
m or fewer times                      {0,m}
At least n and at most m times        {n,m}
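Each quantifier can be tried on a short string, as in the sketch below. Quantifiers are greedy by default, so each match is as long as the pattern allows. The sample strings are made up for illustration.

```julia
println(match(r"ab*", "abbbc").match)          # abbb  (zero or more b)
println(match(r"ab+", "ac") === nothing)       # true  (needs at least one b)
println(match(r"colou?r", "color").match)      # color (u is optional)
println(match(r"\d{3}", "12345").match)        # 123   (exactly three digits)
println(match(r"\d{2,}", "a1b2345").match)     # 2345  (two or more digits)
println(match(r"\d{0,2}", "12345").match)      # 12    (at most two digits)
println(match(r"\d{2,4}", "1234567").match)    # 1234  (two to four digits)
```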


# regex.jl
number1 = "(555)123-4567"
number2 = "123-45-6789"

# check if matches
if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number1)
  println("match!")
end

if !occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number2)
  println("no match!")
end

# capture matches
# use parentheses to "capture" different parts of a regular
# expression for later use; the first set of parentheses corresponds
# to index 1, the second to index 2, etc.

number_details = match(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)

# match returns nothing when the pattern is not found
if number_details !== nothing
   area_code = number_details[1]
   phone_number = number_details[2]

   println("area code: $area_code")
   println("phone number: $phone_number")
end


match!
no match!
area code: 555
phone number: 123-4567
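The capture logic above could be wrapped in a small function. The sketch below is one possible arrangement, not part of the original example: parse_phone is a hypothetical helper name, and the ^ and $ anchors are added here on the assumption that the whole string should be a phone number.

```julia
# parse_phone is a hypothetical helper, introduced only for this sketch.
# It returns a named tuple with the captured parts, or nothing if str
# does not have the form (ddd)ddd-dddd.
function parse_phone(str)
    m = match(r"^\(([0-9]{3})\)([0-9]{3}-[0-9]{4})$", str)
    m === nothing && return nothing
    return (area_code = m[1], phone_number = m[2])
end

details = parse_phone("(555)123-4567")
println(details.area_code)       # 555
println(details.phone_number)    # 123-4567

println(parse_phone("123-45-6789") === nothing)   # true
```

Returning nothing for a failed match mirrors what match itself does, so callers can use the same !== nothing check shown above.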