Skip to main content

6.3 Strings, Characters, and Regular Expressions in Julia

Documentation

Characters and Strings

  • Char is a single character

  • String is a sequence of one or more characters (index values start at 1)

Some functions that can be performed on strings

ActionFunction
get word lengthlength(word)
extract nth character from wordword[n]
extract substring nth-mth character from wordword[n:m]
search for letter in wordfindfirst(isequal(letter), word)
search for subword in wordoccursin(word, subword)
remove record separator from word (e.g., n)chomp(word)
remove last character from wordchop(word)

Use typeof() function to determine type

Input:

# chars_and_strings.jl

letter = 'b'
word = "good-bye"
subword = "good"

word_length = length(word)
word_first_char = word[1]
word_subword = word[6:8]

println("Length of word: $word_length")
println("First character: $word_first_char")
println("Last three characters: $word_subword")

println("$letter is in $word: $(findfirst(isequal(letter), word))")
println("$subword is in $word: $(occursin(subword, word))")
println("chop off the last character: $(chop(word))")

Output:

Length of word: 8
First character: g
Last three characters: bye
b is in good-bye: 6
good is in good-bye: true
chop off the last character: good-by

Regular Expressions (regex)

Regular expressions are powerful tools for pattern matching and text processing. They are representated ad a pattern that consists of a special set of characters to search for in a string str.

Functions

ActionFunction
Check if regex matches a stringoccursin(r"pattern", str)
Capture regex matchesmatch(r"pattern", str)
Specify alternative regexpattern1|pattern2

Character Class

Character class specifies a list of characters to match ([...] where ... represents the list) or not match ([^...])

Character Class...
Any lowercase vowel\[aeiou]
Any digit[0-9]
Any lowercase letter[a-z]
Any uppercase letter[A-Z]
Any digit, lowercase letter, or uppercase letter[a-zA-Z0-9]
Anything except a lowercase vowel[^aeiou]
Anything except a digit[^0-9]
Anything except a space[^ ]
Any character.
Any word character (equivalent to [a-zA-Z0-9_])\w
Any non-word character (equivalent to [^a-zA-Z0-9_])W
A digit character (equivalent to [0-9])\d
Any non-digit character (equivalent to [^0-9])\D
Any whitespace character (equivalent to [\t\r\n\f])\s
Any non-whitespace character (equivalent to [^\t\r\n\f])\S

Anchors

Anchors are special characters that can be used to match a pattern at a specified position

AnchorSpecial Character
Beginning of line^
End of line$
Beginning of string\A
End of string\Z

Repetition and Quantifier Characters

Repetition or quantifier characters specify the number of times to match a particular character or set of characters

RepetitionCharacter
Zero or more times*
One or more times+
Zero or one time?
Exactly n times{n}
n or more times{n,}
m or less times{,m}
At least n and at most m times{n.m}

Input:

# regex.jl
number1 = "(555)123-4567"
number2 = "123-45-6789"

# check if matches
if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number1)
   println("match!")
end

if occursin(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}", number2)
  println("match!")
else
  println("no match!")
end

# capture matches
# use parentheses to "capture" different parts of a regular 
# expression for later use the first set of parentheses corresponds 
# to index 1, second to index 2, etc.

number_details = match(r"\(([0-9]{3})\)([0-9]{3}-[0-9]{4})", number1)

if number_details != nothing
   area_code = number_details[1]
   phone_number = number_details[2]

   println("area code: $area_code")
   println("phone number: $phone_number")
end

Output:

match!
no match!
area code: 555
phone number: 123-4567