R

R is one of the many languages used by the data science community to perform data manipulation, statistical modeling and machine learning. R was designed by statisticians for statistical computing.

Resources

Installation

For most users, it is recommended to download the current stable release from https://cloud.r-project.org/.

Some developers might wish to use a different version, or to switch between versions. For this, the rvenv package can be useful.

R is also available for use in Brown's Computing Environments:

Oscar (for high-performance computing)
Stronghold (for secure computing)

macOS

Download and install the latest version of The R Project for Statistical Computing for macOS here.
For an integrated development environment (IDE) / graphical interface, you can also download and install RStudio from here.

Windows

Download and install the latest version of The R Project for Statistical Computing for Windows here.
For an integrated development environment (IDE) / graphical interface, you can also download and install RStudio from here.

REPL

R comes with an interactive command-line REPL (read-eval-print loop) built into the R executable. In addition to allowing quick and easy evaluation of R statements, it offers a searchable command history, tab-completion, helpful keybindings, and built-in help through the ? operator.

This page provides examples of using the REPL on the command line.

R REPL Example

Type "module load r" in the terminal to load the R module, then on a new line type "R" to launch R
At the R prompt, q() quits the R session and returns you to the terminal

R REPL Help Pages

Type ?function or help(function) to open a function's help page within R's REPL
For example, to ask for help with fitting linear models in R, use help(lm)

Resources

REPL Environment Help

Basic Syntax

"Hello, World!" Program

This is the typical first program for those new to a programming language. It can be used to test that the installation of R is working and also to introduce R's basic syntax, either by using the REPL environment or by running code written in a file at the command line.

Inputs:

Outputs:
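
A minimal sketch of such a program, assuming only the base print() function. It can be typed at the REPL or saved in a script file (the file name hello.R used in the note is hypothetical):

```r
# Print a greeting; R prefixes the result with the index of its first element
print("Hello, World!")
# [1] "Hello, World!"
```

If the line is saved in a file, it can be run from the terminal with Rscript hello.R.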

Variable Assignment

Operator

Description

Example
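
As an illustration of assignment, here is a short sketch of the three operators most often seen in R code. The variable names x, y, z, and total are made up for this example:

```r
# Leftward assignment, the most common form in R code
x <- 5

# The equals sign also assigns at the top level
y = 10

# Rightward assignment stores the value on the left in the name on the right
15 -> z

total <- x + y + z
print(total)
# [1] 30
```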

Vectors (Classes)

Type

Example
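
The following sketch creates one vector of each of the common basic classes and checks it with class(). The variable names are hypothetical and chosen only for this example:

```r
num <- c(1.5, 2, 3)          # numeric (double-precision) values
int <- c(1L, 2L, 3L)         # integer values, marked with the L suffix
chr <- c("a", "b", "c")      # character (string) values
lgl <- c(TRUE, FALSE, TRUE)  # logical values

class(num)   # "numeric"
class(int)   # "integer"
class(chr)   # "character"
class(lgl)   # "logical"

# A vector holds a single class, so mixed values are coerced to a common one
mixed <- c(1, "a", TRUE)
class(mixed) # "character"
```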

Print Statements

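Unlike many other languages, R does not require print statements to display a value at the interactive prompt, but it does allow them. To display a value, you can simply type the expression, or wrap it in a print() statement. Inside loops and functions, values are not displayed automatically, so print() is needed there.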

Vector Assignment and Print Statement examples:

Inputs:

Outputs:
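
A small sketch of vector assignment and printing, using a made-up character vector named patients:

```r
# Assign a character vector to a name
patients <- c("a", "b", "c")

# Typing the name at the REPL displays the value
patients
# [1] "a" "b" "c"

# print() displays the same value explicitly
print(patients)
# [1] "a" "b" "c"

# Inside a loop nothing is displayed automatically, so print() is required
for (p in patients) {
  print(p)
}
```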

Comments

We can write comments in our code, which do not run, to describe what certain lines or sections of code do. These comments are just for the programmer: they will not appear anywhere in the output and simply explain what the code is doing or provide helpful notes.

To comment in R, use the “#” symbol and type your comment on the same line
R has no syntax for multi-line comments, so each line that is commented out needs a "#" symbol at the beginning
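
A brief sketch of both comment styles; the variable heart_rate is hypothetical:

```r
# This whole line is a comment and is not run
heart_rate <- 72  # a comment can also follow code on the same line

# Longer notes need a "#" at the start
# of every line, as in this pair of lines
heart_rate
```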

Resources

Numbers and Math

Arithmetic Operators

Operator

Description

Inputs:

Outputs:
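
As a sketch, R's arithmetic operators can be tried on two made-up values, a and b. The results are collected in a named vector (a hypothetical name, results) so they can be inspected together:

```r
a <- 7
b <- 2

results <- c(
  addition         = a + b,    # 9
  subtraction      = a - b,    # 5
  multiplication   = a * b,    # 14
  division         = a / b,    # 3.5
  exponent         = a ^ b,    # 49
  modulus          = a %% b,   # 1 (remainder after division)
  integer_division = a %/% b   # 3 (quotient without the remainder)
)
results
```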

Logical Operators
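
A short sketch of comparison and element-wise logical operators in base R. The values and variable names are made up for illustration:

```r
a <- 7
b <- 2

a > b    # TRUE
a == b   # FALSE
a != b   # TRUE
a >= 7   # TRUE

# & and | compare vectors element by element; ! negates each element
both   <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
either <- c(TRUE, FALSE, TRUE) | c(TRUE, TRUE, FALSE)   # TRUE TRUE TRUE
negate <- !c(TRUE, FALSE)                               # FALSE TRUE
```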

Resources

Strings and Characters

String Functions

Action

Function
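
For reference, a sketch of a few base R string functions that need no extra packages. The variable names labels, shout, and prefix are hypothetical:

```r
# paste0() joins strings with no separator between them
labels <- paste0("patient ", c("a", "b", "c"))
labels                              # "patient a" "patient b" "patient c"

# toupper() converts to upper case
shout <- toupper("codiac")          # "CODIAC"

# substr() extracts the characters from position 1 to position 3
prefix <- substr("codiac", 1, 3)    # "cod"
```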

Inputs:

#String length
nchar("codiac")

#Combine strings (str_c() comes from the stringr package, so load it first)
library(stringr)
str_c("patient ", c("a", "b", "c"))

#Sort the values in a character vector
x <- c("carrot", "apple", "banana")
sort(x)

Outputs:

#String length
6

#Combine strings
"patient a" "patient b" "patient c"

#Sort values in a string
"apple" "banana" "carrot"

Resources

R for Data Science: String Functions

Regular Expression

RegEx Functions

Action

Function

Inputs:

#Search for substring in a string
y <- c("carrot", "apple", "banana", "carrot")
grep("carrot", y)

#Replace the first instance of a single value within a string
sub("r", "R", y)

#Replace all instances within a string
gsub("r", "R", y)

#Find matches of exact strings
grepl("car", y)

Outputs:

#Search for value in a string
1 4
#Returns the position of the value searched for

#Replace the first instance of a single value within a string
"caRrot" "apple" "banana" "caRrot" 

#Replace all instances within a string
"caRRot" "apple" "banana" "caRRot"

#Find matches of exact strings
TRUE FALSE FALSE TRUE

Resources

DataCamp: Regular Expression

Control Flow

Use Cases & Syntax

Used to test if a specific case is true or false

Short-circuit evaluation:

Test if all conditions are true: &&
Test if any conditions are true: ||
Test if a condition is not true: !

Conditional evaluation

If statement: run code if this statement is true
- Only used at the beginning of a conditional statement
Else if statement: if previous statements aren't true, try this
- Can be used an unlimited number of times in an if statement
Else statement: catch-all for anything outside of prior statements
- Only used to end a conditional statement
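
The short-circuit tests described above can be sketched with base R's &&, ||, and ! operators. The variables x, y, and the result names are made up for this example:

```r
x <- NULL

# && stops at the first FALSE, so x > 0 is never evaluated for a NULL x
is_positive <- !is.null(x) && x > 0
is_positive   # FALSE

y <- 5

# || stops at the first TRUE, so the stop() call on the right never runs
any_true <- y > 0 || stop("never reached")
any_true      # TRUE

# ! turns TRUE into FALSE and FALSE into TRUE
not_large <- !(y > 10)
not_large     # TRUE
```

Because the right-hand side is skipped once the answer is known, && is a common way to guard a test that would fail on missing input.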

Inputs:

#If statement
a <- 2
b <- 1
if (a > b) {
  print("a is greater than b")
}

#Else if statement
x <- 10
y <- 10
if (x > y) {
  print("x is greater than y")
} else if (x <= y) {
  print("x is less than or equal to y")
}

#Else statement
d <- 3
if (d > 5) {
  print("d is greater than 5")
} else if (d == 5) {
  print("d is equal to 5")
} else {
  print("d is less than 5")
}

Outputs:

#If statement
[1] "a is greater than b"

#Else if statement
[1] "x is less than or equal to y"

#Else statement
[1] "d is less than 5"

Loops

Repeats a block of code a specified number of times or until some condition is met

While loop
For loop
Use break to terminate loop

Inputs:

#While loop
i <- 1
while (i < 5) {
  print(i)
  i <- i + 1
}

#While loop with break
j <- 1
while (j < 5) {
  print(j)
  j <- j + 1
  if (j == 4) {
    break
  }
}

#For loop
fruit <- list("apple", "banana", "peach")
for (x in fruit) {
  print(x)
}

#Nested for loop
adjectives <- list("scrumptious", "overripe", "delicious")
fruit <- list("apple", "banana", "peach")
for (x in adjectives) {
  for (y in fruit) {
    print(paste(x, y))
  }
}

Outputs:

#While loop
[1] 1
[1] 2
[1] 3
[1] 4

#While loop with break
[1] 1
[1] 2
[1] 3

#For loop
[1] "apple"
[1] "banana"
[1] "peach"

#Nested for loop
[1] "scrumptious apple"
[1] "scrumptious banana"
[1] "scrumptious peach"
[1] "overripe apple"
[1] "overripe banana"
[1] "overripe peach"
[1] "delicious apple"
[1] "delicious banana"
[1] "delicious peach"
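
Alongside break, base R also provides next, which skips to the following iteration instead of ending the loop. A minimal sketch, looping over positions with seq_along(); the vector picked is a hypothetical name:

```r
fruit <- c("apple", "banana", "peach")
picked <- character(0)

for (i in seq_along(fruit)) {
  # Skip this iteration when the item is "banana"
  if (fruit[i] == "banana") {
    next
  }
  # Record the position and name of every other item
  picked <- c(picked, paste(i, fruit[i]))
}

picked
# [1] "1 apple" "3 peach"
```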

Resources

R Documentation: Conditional Execution
R Documentation: Repetitive Execution

Collections and Data Structures

Lists

Lists in R are ordered collections of data that can be of different classes.

Creating Lists

Action

Syntax

Accessing List Elements

Action

Syntax

Adding and Removing List Elements

Action

Syntax

Inputs:

#Create list
mylist <- list("apple", "peach", "plum")

#Access the second element of a list (double brackets return the element itself; mylist[2] would return a one-element list)
mylist[[2]]

#Change the value of the first element of a list
mylist[1] <- "banana"
mylist

#See the number of values in a list
length(mylist)

#Check if item exists in list
"plum" %in% mylist

#Add an item to the list (append() returns a new list, so assign the result)
mylist <- append(mylist, "orange", after = 2)
mylist

#Remove an item at index=3 from a list
mylist <- list("apple", "peach", "plum")
newlist <- mylist[-3]
newlist

Outputs:

#Access the second element of a list
"peach"

#Change the value of the first element of a list
[[1]]
[1] "banana"

[[2]]
[1] "peach"

[[3]]
[1] "plum"

#See the number of values in a list
3

#Check if item exists in list
TRUE

#Add an item to the list
[[1]]
[1] "banana"

[[2]]
[1] "peach"

[[3]]
[1] "orange"

[[4]]
[1] "plum"

#Remove an item from a list
[[1]]
[1] "apple"

[[2]]
[1] "peach"
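
Since a list can hold values of different classes, a sketch with a made-up patient record may help. All names and values here are hypothetical:

```r
# A named list mixing character, numeric, and logical values
patient <- list(name = "A. Smith", age = 42, smoker = FALSE)

# $ and [[ ]] return the element itself
patient$age               # 42
patient[["name"]]         # "A. Smith"

# Single brackets return a smaller list instead
class(patient["age"])     # "list"
class(patient[["age"]])   # "numeric"

# Assigning NULL removes an element
patient$smoker <- NULL
length(patient)           # 2
```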

Matrices

Creating Matrices

Accessing Matrix Elements

Adding and Removing Matrix Elements

Inputs:

#Create matrix (values fill column by column)
heart <- matrix(c("left atrium", "left ventricle", 
    "right atrium", "right ventricle"), nrow=2, ncol=2)
heart

#Access element at row=1, column=2
heart[1,2]

#Access entire row 1
heart[1,]

#Access entire column 2
heart[,2]

#Create new row
heart1 <- rbind(heart, c("x", "x"))
heart1

#Create new column
heart2 <- cbind(heart1, c("y", "y", "z"))
heart2

Outputs:

#Create matrix
     [,1]             [,2]             
[1,] "left atrium"    "right atrium"   
[2,] "left ventricle" "right ventricle"

#Access element at row=1, column=2
"right atrium"

#Access entire row 1
"left atrium"  "right atrium"

#Access entire column 2
"right atrium" "right ventricle"

#Create new row
     [,1]             [,2]             
[1,] "left atrium"    "right atrium"   
[2,] "left ventricle" "right ventricle"
[3,] "x"              "x"   

#Create new column
     [,1]             [,2]              [,3]
[1,] "left atrium"    "right atrium"    "y" 
[2,] "left ventricle" "right ventricle" "y" 
[3,] "x"              "x"               "z"

Arrays

Creating Arrays

Array Elements

Inputs:

#Create array with 4 rows, 4 columns, and 2 layers
a <- array(1:32, dim = c(4, 4, 2))

#Access element at row=4, column=4, dimension=1
a[4, 4, 1]

#Check if item exists in array
2 %in% a

#Sort increasing
b <- array(16:1, dim = c(4, 4, 1))
sort(b)

#Sort decreasing (named d rather than c to avoid masking the c() function)
d <- array(1:16, dim = c(4, 4, 1))
sort(d, decreasing = TRUE)

Outputs:

#Access element at row=4, column=4, dimension=1
16

#Check if item exists in array
TRUE

#Sort increasing
1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

#Sort decreasing
16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

Resources

R Documentation: Lists
R Documentation: Matrices
R Documentation: Arrays

File Input/Output

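When coding in R, you will often need to read in datasets to work with. The easiest ways to do so are from a .csv file or a .txt file, using the read.csv() and read_table() functions, respectively. read.csv() is part of base R, while read_table() comes from the readr package, which must be loaded first (base R's counterpart is read.table()). The following demonstrates these functions using a hypothetical "hospital_data" dataset.

To send R's console output to a file, use the syntax sink("FileName.FileType"); calling sink() with no arguments returns output to the console.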

File Input:

#read_table() comes from the readr package, so load it first
library(readr)

#If the file is already in R's working directory
read.csv("hospital_data.csv")
read_table("hospital_data.txt")

#To read a file from the Downloads folder (Mac)
read.csv("/Users/username/Downloads/hospital_data.csv")
read_table("/Users/username/Downloads/hospital_data.txt")

#To read a file from the Desktop (Windows)
read.csv("C:\\Users\\username\\Desktop\\hospital_data.csv")
read_table("C:\\Users\\username\\Desktop\\hospital_data.txt")

#Note that Mac paths use forward slashes, while Windows paths use backslashes, which must be doubled inside R strings (forward slashes also work on Windows)

File Output:

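#To send console output to a .txt file:
sink("hospital_data.txt")
#...run the code whose printed output you want saved...
sink()   #stop diverting output

#To send console output to a .csv file (the text is saved exactly as printed, not converted to comma-separated values):
sink("hospital_data.csv")
sink()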
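
To save a dataset itself rather than printed output, base R's write.csv() can be used. The sketch below assumes a small made-up data frame standing in for "hospital_data" (its columns and values are hypothetical) and writes to a temporary folder so no real files are touched:

```r
# Hypothetical data standing in for the hospital_data dataset
hospital_data <- data.frame(
  patient = c("a", "b", "c"),
  age = c(34, 51, 27)
)

# Build a path inside R's temporary directory
out_path <- file.path(tempdir(), "hospital_data.csv")

# Write the data frame as comma-separated values, without row numbers
write.csv(hospital_data, out_path, row.names = FALSE)

# Read the file back in to confirm the round trip
reloaded <- read.csv(out_path)
reloaded
```

Setting row.names = FALSE keeps R from adding an extra column of row numbers to the file.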

Resources:

R Documentation: read.csv file input
- More read.csv resources here
R Documentation: read_table file input
R Documentation: File output

Packages

In computer programming, a package is a collection of modules or programs that are often published as tools for a range of common use cases, such as text processing and doing math. Programmers can install these packages and take advantage of their functionality within their own code.

This page includes instructions for installing packages in R and a description of some of R's most frequently used packages.

Installing Packages

To install a package in R, you can either:

Use the install.packages("PackageName") function, which downloads the package from CRAN and installs it on your machine
Or if you are using RStudio, you can use Tools > Install Packages, enter the package name and click Install

Once you install the package, you have to load it into your session using the library(PackageName) function.

#Install a package from CRAN
install.packages("tidyverse")

#Once the package is installed, you have to load it
library(tidyverse)
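
Because installation only needs to happen once, a script can check whether a package is already present before installing it. The helper below, ensure_package(), is a hypothetical name written for this sketch and is not part of R; it relies on base R's requireNamespace() and is demonstrated on the stats package, which ships with R:

```r
# Hypothetical helper: install a package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# stats is bundled with R, so nothing is downloaded here
ok <- ensure_package("stats")
ok
```

The package still has to be loaded with library() before its functions can be called by name.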

Helpful Packages

In R, tidyverse is one of the most popular packages, as it contains an assortment of packages used for data science, such as:

ggplot2, used to create graphics and data visualization
dplyr, which contains functions used for data manipulation, like mutate() and filter()
tidyr, used for data organization and cleaning
tibble, a modern take on the data frame that prints more cleanly
readxl, which can be used to read Excel files in .xlsx format into R

Resources

R Documentation: Packages
Tidyverse
