Get string length: nchar(string)
Combine two strings: str_c(string1, string2) (stringr package) or paste0(string1, string2) (base R)
Sort a set of strings: sort(c(string1, string2, string3))
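A short sketch of these string functions in base R. The example strings are made up, and str_c() is left as a comment because it needs the stringr package to be installed.

```r
# Hypothetical example strings
s1 <- "data"
s2 <- "science"

nchar(s1)          # number of characters: 4
paste0(s1, s2)     # base R, no separator: "datascience"
paste(s1, s2)      # base R, separated by a space: "data science"
# stringr::str_c(s1, s2) returns the same result as paste0() when stringr is installed

# sort() orders the elements of a character vector
sort(c("pear", "apple", "fig"))   # "apple" "fig" "pear"
```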
R for Data Science: String Functions
R Documentation:
R Documentation:
Addition: +
Subtraction: -
Multiplication: *
Division: /
Power (exponent): ^ or **
Remainder (modulo): %%
Logical negation: !x
Greater than: >
Less than: <
Greater than or equal to: >=
Less than or equal to: <=
Exactly equal to: ==
Not equal to: !=
Element-wise AND: &
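The operators above can be tried directly at the R prompt. This sketch uses arbitrary example numbers and also shows that comparisons and & work element by element on vectors.

```r
7 + 2      # 9
7 - 2      # 5
7 * 2      # 14
7 / 2      # 3.5 (/ is ordinary division, not integer division)
7 ^ 2      # 49
7 %% 2     # 1, the remainder of 7 / 2
!TRUE      # FALSE

7 > 2      # TRUE
7 != 2     # TRUE
7 == 7.0   # TRUE

# Comparisons and & are applied element-wise to vectors
c(1, 5, 9) > 4                   # FALSE TRUE TRUE
c(TRUE, TRUE) & c(TRUE, FALSE)   # TRUE FALSE
```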
This is the typical first program for those new to a programming language. It can be used to test that R is installed correctly, and it introduces R's basic syntax, whether you use the REPL or run code written in a text editor from the Unix command line.
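A minimal "Hello, World!" sketch. The file name hello.R is only an example; the same two lines can be typed straight into the REPL.

```r
# hello.R
print("Hello, World!")   # prints: [1] "Hello, World!"
cat("Hello, World!\n")   # prints the text without the [1] index or quotes

# To run this file from the Unix command line instead of the REPL:
#   Rscript hello.R
```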
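Left assignment: <- or = or <<- (examples: x <- 7, x = 7, x <<- 7)
Right assignment: -> or ->> (examples: 7 -> x, 7 ->> x)
<<- and ->> assign in an enclosing environment (at the top level, the global environment) rather than in the current local one.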
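A small sketch of the assignment forms. The function increment() and the variable counter are hypothetical names, used only to show how <<- reaches outside a function.

```r
x <- 7        # left assignment (the most common style)
y = 7         # also assigns at the top level
7 -> z        # right assignment: the value goes on the left

counter <- 0
increment <- function() {
  # <<- updates `counter` in the enclosing (here, global) environment
  # instead of creating a local copy inside the function
  counter <<- counter + 1
}
increment()
increment()
counter       # 2
```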
Logical: TRUE, FALSE
Numeric: 1, 55, 999
Integer: 1L, 32L, 0L
Complex: 2 + 3i
Character: "great", "23.4"
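Unlike some other languages, R does not require print statements to display output, but it does allow them. At the top level (for example, in the REPL), R prints the value of an expression automatically, so you can simply write the code or wrap it in print(). Inside loops and functions, values are not printed automatically, so print() is needed there.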
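A quick sketch of the difference between automatic and explicit printing; x and the loop range are arbitrary.

```r
x <- 5
x            # at the top level the value is printed automatically: [1] 5
print(x)     # explicit print gives the same output

for (i in 1:2) {
  i          # NOT printed: auto-printing does not happen inside loops
  print(i)   # printed: [1] 1, then [1] 2
}
```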
We can write comments in our code, which do not run, to describe what certain lines or sections of code do. Comments are just for the programmer: they do not appear anywhere in the output and simply explain what the code is doing or provide helpful notes.
To comment in R, type the "#" symbol; everything after it on the same line is ignored.
R has no syntax for multi-line comments, so each line that is commented out needs a "#" symbol at the beginning.
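A short example of both comment styles; the variable x is arbitrary.

```r
# This entire line is a comment and is ignored when the code runs
x <- 10  # a comment can also follow code on the same line

# R has no block-comment syntax, so a longer note
# is written as several lines that each start with #
x * 2    # 20
```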
R Documentation: Vectors and Assignment
R Documentation: Comments
For most users, it is recommended to download the current stable release from https://cloud.r-project.org/.
Some developers might wish to use a different version, or to switch between versions. For this, the rvenv package can be useful.
R is also available for use in Brown's Computing Environments:
Oscar (for high-performance computing)
Stronghold (for secure computing)
Download and install the latest version of The R Project for Statistical Computing for macOS here.
For an integrated development environment (IDE) / graphical interface, you can also download and install RStudio from here.
R comes with a full-featured interactive command-line REPL (read-eval-print loop) built into the R executable. In addition to allowing quick and easy evaluation of R statements, it has a searchable history, tab completion, many helpful keybindings, and built-in help pages that you can open with ?.
This page provides examples of using the REPL on the command line.
In the terminal, type "module load r" to load the R module, then on a new line type "R" to launch R.
Inside R, q() quits the R session and returns you to the terminal.
Type ?function or help(function) to open a function's help page within R's REPL.
For example, to get help with fitting linear models, use help(lm) (output shown below).
When coding in R, you will often need to read in datasets to work with. The most common formats are .csv and .txt files, which you can read with the read.csv() and read_table() (from the readr package) functions, respectively. The following demonstrates these functions using a hypothetical "hospital_data" dataset.
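A self-contained sketch of reading data from files. The hospital_data contents are invented, and the files are written to tempfile() paths so the example runs anywhere. For the .txt file it uses base R's read.table(), which needs no extra packages; readr::read_table(txt_path) is the tidyverse equivalent if readr is installed.

```r
# Write a tiny hypothetical CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
writeLines(c("patient_id,age,ward",
             "1,34,A",
             "2,58,B",
             "3,47,A"), csv_path)
hospital_data <- read.csv(csv_path)
str(hospital_data)   # 3 rows: patient_id, age, ward

# Whitespace-delimited .txt file, read with base R's read.table()
txt_path <- tempfile(fileext = ".txt")
writeLines(c("patient_id age ward",
             "1 34 A",
             "2 58 B"), txt_path)
hospital_txt <- read.table(txt_path, header = TRUE)
```

In real use, replace the tempfile() paths with the path to your own file, for example read.csv("hospital_data.csv").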
To send R's console output to a file, use the syntax sink("FileName.FileType"); call sink() with no arguments to stop writing to the file.
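A minimal sketch of sink(), using a temporary file in place of "FileName.FileType". Since sink() captures console text rather than data, the sketch also shows write.csv(), a base R function that is usually a better fit for saving a data frame; the data frame here is hypothetical.

```r
out_path <- tempfile(fileext = ".txt")   # stand-in for "FileName.FileType"
sink(out_path)                  # divert console output to the file
print(summary(c(34, 58, 47)))
cat("Done\n")
sink()                          # stop diverting; output appears in the console again

readLines(out_path)             # check what was written

# To save a data frame itself, write.csv() writes it as a CSV file
csv_out <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, age = c(34, 58, 47)), csv_out, row.names = FALSE)
```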
R Documentation: read.csv file input
More read.csv resources here
R Documentation: read_table file input
R Documentation: File output
Lists in R are ordered collections of data that can be of different classes.
R Documentation:
R Documentation:
R Documentation:
New list (empty): listname <- list()
New list (misc): listname <- list(1L, "abc", 10.3)
Access an element: list[[position]]
Change a value: list[[position]] <- newvalue
See number of values in a list: length(list)
See if item is present in a list: item %in% list
Add item to a list: list <- append(list, list(item))
Add item to a list at a specific position: list <- append(list, list(item), after = index number)
Remove item from list: newlist <- list[-index number]
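The list operations above, applied to one small list (my_list is a hypothetical name). Note the difference between [[ ]], which returns the element itself, and [ ], which returns a shorter list.

```r
my_list <- list(1L, "abc", 10.3)

my_list[[2]]            # the element itself: "abc"
my_list[2]              # a list of length 1 containing "abc"
my_list[[2]] <- "xyz"   # change a value
length(my_list)         # 3
"xyz" %in% my_list      # TRUE

my_list <- append(my_list, list(TRUE))                # add to the end
my_list <- append(my_list, list("first"), after = 0)  # add at the front
my_list <- my_list[-1]                                # remove the first item
length(my_list)         # 4
```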
New matrix (empty): matrixname <- matrix()
New matrix (numbers): matrixname <- matrix(data, nrow=, ncol=)
New matrix (strings): matrixname <- matrix(data, nrow=, ncol=)
Access a matrix element: matrix[row position, column position]
Access an entire row: matrix[row position,]
Access an entire column: matrix[,column position]
Create an additional row: rbind(matrix, values for new row)
Create an additional column: cbind(matrix, values for new column)
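A short sketch of the matrix operations, using arbitrary example values. R fills a matrix column by column unless you pass byrow = TRUE.

```r
# A 2 x 3 numeric matrix, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
m[2, 3]      # element in row 2, column 3: 6
m[1, ]       # entire first row: 1 3 5
m[, 2]       # entire second column: 3 4

m <- rbind(m, c(7, 8, 9))      # add a third row
m <- cbind(m, c(10, 11, 12))   # add a fourth column
dim(m)       # 3 4

# A character matrix works the same way
s <- matrix(c("a", "b", "c", "d"), nrow = 2, ncol = 2)
s[1, 2]      # "c"
```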
New array (empty): arrayname <- array()
New array (numbers): arrayname <- array(data, dim = c(nrow, ncol, nlayers))
New array (strings): arrayname <- array(data, dim = c(nrow, ncol, nlayers))
Access an array element: array[row position, column position, layer position]
Check if an item exists: value %in% array
Sort array increasing: sort(array)
Sort array decreasing: sort(array, decreasing = TRUE)
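A sketch of a three-dimensional array with arbitrary values. One detail worth knowing: sort() returns a plain vector, so the array's dimensions are dropped.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers (filled column by column)
a <- array(1:24, dim = c(2, 3, 4))
a[2, 3, 1]    # row 2, column 3, layer 1: 6
a[1, 1, 2]    # first element of layer 2: 7
5 %in% a      # TRUE

sort(c(3, 1, 2))                      # 1 2 3
sort(c(3, 1, 2), decreasing = TRUE)   # 3 2 1
is.null(dim(sort(a)))                 # TRUE: the sorted result is a plain vector

# A character array
b <- array(c("x", "y", "z", "w"), dim = c(2, 2, 1))
b[2, 2, 1]    # "w"
```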
In computer programming, a package is a collection of modules or programs that are often published as tools for a range of common use cases, such as text processing and doing math. Programmers can install these packages and take advantage of their functionality within their own code.
This page includes instructions for installing packages in R and a description of some of R's most frequently used packages.
To install a package in R, you can either:
Use the install.packages("PackageName") function, which downloads the package from CRAN (the Comprehensive R Archive Network) and installs it on your machine
Or if you are using RStudio, you can use Tools > Install Packages, enter the package name, and click Install
Once the package is installed, load it into your R session with the library(PackageName) function. This needs to be done in each new R session.
In R, tidyverse is one of the most popular packages. It is a collection of packages used for data science, such as:
ggplot2, used to create graphics and data visualizations
dplyr, which contains functions used for data manipulation, like mutate() and filter()
tidyr, used for data organization and cleaning
tibble, a modern reimagining of the data frame with cleaner printing
readxl, which can be used to read Excel files in .xlsx format into R
R Documentation: Packages
Conditional statements are used to test whether a specific case is true or false.
Short-circuit evaluation:
Test if all conditions are true: &&
Test if any condition is true: ||
Test if a condition is not true: !
If statement: run code if this condition is true. Used only at the beginning of a conditional statement.
Else if statement: if the previous conditions aren't true, try this one. Can be used any number of times in a conditional statement.
Else statement: catch-all for anything not covered by the prior conditions. Used only once, to end a conditional statement.
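A sketch of an if / else if / else chain wrapped in a hypothetical grade() function; the score cut-offs are made up. When an if statement is written at the top level rather than inside braces, else must start on the same line as the closing } of the previous block.

```r
grade <- function(score) {
  if (score >= 90) {
    "A"
  } else if (score >= 80) {   # checked only if the first condition is FALSE
    "B"
  } else {                    # catch-all for everything else
    "C or below"
  }
}

grade(95)   # "A"
grade(85)   # "B"
grade(40)   # "C or below"

# && and || short-circuit: the right side is evaluated only if needed.
# They expect single TRUE/FALSE values; use & and | for element-wise vector tests.
x <- 5
if (x > 0 && x < 10) print("x is a single-digit positive number")
```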
Loops repeat a block of code a specified number of times or until some condition is met.
While loop: while (condition) { ... }
For loop: for (item in sequence) { ... }
Use break to terminate a loop early.
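A sketch of the three loop patterns with arbitrary numbers: a for loop over a vector, a while loop that runs until its condition fails, and break to leave a loop early.

```r
# for loop: iterate over the values 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i
}
total   # 15

# while loop: keep doubling until n is no longer below 100
n <- 1
while (n < 100) {
  n <- n * 2
}
n       # 128

# break: stop the loop as soon as i reaches 4
for (i in 1:10) {
  if (i == 4) break
}
i       # 4
```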
R Documentation: Conditional Execution
R Documentation: Repetitive Execution
Find which strings contain a match for a pattern: grep(pattern, string)
Replace the first match within a string: sub(pattern, replacement, string)
Replace all matches within a string: gsub(pattern, replacement, string)
Test whether each string contains a match (TRUE/FALSE): grepl(pattern, string)
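A sketch of the four functions on made-up strings. By default the pattern is a regular expression; adding fixed = TRUE matches it as literal text, which matters for characters such as ".".

```r
fruits <- c("apple", "banana", "cherry")

grep("an", fruits)                 # index of the match: 2
grep("an", fruits, value = TRUE)   # the matching string: "banana"
grepl("e", fruits)                 # TRUE FALSE TRUE
sub("a", "A", "banana")            # first match only: "bAnana"
gsub("a", "A", "banana")           # every match: "bAnAnA"

grepl(".", "abc")                  # TRUE: in a regex "." matches any character
grepl(".", "abc", fixed = TRUE)    # FALSE: "." is taken literally
```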
DataCamp: Regular Expression
data.frame, data.table, and the dplyr package provide a set of tools for working with tabular data in R. Their design and functionality are similar to those of DataFrames.jl (in Julia) and pandas (in Python), making them great general-purpose data science tools.
This page provides examples of using data.frame, data.table, and dplyr, demonstrating the syntax and common functions within the tools.
Installing data.frame, data.table, and dplyr in R.
data.frame is built into base R, so it needs no installation, and the dplyr package is part of the tidyverse (see the Packages section for tidyverse installation instructions). To install data.table, use install.packages('data.table').
The examples below use data.frame because it does not require additional packages; see the resources at the bottom of this page for more information on data.table and dplyr.
Create DataFrame
Display DataFrame
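A minimal sketch of creating and displaying a data frame in base R. The columns name, age, and city and their values are hypothetical example data, not taken from the original page.

```r
# Hypothetical example data
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

df          # auto-printed at the top level
print(df)   # explicit print, needed inside functions and loops
```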
Print first two rows of DataFrame
Print last two rows of DataFrame
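head() and tail() return the first and last rows of a data frame; the second argument sets how many. The data frame is the same hypothetical example data used throughout this page's sketches.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

head(df, 2)   # rows 1-2 (head() shows 6 rows by default)
tail(df, 2)   # rows 3-4
```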
Describe DataFrame
DataFrame size:
DataFrame column names:
DataFrame description:
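A sketch of base R functions for describing a data frame's size, column names, and contents, applied to hypothetical example data. summary() and str() print longer reports, so only the short outputs are shown in comments.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

dim(df)        # size as rows, columns: 4 3
nrow(df)       # 4
ncol(df)       # 3
colnames(df)   # "name" "age" "city"
summary(df)    # per-column summary (min, quartiles, mean, max for numeric columns)
str(df)        # compact overview of each column's type and first values
```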
Accessing DataFrames
Get "age" column (different ways to call the column)
Get row
Get element
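Several equivalent ways to get the "age" column, a row, and a single element, shown on hypothetical example data.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

df$age          # column by name with $
df[["age"]]     # column by name with [[ ]]
df[, "age"]     # column by name with [row, column] indexing
df[2, ]         # second row, returned as a one-row data frame
df[2, "age"]    # single element: 35
df$age[2]       # the same element, taken from the column vector
```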
Get subset (specific rows and all columns)
Get subset (all rows and specific columns)
Get subset (all rows meeting specified criteria - numbers)
Get subset (all rows meeting specified criteria - strings)
Get subset (all rows meeting specified criteria)
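The subset patterns above, sketched on hypothetical example data. A logical condition inside [ , ] keeps the rows where it is TRUE; & combines several conditions. Base R's subset() is shown as an alternative that avoids repeating df$.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

df[1:2, ]                                 # specific rows, all columns
df[, c("name", "city")]                   # all rows, specific columns
df[df$age > 30, ]                         # rows meeting a numeric condition
df[df$city == "Boston", ]                 # rows meeting a string condition
df[df$age > 30 & df$city == "Boston", ]   # rows meeting several conditions
subset(df, age > 30 & city == "Boston")   # the same filter with subset()
```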
Add Column
New columns with specified values
New column with calculated value
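Assigning to a new column name with $ adds the column. In this sketch, member and age_in_months are hypothetical column names added to hypothetical example data.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

df$member <- c(TRUE, FALSE, TRUE, TRUE)   # new column with specified values
df$age_in_months <- df$age * 12           # new column calculated from an existing one
df
```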
Get counts/frequency
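table() counts how often each value appears in a column, and prop.table() turns those counts into proportions. The data frame is hypothetical example data.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

counts <- table(df$city)   # number of rows for each city
counts
prop.table(counts)         # the same counts as proportions of all rows
```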
Transform DataFrame
sort
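In base R, a data frame is sorted by indexing its rows with order(). This sketch sorts hypothetical example data by one column and by two columns.

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

df[order(df$age), ]                      # ascending by age
df[order(df$age, decreasing = TRUE), ]   # descending by age
df[order(df$city, df$name), ]            # by city, then by name within each city
```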
stack (reshape from wide to long format)
unstack (reshape from long to wide format)
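A sketch of base R's stack() and unstack() on a small hypothetical scores data frame. stack() puts all column values into one "values" column and records the original column name in a factor column "ind"; unstack() reverses this. For more complex reshaping, tidyr's pivot_longer() and pivot_wider() are a common alternative.

```r
# Wide format: one column per subject (hypothetical scores)
scores <- data.frame(math = c(90, 75), art = c(80, 85))

long <- stack(scores)   # long format: columns "values" and "ind"
long

wide <- unstack(long)   # back to wide format, one column per level of ind
wide
```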
Traversing DataFrame (for loops)
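Two common ways to loop over a data frame, sketched on hypothetical example data: over row indices with seq_len(nrow(df)), and over column names with names(df).

```r
df <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  age  = c(28, 35, 42, 31),
  city = c("Boston", "Providence", "Boston", "Hartford")
)

# Loop over rows by index
for (i in seq_len(nrow(df))) {
  cat(df$name[i], "is", df$age[i], "years old\n")
}

# Loop over columns by name
for (col in names(df)) {
  cat(col, "has class", class(df[[col]]), "\n")
}
```

Many row-by-row tasks can also be written without a loop, because R functions such as paste() work on whole columns at once, for example paste(df$name, "is", df$age, "years old").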
For tasks such as sorting or reshaping, a package like data.table or dplyr is typically easier to use than base R's data.frame, because those packages include commands designed specifically for data frame manipulation. This guide uses base R for the sake of continuity.
R Documentation: data.table
Tidyverse: dplyr
This page covers much of the same content as the DataFrames R page, but using the tidyverse's dplyr and tidyr packages rather than base R. You may notice that pipes (%>%) are used more often here. A pipe passes the result on its left as the first argument to the function on its right, so df %>% summary() is equivalent to summary(df). Pipes are the predominant syntax in more advanced R code, particularly in the tidyverse, because they let you chain several operations in a single readable expression.
To use the tidyverse packages, they first have to be installed and loaded. Ensure that the following code is at the top of your coding environment: