DataFrames
data.frame, data.table, and the dplyr package provide tools for working with tabular data in R. Their design and functionality are similar to those of DataFrames.jl (in Julia) and pandas (in Python), making them good general-purpose data science tools.
This page provides examples of using data.frame, data.table, and dplyr, demonstrating the syntax and common functions within the tools.
Example
Installing data.frame, data.table, and dplyr in R.
The data.frame class is part of base R and needs no installation. The dplyr package is part of the tidyverse collection of packages (see the Packages section for tidyverse installation instructions). To install data.table, use install.packages('data.table').
This example uses data.frame because it does not require additional packages. See the resources at the bottom of this page for more information on data.table and dplyr.
Create DataFrame
#Create DataFrame
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)
Display DataFrame
Input:
#Display DataFrame
df
Output:
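The printed output did not survive on this page. As a sketch, printing the df created above should give the table shown in the comments (exact spacing may vary with console settings).

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

df
#   id gender age
# 1  1      F  68
# 2  2      M  54
# 3  3      F  49
# 4  4      M  28
# 5  5      F  36
```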
Print first two lines of DataFrame
Input:
Output:
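The snippet for this step is missing. One way to do it in base R, assuming the df created above, is head() with the number of rows as the second argument.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# First two rows
head(df, 2)
#   id gender age
# 1  1      F  68
# 2  2      M  54
```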
Print last two lines of DataFrame
Input:
Output:
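The snippet for this step is missing. A likely base R equivalent, assuming the df created above, is tail(), which keeps the original row names.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Last two rows
tail(df, 2)
#   id gender age
# 4  4      M  28
# 5  5      F  36
```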
Describe DataFrame
DataFrame size:
Input:
Output:
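The snippet for this step is missing. As a sketch, assuming the df created above, dim() reports rows and columns together, while nrow() and ncol() report them separately.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

dim(df)   # rows, then columns
# [1] 5 3
nrow(df)
# [1] 5
ncol(df)
# [1] 3
```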
DataFrame column names:
Input:
Output:
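The snippet for this step is missing. Assuming the df created above, names() (or colnames(), which is equivalent for a data.frame) returns the column names.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

names(df)
# [1] "id"     "gender" "age"
```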
DataFrame description:
Input:
Output:
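The snippet for this step is missing, so which function the original used is unknown. Two common base R choices, assuming the df created above, are str() for column types and summary() for per-column statistics.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Compact view of the structure: one line per column with its type
str(df)

# Summary statistics: min, quartiles, mean, max for numeric columns
summary(df)

# Individual statistics can also be computed directly
mean(df$age)
# [1] 47
```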
Accessing DataFrames
Get "age" column (different ways to call the column)
Input:
Output:
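The snippet for this step is missing. The forms below are standard base R ways to pull a column out of the df created above. The first four return a plain vector, while single-bracket indexing by name returns a one-column data.frame.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

df$age        # by name with $
# [1] 68 54 49 28 36
df[["age"]]   # by name with [[ ]]
df[, "age"]   # by name with [row, column]
df[, 3]       # by position

df["age"]     # single brackets: still a data.frame with one column
```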
Get row
Input:
Output:
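The snippet for this step is missing. As a sketch, assuming the df created above, a row is selected by putting its index before the comma and leaving the column slot empty. Row 2 is an arbitrary choice for illustration.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Second row, all columns
df[2, ]
#   id gender age
# 2  2      M  54
```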
Get element
Input:
Output:
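The snippet for this step is missing. A single element can be read by giving both a row and a column. The row and column picked here are arbitrary, and df is assumed to be the one created above.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

df[2, "age"]   # row 2, column "age"
# [1] 54
df[2, 3]       # same element by position
# [1] 54
```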
Get subset (specific rows and all columns)
Input:
Output:
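The snippet for this step is missing. A sketch assuming the df created above: a vector of row indices before the comma selects those rows. Rows 2 through 4 are an arbitrary choice.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Rows 2 to 4, all columns
df[2:4, ]
#   id gender age
# 2  2      M  54
# 3  3      F  49
# 4  4      M  28
```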
Get subset (all rows and specific columns)
Input:
Output:
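The snippet for this step is missing. A sketch assuming the df created above: a vector of column names after the comma selects those columns. The choice of id and age is arbitrary.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# All rows, only the id and age columns
df[, c("id", "age")]
#   id age
# 1  1  68
# 2  2  54
# 3  3  49
# 4  4  28
# 5  5  36
```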
Get subset (all rows meeting specified criteria - numbers)
Input:
Output:
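The snippet for this step is missing. Filtering on a numeric condition can be sketched with a logical vector in the row slot. The threshold of 50 is an assumption chosen for illustration, and df is the one created above.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Rows where age is greater than 50 (threshold chosen for illustration)
df[df$age > 50, ]
#   id gender age
# 1  1      F  68
# 2  2      M  54
```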
Get subset (all rows meeting specified criteria - strings)
Input:
Output:
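The snippet for this step is missing. String criteria work the same way as numeric ones, using == against a character value. The value "F" is an arbitrary choice, and df is the one created above.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Rows where gender equals "F"
df[df$gender == "F", ]
#   id gender age
# 1  1      F  68
# 3  3      F  49
# 5  5      F  36
```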
Get subset (all rows meeting specified criteria)
Input:
Output:
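The snippet for this step is missing, and the heading does not say which criteria were used. Assuming it combined the two previous kinds of condition, a sketch with & (and) looks like this. The specific conditions are illustrative only.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Rows where gender is "F" AND age is over 40 (conditions chosen for illustration)
df[df$gender == "F" & df$age > 40, ]
#   id gender age
# 1  1      F  68
# 3  3      F  49

# subset() expresses the same filter without repeating df$
subset(df, gender == "F" & age > 40)
```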
Add Column
New columns with specified values
Input:
Output:
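The snippet for this step is missing. As a sketch, assigning to a new name with $ adds a column to the df created above. The column names country and visits and their values are hypothetical, made up for this illustration.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# A single value is recycled to every row (hypothetical column and value)
df$country <- "US"

# A vector supplies one value per row (hypothetical column and values)
df$visits <- c(3, 1, 4, 1, 5)

names(df)
# [1] "id"      "gender"  "age"     "country" "visits"
```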
New column with calculated value
Input:
Output:
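The snippet for this step is missing. A new column can be computed from existing ones with ordinary vector arithmetic. The column name age_months is hypothetical, and df is the one created above.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Age in months, computed element-wise from age (hypothetical column name)
df$age_months <- df$age * 12

df
#   id gender age age_months
# 1  1      F  68        816
# 2  2      M  54        648
# 3  3      F  49        588
# 4  4      M  28        336
# 5  5      F  36        432
```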
Get counts/frequency
Input:
Output:
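The snippet for this step is missing. In base R, table() is the usual way to count how often each value occurs. Counting the gender column of the df created above is an assumed choice.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Count of rows per gender
counts <- table(df$gender)
counts
# F M
# 3 2

# Relative frequencies instead of counts
prop.table(counts)
```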
Transform DataFrame
sort
Input:
Output:
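The snippet for this step is missing. Base R sorts a data.frame by indexing its rows with order(). Sorting by age is an assumed choice, and df is the one created above.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Ascending by age
df[order(df$age), ]
#   id gender age
# 4  4      M  28
# 5  5      F  36
# 3  3      F  49
# 2  2      M  54
# 1  1      F  68

# Descending by age
df[order(df$age, decreasing = TRUE), ]
```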
stack (reshape from wide to long format)
Input:
Output:
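The snippet for this step is missing, so the exact call the original used is unknown. One possible sketch uses base R's stack(), assuming the df created above and R 4.0 or later (where text columns are character, not factor). The names long, variable, and value are chosen for this illustration.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# stack() concatenates the selected columns into `values` and records the
# source column in `ind`. Mixing text and numbers makes `values` character.
stacked <- stack(df, select = c(gender, age))

# Attach the id so each long row can be traced back to its original row
long <- data.frame(
  id = rep(df$id, times = 2),   # ids repeated once per stacked column
  variable = stacked$ind,
  value = stacked$values
)
long
```

The result has one row per id and variable pair, ten rows in total. For more complex reshaping, base R's reshape() or the packages listed under Resources may be more convenient.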
unstack (reshape from long to wide format)
Input:
Output:
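The snippet for this step is missing. As a sketch, base R's unstack() reverses stack(): it splits the values column into one column per level of ind. This assumes the df created above and R 4.0 or later. The names long and wide are chosen for this illustration.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# Build a long table first: columns `values` and `ind`
long <- stack(df, select = c(gender, age))

# Back to wide: one column per level of `ind`
wide <- unstack(long, values ~ ind)

# The long form stored everything as text, so restore the numeric type
wide$age <- as.numeric(wide$age)
wide
```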
Traversing DataFrame (for loops)
for loop
Input:
Output:
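The snippet for this step is missing. A minimal sketch of row-by-row traversal, assuming the df created above, loops over the row indices with seq_len(nrow(df)). The message text is made up for illustration.

```r
# Same df as in "Create DataFrame" above
df <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"), age = c(68, 54, 49, 28, 36))

# seq_len() is safe even when the data.frame has zero rows
for (i in seq_len(nrow(df))) {
  cat("id", df$id[i], "is", df$age[i], "years old\n")
}
# id 1 is 68 years old
# id 2 is 54 years old
# id 3 is 49 years old
# id 4 is 28 years old
# id 5 is 36 years old
```

Explicit loops are fine for small tables, but vectorized operations such as df$age * 12 are usually shorter and faster in R.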
Notes:
For operations such as sorting or reshaping, a package like data.table or dplyr is typically easier to use than base R (data.frame), as those packages include commands designed for DataFrame manipulation. This guide uses base R for the sake of continuity.
Resources
R Documentation: data.table
Tidyverse: dplyr
