DataFrames

The base R data.frame class and the data.table and dplyr packages provide tools for working with tabular data in R. Their design and functionality are similar to those of DataFrames.jl (in Julia) and pandas (in Python), making them good general-purpose data science tools.

This page provides examples of using data.frame, data.table, and dplyr, demonstrating the syntax and common functions within the tools.

Example

Installing data.frame, data.table, and dplyr in R.

data.frame is part of base R, so it needs no installation. The dplyr package is part of the tidyverse (see the Packages section for tidyverse installation instructions). To install data.table, use install.packages('data.table').

This example uses data.frame because it does not require additional packages. See the resources at the bottom of this page for more information on data.table and dplyr.

Create DataFrame

#Create DataFrame
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

Display DataFrame

Input:

#Display DataFrame
df

Output:

Print first two lines of DataFrame

Input:
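One possible way to do this in base R is head() with n = 2. This is a sketch that assumes the df from "Create DataFrame", rebuilt here so the snippet runs on its own; the name first_two is illustrative.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# First two rows
first_two <- head(df, 2)
first_two
#   id gender age
# 1  1      F  68
# 2  2      M  54
```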

Output:

Print last two lines of DataFrame

Input:
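A sketch using base R's tail() with n = 2, assuming the same df as above (rebuilt so the snippet is self-contained); last_two is an illustrative name.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Last two rows; the original row names (4 and 5) are kept
last_two <- tail(df, 2)
last_two
#   id gender age
# 4  4      M  28
# 5  5      F  36
```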

Output:

Describe DataFrame

DataFrame size:

Input:
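The size can be read with dim(), or with nrow() and ncol() separately. A minimal sketch, assuming the df defined earlier; the variable names are illustrative.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Rows and columns together
size <- dim(df)
size
# [1] 5 3

# Rows and columns separately
n_rows <- nrow(df)
n_cols <- ncol(df)
```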

Output:

DataFrame column names:

Input:
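Column names are available through names() or, equivalently for a data frame, colnames(). A short sketch on the same df, rebuilt here so it runs standalone:

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Both calls return the same character vector of column names
col_names <- names(df)
col_names_alt <- colnames(df)
col_names
```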

Output:

DataFrame description:

Input:
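Two base R functions are commonly used to describe a data frame: str() for structure and column types, and summary() for per-column statistics. Which one this step originally used is an assumption; the sketch shows both.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Compact structure: dimensions, column types, first values
str(df)

# Per-column summary (min, quartiles, mean, max for numeric columns)
df_summary <- summary(df)
df_summary
```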

Output:

Accessing DataFrames

Get "age" column (different ways to call the column)

Input:
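Base R offers several equivalent ways to pull out a column. The sketch below assumes the df from earlier; note that single-bracket indexing with one name returns a one-column data frame rather than a vector.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# These three return the column as a numeric vector
age_dollar  <- df$age
age_double  <- df[["age"]]
age_bracket <- df[, "age"]

# This returns a data frame with a single column
age_df <- df["age"]

age_dollar
# [1] 68 54 49 28 36
```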

Output:

Get row

Input:
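A row is selected by putting its index before the comma and leaving the column position empty. The choice of row 2 here is only for illustration.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Second row, all columns (result is a one-row data frame)
second_row <- df[2, ]
second_row
#   id gender age
# 2  2      M  54
```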

Output:

Get element

Input:
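A single element is addressed by row and column together. The sketch picks row 2 of the age column as an arbitrary example and shows two equivalent spellings.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Row 2 of the "age" column, by [row, column]
element <- df[2, "age"]

# Same value, by extracting the column first
element_alt <- df$age[2]

element
# [1] 54
```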

Output:

Get subset (specific rows and all columns)

Input:
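Passing a vector of row indices before the comma selects several rows at once. The rows chosen below (1 and 3) are illustrative.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Rows 1 and 3, all columns
rows_subset <- df[c(1, 3), ]
rows_subset
#   id gender age
# 1  1      F  68
# 3  3      F  49
```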

Output:

Get subset (all rows and specific columns)

Input:
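Columns are selected the same way, with a vector of names after the comma. The columns chosen here (id and age) are illustrative.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# All rows, only the "id" and "age" columns
cols_subset <- df[, c("id", "age")]
cols_subset
```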

Output:

Get subset (all rows meeting specified criteria - numbers)

Input:
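Rows can be filtered with a logical condition in the row position. The threshold of 40 below is an assumed example value, not taken from the page.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Keep rows where age is greater than 40
over_40 <- df[df$age > 40, ]
over_40
#   id gender age
# 1  1      F  68
# 2  2      M  54
# 3  3      F  49
```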

Output:

Get subset (all rows meeting specified criteria - strings)

Input:
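String criteria work the same way, using == against the value to match. The sketch filters on gender "F" as an assumed example.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Keep rows where gender equals "F"
females <- df[df$gender == "F", ]
females
#   id gender age
# 1  1      F  68
# 3  3      F  49
# 5  5      F  36
```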

Output:

Get subset (all rows meeting specified criteria)

Input:
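Conditions can be combined with & (and) or | (or). The sketch below assumes this step was meant to show a combined filter; the specific criteria are illustrative. Base R's subset() is shown as an equivalent spelling.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Rows where gender is "F" AND age is greater than 40
females_over_40 <- df[df$gender == "F" & df$age > 40, ]

# Equivalent call using subset(), which looks up column names directly
females_over_40_alt <- subset(df, gender == "F" & age > 40)

females_over_40
#   id gender age
# 1  1      F  68
# 3  3      F  49
```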

Output:

Add Column

New columns with specified values

Input:
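Assigning to a column name that does not exist yet creates the column. The column names and values below (country, smoker) are hypothetical and only illustrate the syntax.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# A single value is recycled to every row (hypothetical column)
df$country <- "US"

# A vector supplies one value per row (hypothetical column and values)
df$smoker <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

names(df)
```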

Output:

New column with calculated value

Input:
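A new column can also be computed from existing ones, since arithmetic on a column is vectorized. The column name age_in_months is a hypothetical example.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Derive a new column from "age" (hypothetical column name)
df$age_in_months <- df$age * 12

df$age_in_months
# [1] 816 648 588 336 432
```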

Output:

Get counts/frequency

Input:
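In base R, table() counts how often each value occurs in a column. A sketch on the gender column, assuming the df from earlier:

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Count occurrences of each gender value (3 "F", 2 "M")
gender_counts <- table(df$gender)
gender_counts

# Relative frequencies, if proportions are wanted instead of counts
gender_share <- prop.table(gender_counts)
```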

Output:

Transform DataFrame

sort

Input:
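Base R sorts a data frame by reordering its rows with order(). The sketch sorts by age as an assumed example; the original sort key is not shown on the page.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Ascending by age
sorted_asc <- df[order(df$age), ]

# Descending by age
sorted_desc <- df[order(df$age, decreasing = TRUE), ]

sorted_asc
#   id gender age
# 4  4      M  28
# 5  5      F  36
# 3  3      F  49
# 2  2      M  54
# 1  1      F  68
```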

Output:

stack (reshape from wide to long format)

Input:
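Base R has a stack() function that concatenates several columns into one values column plus an indicator column. The sketch uses a small hypothetical wide data frame (score_1, score_2 are made-up columns), since the page's df has only one measurement column. Note that stack() keeps only the selected columns, so the id column is not carried over.

```r
# Hypothetical wide data: one row per id, one column per measurement
wide <- data.frame(
  id = 1:3,
  score_1 = c(70, 85, 90),
  score_2 = c(75, 80, 95)
)

# Stack the two score columns into long format.
# Result has two columns: "values" (the numbers) and "ind" (source column).
long <- stack(wide, select = c(score_1, score_2))
long
```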

Output:

unstack (reshape from long to wide format)

Input:
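The reverse operation in base R is unstack(), which splits a values column back into one column per group. The long data frame below is hypothetical and mirrors the shape that stack() produces (columns named values and ind).

```r
# Hypothetical long data in the shape produced by stack()
long <- data.frame(
  values = c(70, 85, 90, 75, 80, 95),
  ind = factor(rep(c("score_1", "score_2"), each = 3))
)

# Spread "values" into one column per level of "ind"
wide <- unstack(long, values ~ ind)
wide
#   score_1 score_2
# 1      70      75
# 2      85      80
# 3      90      95
```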

Output:

Traversing DataFrame (for loops)

for loop over rows

Input:
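A data frame can be traversed row by row with a for loop over seq_len(nrow(df)). This is a minimal sketch on the df from earlier; the row_labels vector and the label format are illustrative.

```r
# Same data frame as in "Create DataFrame"
df <- data.frame(
  id = 1:5,
  gender = c("F", "M", "F", "M", "F"),
  age = c(68, 54, 49, 28, 36)
)

# Pre-allocate the result, then fill it one row at a time
row_labels <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  row_labels[i] <- paste0("id ", df$id[i], ": ", df$gender[i], ", age ", df$age[i])
}

row_labels[1]
# [1] "id 1: F, age 68"
```

seq_len(nrow(df)) is used instead of 1:nrow(df) so that the loop body is skipped for an empty data frame.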

Output:

Notes:

For operations such as sorting or transformation, a package like data.table or dplyr is typically easier to use than base R (data.frame), because those packages include commands designed for DataFrame manipulation. This guide uses base R for the sake of continuity.

Resources
