
6.11 Intro to R

Operators

Assignment

Operator    Description
<-, =       Left assignment
<<-         Left assignment into an enclosing environment
->          Right assignment
->>         Right assignment into an enclosing environment
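The assignment operators can be sketched in plain R as follows. The names `x`, `y`, `z`, `counter` and `increment` are only illustrative. `<<-` is shown inside a function because that is where it differs from `<-`: instead of creating a local variable, it updates a variable found in an enclosing environment.

```r
x <- 5          # left assignment, the conventional R style
y = 10          # also left assignment
15 -> z         # right assignment: the value flows into z

counter <- 0
increment <- function() {
  counter <<- counter + 1   # <<- updates `counter` in the enclosing environment
}
increment()
increment()
print(c(x, y, z, counter))  # 5 10 15 2
```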

Arithmetic

Description          Operator
Addition             +
Subtraction          -
Multiplication       *
Division             /
Power (exponent)     ^ or **
Logical negation     !x
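Here is a small plain-R sketch of the arithmetic operators. It also shows that they work element by element on whole vectors. The values of `a` and `b` are arbitrary.

```r
a <- 7
b <- 2
a + b            # 9
a - b            # 5
a * b            # 14
a / b            # 3.5: `/` always returns a floating-point result
a ^ b            # 49
a ** b           # 49: the parser treats ** as ^
!c(TRUE, FALSE)  # FALSE TRUE
c(1, 2, 3) * 2   # 2 4 6: operators apply element by element to vectors
```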

Relational and Logical

Operator    Description
>           Greater than
<           Less than
>=          Greater than or equal to
<=          Less than or equal to
==          Exactly equal to
!=          Not equal to
&           Element-wise AND
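A plain-R sketch of these operators on a short vector, chosen arbitrarily. It also shows why `==` ("exactly equal") needs care with floating-point numbers.

```r
x <- c(1, 5, 10)
x > 3                  # FALSE  TRUE  TRUE
x == 5                 # FALSE  TRUE FALSE
(x > 3) & (x < 8)      # FALSE  TRUE FALSE: & compares element by element
x[x > 3]               # 5 10: a logical vector can select elements

# == tests exact equality, which can surprise with floating-point numbers
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: comparison with a small tolerance
```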

Basic Objects

Vectors

Type         Example
Logical      TRUE, FALSE
Numeric      1, 55, 999
Integer      1L, 32L, 0L
Complex      2 + 3i
Character    "great", "23.4"
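Each example in the table can be checked with `class()` in plain R. The sketch below also shows what happens when `c()` receives mixed types: every element is coerced to the most flexible type present.

```r
class(TRUE)       # "logical"
class(55)         # "numeric"
class(32L)        # "integer": the L suffix makes an integer literal
class(2 + 3i)     # "complex"
class("23.4")     # "character": quoted, so text rather than a number

# A vector holds one type; c() coerces mixed inputs to the most flexible one
c(1L, 2.5)        # numeric: 1.0 2.5
c(1, "a")         # character: "1" "a"
```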

Create a vector

using RCall   # provides the R"..." string macro that runs R code from Julia
R"
apple <- c('red','green','yellow')

print(apple)

# Get the class of the vector.
print(class(apple))

"


Lists

A list can hold elements of different types, for example a vector and a single number.

# Create a list.

R"
list1 <- list(c(2,5,3),21.3)

# Print the list.
print(list1)
"

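This plain-R sketch shows how list elements are retrieved. The named list `point` is a hypothetical example added for illustration. The key distinction is that `[[ ]]` returns the element itself, while `[ ]` returns a shorter list.

```r
list1 <- list(c(2, 5, 3), 21.3)

list1[[1]]        # the element itself: the numeric vector 2 5 3
list1[1]          # a list of length 1 that contains that element
length(list1)     # 2

# Elements can be named and retrieved with $ or [[ ]]
point <- list(x = 1.5, y = -2, label = "A")
point$x           # 1.5
point[["label"]]  # "A"
```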

Matrices

A matrix is a two-dimensional rectangular data set in which every element has the same type.

R"
M = matrix( c('a','a','b','b','c','c'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
"

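Matrix elements are indexed as `[row, column]`, and leaving one index blank selects a whole row or column. A plain-R sketch using the matrix above; `N` is an extra example that contrasts the default column-by-column filling with `byrow = TRUE`.

```r
M <- matrix(c('a','a','b','b','c','c'), nrow = 2, ncol = 3, byrow = TRUE)
dim(M)     # 2 3
M[1, ]     # first row: "a" "a" "b"
M[, 3]     # third column: "b" "c"
M[2, 1]    # single element: "b"

# Without byrow = TRUE, values fill the matrix column by column
N <- matrix(1:6, nrow = 2)
N[1, ]     # 1 3 5
```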

Arrays

An array extends a matrix to any number of dimensions; the example below holds two 4 x 4 layers.

R"
a <- array(c('a','b'),dim = c(4,4,2))
print(a)
"

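Arrays take one index per dimension. This plain-R sketch reuses the array above. Because only two values are supplied for 32 cells, they are recycled in column-major order.

```r
a <- array(c('a','b'), dim = c(4, 4, 2))
dim(a)       # 4 4 2
a[, , 2]     # the second 4 x 4 layer, returned as a matrix
a[2, 1, 1]   # "b": the two values are recycled in column-major order
a[1, 1, 2]   # "a"
```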

Factors

A factor stores a vector together with its distinct values, called levels, which act as labels.

R"
apple <- c('red','green','yellow')
factor_apple <- factor(apple)
print(factor_apple)
"

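To make the difference between values and levels visible, this plain-R sketch adds a repeated 'red' to the original vector (an illustrative change). The factor keeps one level per distinct value and stores each element as an integer code pointing into those levels.

```r
apple <- c('red', 'green', 'yellow', 'red')
factor_apple <- factor(apple)

levels(factor_apple)      # "green" "red" "yellow": distinct values, sorted
nlevels(factor_apple)     # 3
as.integer(factor_apple)  # 2 1 3 2: each value is stored as a code into levels
table(factor_apple)       # counts per level: green 1, red 2, yellow 1
```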

Data Frames

A data frame stores tabular data: each column is a vector, and different columns can have different types.

# Create the data frame.

R"
BMI <- data.frame(
   gender = c('Male', 'Female','Female'),
   height = c(182, 141.5, 165),
   weight = c(101,95, 88),
   Age = c(35,43,22)
)
print(BMI)
"

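A sketch of common data-frame operations in plain R, using the same data. The derived `bmi` column assumes height is in centimetres and weight in kilograms. The original table does not state units, so this is an assumption.

```r
BMI <- data.frame(
  gender = c('Male', 'Female', 'Female'),
  height = c(182, 141.5, 165),
  weight = c(101, 95, 88),
  Age = c(35, 43, 22)
)

dim(BMI)                   # 3 rows, 4 columns
BMI$height                 # one column, returned as a vector
BMI[BMI$Age > 30, ]        # rows where Age is above 30
BMI[, c("gender", "Age")]  # a subset of columns

# Derived column, ASSUMING height in cm and weight in kg
BMI$bmi <- BMI$weight / (BMI$height / 100)^2
print(BMI)
```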

Loops

Repeat Loop

repeat {
   commands
   if(condition) {
      break
   }
}
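A filled-in version of the repeat template, in plain R. The task is hypothetical: find the smallest whole number whose square exceeds 50. The `break` is the only way out of the loop.

```r
n <- 0
repeat {
  n <- n + 1
  if (n^2 > 50) {
    break          # without a break, repeat would loop forever
  }
}
print(n)           # 8
```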

While Loop

while (test_expression) {
   statement
}
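A filled-in version of the while template, in plain R. The task is hypothetical: count how many halvings bring 100 below 1. The condition is tested before each pass, so the body may run zero times.

```r
x <- 100
steps <- 0
while (x >= 1) {   # checked before every iteration
  x <- x / 2
  steps <- steps + 1
}
print(steps)       # 7
```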

For Loop

for (value in vector) {
   statement
}
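A for loop runs its body once for each element of a vector. In this plain-R sketch the vector `c(2, 5, 3)` and the names `total` and `fruits` are illustrative.

```r
total <- 0
for (value in c(2, 5, 3)) {
  total <- total + value
}
print(total)                 # 10

# seq_along() yields the indices 1, 2, 3 (and nothing for an empty vector)
fruits <- c('red', 'green', 'yellow')
for (i in seq_along(fruits)) {
  cat(i, fruits[i], "\n")
}
```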

Working with Data

Useful Packages

Task              Packages
Load data         utils, openxlsx, foreign, haven
Manipulate data   tidyverse, dplyr, tidyr
Visualize data    ggplot2, lattice, plotly
Modeling          glmnet, randomForest, caret, survival

Import Data

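# CSV (read.csv() is in the utils package, attached by default)
df <- read.csv("c:/data.csv", header = TRUE)

# Excel (read.xlsx() is in the openxlsx package)
library(openxlsx)
df <- read.xlsx("c:/data.xlsx")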
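The paths above are placeholders. This runnable plain-R sketch instead writes a few rows of the built-in mtcars data to a temporary CSV file and reads them back, so no external file is needed.

```r
path <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 5), path, row.names = FALSE)  # export without row names

df <- read.csv(path, header = TRUE)
dim(df)      # 5 11
str(df)
```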

The following examples use the mtcars dataset from R's built-in datasets package.

Data Exploration

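mtcars records fuel consumption and 10 aspects of design and performance for 32 cars (1973-74 models). The code below loads it into Julia with RDatasets and copies it into the R session with @rput.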

  • str(data): gives a compact overview of the dataset: its dimensions, plus each column's type and first few values.

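import Pkg; Pkg.add("RCall")       # one-time install of the Julia-to-R bridge
import Pkg; Pkg.add("RDatasets")   # one-time install of R's sample datasets for Julia
using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");   # load mtcars as a Julia DataFrame
@rput mtcars                             # copy it into the R session as `mtcars`

R"str(mtcars)"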

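The same exploration needs no setup in plain R, because mtcars ships with the datasets package that R attaches at startup. Plain R uses lowercase column names (mpg, disp, drat, cyl) and stores the car models as row names. The code in this section instead refers to the capitalized names MPG, Disp, DRat and Cyl.

```r
str(mtcars)                 # 'data.frame': 32 obs. of 11 variables
dim(mtcars)                 # 32 11
names(mtcars)               # "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
head(rownames(mtcars), 2)   # "Mazda RX4" "Mazda RX4 Wag"
```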
  • head(data,n) and tail(data,n)

head(): first n rows (6 by default)

tail(): last n rows (6 by default)

@rput mtcars

R"print(head(mtcars, n = 3))"
R"print(tail(mtcars, n = 3))"

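The plain-R equivalent of the calls above, with lowercase column names. This sketch also selects a subset of columns and relies on the default n = 6.

```r
head(mtcars, n = 3)                 # first 3 rows
tail(mtcars, n = 3)                 # last 3 rows
head(mtcars[, c("mpg", "cyl")])     # default n = 6, selected columns only
rownames(tail(mtcars, n = 1))       # "Volvo 142E"
```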

Descriptive Statistics

  • summary(data): gives descriptive statistics for each variable (for numeric columns: minimum, first quartile, median, mean, third quartile and maximum)
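A plain-R sketch of summary() on one column and on several columns of mtcars (lowercase names). The result is a named vector, so individual statistics can be pulled out by name.

```r
summary(mtcars$mpg)                 # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(mtcars[, c("mpg", "hp")])   # one such summary per column

s <- summary(mtcars$mpg)
s[["Median"]]                       # extract a single statistic by name
```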

Common Functions

Task                   Function
Mean                   mean()
Standard deviation     sd()
Variance               var()
Minimum                min()
Maximum                max()
Median                 median()
Range of values        range()
Sample quantiles       quantile()
Interquartile range    IQR()
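The functions in the table applied to one column, in plain R (mtcars column names are lowercase there). The comments note how some of the results relate to each other.

```r
x <- mtcars$mpg
mean(x)
sd(x)
var(x)                           # equals sd(x)^2
min(x); max(x)
median(x)
range(x)                         # the pair c(min(x), max(x))
quantile(x, c(0.25, 0.5, 0.75))
IQR(x)                           # 75th percentile minus 25th percentile
```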

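Missing values

  • When the data contain missing values (NA), most of these functions return NA, and quantile() and IQR() stop with an error. Add na.rm = TRUE to drop the missing values first.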

@rput mtcars

R"print(mean(mtcars$MPG, na.rm = TRUE))"

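mtcars has no missing values, so the call above behaves the same with or without na.rm. This plain-R sketch uses a small made-up vector with one NA to show the difference.

```r
x <- c(21, NA, 19)
mean(x)                   # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)     # 20: NA is dropped before averaging
sum(is.na(x))             # 1: count the missing values first
```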

Basic Plots

  • plot(): draws a scatter plot when given two numeric vectors

using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");

@rput mtcars

R"
png('plot2.png')
plot(mtcars$Disp, mtcars$DRat)
dev.off()
"


  • barplot(): draws bars whose heights come from a vector or a table of counts

R"
png('barplot2.png')
barplot(table(mtcars$Cyl), main = 'Number of Cylinders', xlab = 'cyl', ylab = 'Number of cars', col = 'blue', horiz = FALSE)
dev.off()
"


  • hist(): draws a histogram of a numeric variable

R"
png('histogram.png')
hist(mtcars$Disp,main = 'Displacement (cu.in.)',xlab = 'disp', col='red')
dev.off()
"


  • boxplot(): summarizes a numeric variable by its median, quartiles and outlying points

using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");

@rput mtcars

R"
png('boxplot2.png')
boxplot(mtcars$Disp)
dev.off()
"

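boxplot() also accepts a formula, which draws one box per group. This plain-R sketch compares mpg across cylinder counts (lowercase column names). It writes to a temporary PDF file with a hypothetical name, chosen so the example does not depend on a PNG-capable device. boxplot() also returns the statistics it draws.

```r
out <- file.path(tempdir(), "boxplot_mpg_by_cyl.pdf")  # hypothetical output file
pdf(out)
b <- boxplot(mpg ~ cyl, data = mtcars,
             main = "Fuel economy by cylinder count",
             xlab = "cyl", ylab = "mpg")
invisible(dev.off())

b$names   # "4" "6" "8": one box per group
b$n       # 11 7 14: number of cars in each group
```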

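  • qqnorm(): plots the sample quantiles against theoretical normal quantiles; points close to a straight line suggest the data are roughly normal (qqplot() instead compares the quantiles of two samples)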

  • qqline(): adds a reference line through the first and third quartiles

using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");

@rput mtcars

R"
png('qqnorm.png')
qqnorm(mtcars$Disp)
qqline(mtcars$Disp)
dev.off()
"


Statistical Analysis

Analysis                  Continuous outcome (Y)    Binary outcome (Y)
Correlation analysis
  X: continuous           cor.test()                t.test()
  X: categorical          t.test(), aov() (ANOVA)   chisq.test()
Regression model          lm()                      glm(family = binomial)
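The sketch below shows one call per cell of the table, in plain R on mtcars (lowercase column names). The variable pairings are illustrative choices, not part of the original: wt is the continuous predictor, am (0 = automatic, 1 = manual) the binary variable, and vs and cyl the categorical ones.

```r
# Continuous X and continuous Y: correlation test
cor.test(mtcars$wt, mtcars$mpg)

# Continuous Y compared across the two groups of a binary variable
t.test(mpg ~ am, data = mtcars)

# Continuous Y across more than two groups: one-way ANOVA
summary(aov(mpg ~ factor(cyl), data = mtcars))

# Two categorical variables: chi-squared test of independence
chisq.test(table(mtcars$vs, mtcars$am))

# Regression: linear for a continuous Y, logistic for a binary Y
fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
coef(fit_lm)
coef(fit_glm)
```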