6.11 Intro to R
Operators
Assignment
Operator | Description |
---|---|
<- or = or <<- | Left assignment |
-> or ->> | Right assignment |
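As a minimal sketch of the assignment operators in the table above (the variable names and the `increment` helper are illustrative assumptions, not part of the original text):

```r
# Left assignment: the name sits on the left of the operator.
x <- 5
y = 10

# Right assignment: the value sits on the left, the name on the right.
15 -> z

# <<- assigns to a variable in an enclosing environment (here the global one)
# from inside a function. `increment` is a hypothetical helper.
counter <- 0
increment <- function() {
  counter <<- counter + 1
}
increment()
increment()

print(c(x, y, z, counter))
```

In everyday R code `<-` is the usual choice; `<<-` is mostly reserved for cases where a function has to update state outside its own scope.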
Arithmetic
Operator | Description |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
^ or ** | Power (exponent) |
!x | Logical negation (NOT) |
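A short sketch of these operators on two scalars (the values and variable names are assumptions chosen for illustration):

```r
a <- 7
b <- 2

sum_ab      <- a + b   # addition
diff_ab     <- a - b   # subtraction
prod_ab     <- a * b   # multiplication
quot_ab     <- a / b   # division (always floating point)
power_ab    <- a ^ b   # exponent; a ** b is parsed the same way
not_flag    <- !TRUE   # logical negation

print(c(sum_ab, diff_ab, prod_ab, quot_ab, power_ab))
print(not_flag)
```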
Logical
Operator | Description |
---|---|
> | Greater than |
< | Less than |
>= | Greater than or equal |
<= | Less than or equal |
== | Exactly equal |
!= | Not equal to |
& | Element-wise AND |
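The comparison operators work element by element on vectors. A small sketch, with made-up vectors `v` and `w`:

```r
v <- c(1, 5, 10)
w <- c(2, 5, 8)

greater <- v > w                 # compares each pair of elements
equal   <- v == w
both    <- (v >= 5) & (w >= 5)   # TRUE only where both conditions hold

print(greater)
print(equal)
print(both)
```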
Basic Objects
Vectors
Type | Example |
---|---|
Logical | TRUE, FALSE |
Numeric | 1, 55, 999 |
Integer | 1L, 32L, 0L |
Complex | 2 + 3i |
Character | "great", "23.4" |
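One way to check these types is `class()`. The sketch below is plain R and uses example values taken from the table; the `types` vector is an illustrative name:

```r
# class() reports the type of each single-element vector.
types <- c(
  logical   = class(TRUE),
  numeric   = class(55),
  integer   = class(32L),
  complex   = class(2 + 3i),
  character = class("23.4")
)
print(types)
```

Note that `"23.4"` is a character value because of the quotes, even though it looks like a number.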
Create a vector
using RCall
R"
apple <- c('red','green','yellow')
print(apple)
# Get the class of the vector.
print(class(apple))
"
Lists
A list can hold elements of different types
# Create a list.
R"
list1 <- list(c(2,5,3),21.3)
# Print the list.
print(list1)
"
Matrices
Two-dimensional rectangular data set
R"
M = matrix( c('a','a','b','b','c','c'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
"
Arrays
Multi-dimensional data set
R"
a <- array(c('a','b'),dim = c(4,4,2))
print(a)
"
Factors
A factor stores a vector together with its distinct values, which serve as the levels (labels)
R"
apple <- c('red','green','yellow')
factor_apple <- factor(apple)
print(factor_apple)
"
Data Frames
Tabular data objects
# Create the data frame.
R"
BMI <- data.frame(
gender = c('Male', 'Female','Female'),
height = c(182, 141.5, 165),
weight = c(101,95, 88),
Age = c(35,43,22)
)
print(BMI)
"
Loops
Repeat Loop
repeat {
commands
if(condition) {
break
}
}
While Loop
while (test_expression) {
statement
}
For Loop
for (value in sequence) {
statement
}
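The three loop templates above can be made concrete as follows. This is a sketch with made-up counters (`i`, `k`, `n`) and accumulators, not code from the original text:

```r
# repeat: runs until break is reached.
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) {
    break
  }
}

# while: runs as long as the test expression is TRUE.
total <- 0
k <- 1
while (k <= 4) {
  total <- total + k
  k <- k + 1
}

# for: iterates over each element of a sequence.
squares <- numeric(0)
for (n in 1:3) {
  squares <- c(squares, n^2)
}

print(i)
print(total)
print(squares)
```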
Working with Data
Useful Packages
Task | Packages |
---|---|
Load data | utils, openxlsx, foreign, haven |
Manipulate data | tidyverse, dplyr, tidyr |
Visualize data | ggplot2, lattice, plotly |
Modeling | glmnet, randomForest, caret, survival |
Import Data
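# CSV (read.csv is in the utils package, loaded by default)
df <- read.csv("c:/data.csv", header = TRUE)
# Excel (read.xlsx comes from the openxlsx package)
library(openxlsx)
df <- read.xlsx("c:/data.xlsx")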
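To try CSV import without an existing file, one option is a round trip through a temporary file. The sketch below assumes a small made-up data frame and uses `tempfile()` in place of the `c:/data.csv` path:

```r
# Write a small illustrative data frame to a temporary CSV file.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90.5, 82, 77)), tmp, row.names = FALSE)

# Read it back; header = TRUE treats the first line as column names.
df <- read.csv(tmp, header = TRUE)
str(df)

# Clean up the temporary file.
unlink(tmp)
```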
The following examples use the datasets package.
Data Exploration
Use the mtcars dataset from the datasets package.
str(data): gives a quick overview of the dataset's structure: its dimensions plus the name, type, and first few values of each column.
import Pkg; Pkg.add("RCall")
import Pkg; Pkg.add("RDatasets")
using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");
@rput mtcars
R"str(mtcars)"
head(data,n) and tail(data,n)
head(): first n rows
tail(): last n rows
@rput mtcars
R"print(head(mtcars, n = 3))"
R"print(tail(mtcars, n = 3))"
Descriptive Statistics
summary(data): gives descriptive statistics for each variable
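A minimal sketch of `summary()` in plain R, assuming R's built-in `mtcars` (whose columns are lowercase, e.g. `mpg`, unlike the capitalized names in the RDatasets copy used elsewhere in this section):

```r
data(mtcars)

# Summary of a single numeric column: min, quartiles, median, mean, max.
s <- summary(mtcars$mpg)
print(s)

# Summary of several columns at once returns a table with one column each.
tbl <- summary(mtcars[, c("mpg", "wt")])
print(tbl)
```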
Common Functions
Tasks | Functions |
---|---|
Mean | mean() |
Standard deviation | sd() |
Variance | var() |
Minimum | min() |
Maximum | max() |
Median | median() |
Range of values | range() |
Sample quantiles | quantile() |
Interquartile range | IQR() |
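The functions in the table can be tried on a small vector. The values in `x` below are made up for illustration:

```r
x <- c(2, 4, 4, 6, 9)

m   <- mean(x)      # arithmetic mean
s   <- sd(x)        # sample standard deviation (divides by n - 1)
v   <- var(x)       # sample variance
lo  <- min(x)
hi  <- max(x)
med <- median(x)
rng <- range(x)     # c(min, max)
q   <- quantile(x)  # 0%, 25%, 50%, 75%, 100% by default
iqr <- IQR(x)       # 75% quantile minus 25% quantile

print(c(mean = m, sd = s, var = v, median = med, IQR = iqr))
print(rng)
print(q)
```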
Handling missing values
Pass na.rm = TRUE to drop NA values before the statistic is computed.
@rput mtcars
R"print(mean(mtcars$MPG, na.rm = TRUE))"
Basic Plots
plot()
using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");
@rput mtcars
R"
png('plot2.png')
plot(mtcars$Disp, mtcars$DRat)
dev.off()
"
barplot()
R"
png('barplot2.png')
barplot(mtcars$Cyl,main = 'Number of Cylinders',xlab = 'cyl', col='blue',horiz = FALSE)
dev.off()
"
hist()
R"
png('histogram.png')
hist(mtcars$Disp,main = 'Displacement (cu.in.)',xlab = 'disp', col='red')
dev.off()
"
boxplot()
using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");
@rput mtcars
R"
png('boxplot2.png')
boxplot(mtcars$Disp)
dev.off()
"
qqnorm(): plots sample quantiles against theoretical normal quantiles to check whether the data are normally distributed (qqplot() compares the quantiles of two samples)
qqline(): adds a reference line
using RCall
using RDatasets
mtcars = dataset("datasets","mtcars");
@rput mtcars
R"
png('qqnorm.png')
qqnorm(mtcars$Disp)
qqline(mtcars$Disp)
dev.off()
"
Statistical Analysis
Analysis | Continuous Outcome (Y) | Binary Outcome (Y) |
---|---|---|
Correlation Analysis | | |
X: Continuous | cor.test() | t.test() |
X: Categorical | t.test(), aov() | chisq.test() |
Regression Model | lm() | glm() |
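As a rough sketch of how the functions in this table are called, the example below uses R's built-in `mtcars` (lowercase column names). The choice of variables (`mpg`, `wt`, `am`, `vs`) is an assumption made for illustration, not an analysis from the original text:

```r
data(mtcars)

# Continuous X, continuous Y: correlation test between weight and mileage.
ct <- cor.test(mtcars$wt, mtcars$mpg)

# Categorical X (two groups), continuous Y: compare mileage by transmission.
tt <- t.test(mpg ~ am, data = mtcars)

# Categorical X, binary Y: test of independence on a contingency table.
cs <- chisq.test(table(mtcars$am, mtcars$vs))

# Regression: linear model for a continuous outcome,
# logistic regression (binomial family) for a binary outcome.
fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

print(ct$estimate)
print(tt$p.value)
print(cs$p.value)
print(coef(fit_lm))
print(coef(fit_glm))
```

Each test returns a list-like object, so individual results such as `p.value` or `estimate` can be pulled out with `$` rather than read off the printed report.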