This chapter provides instructions and examples for using computing skills in health data and technology research.
Visit other chapters in CODIAC for Health using the Table of Contents or menu in the upper left corner.
Programs are written using text editor applications. These applications allow users to create and edit plain text, which can then be run as programs. Text editors differ in complexity, and some include extra functionality for easier, more efficient programming. Text editors with auto-complete suggest common functions or existing variables as the programmer begins to type, which the programmer can then select without needing to finish typing. Some text editors offer options to run individual lines of code or entire programs while editing files.
Available for Mac, Windows, and Linux operating systems
Includes support for debugging, syntax highlighting, auto-complete, and additional user-friendly functionality
Web application text editor, no download necessary
Includes options for interactive output (HTML, images, videos, LaTeX, and custom MIME types), support for big data tools, such as Apache Spark, and options for sharing notebooks with others
Run individual lines of code or entire programs at once
Highly configurable
Included in most UNIX operating systems (e.g., Linux or macOS), no download necessary
Write files from the Terminal
Highly configurable
Included in most UNIX operating systems (e.g., Linux or macOS), no download necessary, also available for Windows
Wide range of built-in features for text editing, such as syntax highlighting, automatic indentation, and search and replace
Included in most UNIX operating systems (e.g., Linux or macOS), no download necessary
Most of the editing commands are displayed at the bottom of the editing screen for easy reference
Unix is a family of operating systems officially trademarked as UNIX®. These operating systems are computing environments that are optimized for multi-tasking across multiple users. The original system was developed at AT&T in 1969 as a text-only system. There are many Unix variants or Unix-like systems (e.g., GNU/Linux, Sun Solaris, IBM AIX, and Mac OS X). On Windows, Cygwin is a program that provides a Unix-like environment.
The main components of a Unix operating system include:
Kernel – bridge between hardware (i.e. silicon) and application (i.e. software)
Shell – command line interface to enable user interaction with the system
File System – the organization structure for how files are stored
The Unix file system organizes files and directories into a hierarchical structure like the root system of a tree.
The "root" directory (i.e., "/") is the top of the hierarchy.
Standard directories within the root directory:
/bin and /usr contain commands needed by system administrators and users
/etc contains system-wide configuration files and system databases
/home contains the home directory (~) for each user (in some systems, the home directories may be in a different location such as /users or /Users)
When traversing directories:
working directory (.) is the directory that a user is currently in
parent directory (..) is the directory above the working directory
path or pathname specifies the location of a file or directory in the file system
full path or absolute path points to the same location regardless of the working directory (i.e., it is written in reference to the root directory)
relative path is the path relative to the working directory
If the working directory is the home directory for bcbi, the full path for the course directory is /home/bcbi/course while the relative path is just course. A schematic of this is below:
If code then becomes the working directory, the full path for the data directory from there is /home/bcbi/course/data while the relative path is ../data. A schematic of this is below:
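The two cases above can also be checked in code. The following is a minimal sketch in Python using the standard posixpath module. The helper name resolve is hypothetical, and the location of the code directory (/home/bcbi/course/code) is an assumption inferred from the relative path ../data.

```python
import posixpath


def resolve(working_directory, relative_path):
    """Combine a working directory and a relative path into a full path."""
    joined = posixpath.join(working_directory, relative_path)
    # normpath collapses "." and ".." components
    return posixpath.normpath(joined)


# Working directory is the home directory for bcbi
print(resolve("/home/bcbi", "course"))               # /home/bcbi/course

# Working directory is assumed to be /home/bcbi/course/code
print(resolve("/home/bcbi/course/code", "../data"))  # /home/bcbi/course/data
```

The full path is the same in both calls no matter where the user is, while the relative path only makes sense together with a working directory.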
The Unix shell provides a command line interface for interacting with the operating system and is where commands are entered. An example below is a Mac OS X Terminal Shell logged into a RedHat Linux Server as user_name.
The prompt may look different depending on your shell (e.g., Bourne shell [sh], C shell [csh], or Bourne-Again shell [bash])
Default prompts include $ and %
The prompt # typically appears when logged in as the superuser or root user, who can do anything on the system. Root access should be restricted to trusted users and used only when necessary and with caution. While you may be able to do this on a system you control, you are unlikely to ever have root privileges on a shared computing resource (e.g., Oscar or Stronghold at Brown University)
The prompt can be configured to include additional information such as hostname, username, and pathname (e.g., computer:/home/bcbi/course bcbi $).
There are many Unix commands. Some commands will display output and then return to the shell prompt, while others will just return to the shell prompt to indicate that they have finished executing.
Unix command syntax:
Case-sensitive (pwd ≠ PWD)
May involve one or more arguments
Argument may be an option (or flag or switch) for that command
Argument may be a file or directory
To get to a Unix shell on your computer:
For Mac, launch the Terminal application (under Applications → Utilities → Terminal)
For Linux, launch the Terminal application
For Windows, launch the PowerShell application
Get help from manual (man) pages on commands: (Use spacebar or up and down arrows to scroll through pages and then press q to quit)
Determine what directory you are currently in with pwd (print working directory):
Get a listing of current directory contents using ls:
Create course directory using mkdir: (Replace course with class name - e.g., methods2020 or biol6535)
Get a listing of current directory contents with details using ls:
Change into course directory using cd: (Replace course with class name - e.g., methods2020 or biol6535)
ls
directory listing (remotely in sftp)
lls
local directory listing (sftp only)
ls -al
formatted listing with hidden files
cd dir
change directory to dir (remotely in sftp)
lcd dir
change local directory to dir (sftp only)
cd
change to home (remotely in sftp)
pwd
show current directory (remote directory in sftp)
lpwd
show current local directory
mkdir dir
create a directory dir
rm file
delete file
rm -r dir
delete directory dir
rm -f file
force remove file
rm -rf dir
force remove directory dir
cp file1 file2
copy file1 to file2
cp -r dir1 dir2
copy dir1 to dir2; create dir2 if it doesn't exist
mv file1 file2
rename or move file1 to file2; if file2 is an existing directory, moves file1 into directory file2
put file
copy local file to current remote directory (sftp only)
get file
copy remote file to current local directory (sftp only)
date
show the current date and time
cal
show this month's calendar
uptime
show current uptime
who
display who is online
whoami
who you are logged in as
wc
counts the number of lines, words, bytes in file
wc -l
counts the number of lines in file
cut -f1 file
cut out selected portions (first field) of each line of a tab-delimited file
cut -d'|' -f1,2,3 file
cut out columns 1, 2, and 3 from a pipe-delimited file
sort file
sort lines of text file file
uniq file
report or filter out repeated lines in a file
grep pattern files
search for pattern in files
grep -v pattern files
search for lines that do not contain pattern in files
awk pattern file
manipulate data and generate reports
sed pattern file
text stream editor
Ctrl+A
go to beginning of current command
Ctrl+E
go to end of current command
Ctrl+C
halts the current command
Ctrl+Z
stops the current command, resume with fg in the foreground or bg in the background
Ctrl+D
log out of current session, similar to exit
Ctrl+W
erases one word in the current line
Ctrl+U
erases the whole line
Ctrl+R
type to bring up a recent command
!!
repeats the last command
exit
log out of current session
less file
displays file contents one screen at a time (similar to more but enables mouse scrolling because less is more)
head file
displays the first few lines of a file.
tail file
displays the last few lines of a file.
chmod octal file
change the permissions (in either an ssh or sftp session) of file to octal, which can be found separately for user, group, and world by adding:
4
read (r)
2
write (w)
1
execute (x)
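The addition described above can be worked through in code. This is a minimal sketch in Python. The function names octal_digit and chmod_octal are hypothetical helpers written for illustration and are not part of any Unix tool.

```python
# Values from the list above: read = 4, write = 2, execute = 1
PERMISSION_VALUES = {"r": 4, "w": 2, "x": 1}


def octal_digit(permissions):
    """Add up the values for one class of users, e.g. "rw" gives 6."""
    return sum(PERMISSION_VALUES[letter] for letter in permissions)


def chmod_octal(user, group, world):
    """Build the three-digit octal value that would be passed to chmod."""
    return f"{octal_digit(user)}{octal_digit(group)}{octal_digit(world)}"


# user: read + write + execute, group: read + execute, world: read
print(chmod_octal("rwx", "rx", "r"))  # 754

# user: read + write, group: read, world: no permissions
print(chmod_octal("rw", "r", ""))     # 640
```

With the first result, the corresponding command would be chmod 754 file.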
Analyze the MIMIC-IV Demo Files Using Unix Commands - Forthcoming!
Analyze the SyntheticRI Demo Files Using Unix - Forthcoming!
Brown CCV: Quick Reference / Common Linux Commands
All major operating systems organize files into hierarchical directories. Understanding these file directory structures is vital when interacting with data files using Unix commands or a programming language.
This page describes file directory structures generally as well as some of the differences between file directory structures within different operating systems.
Directories allow users to group files into an organized structure. They are typically visualized like root systems of trees, the highest level of which is called the "root directory". Subdirectories branch down from the root directory, containing files as well as additional subdirectories.
Directories and files are typically described using the path used to reach them through the directory structure, starting with the root directory. In Linux and Mac operating systems, the root directory is indicated as "/" (In Windows OS, the root directory is indicated as "\"). An additional "/" (or "\" for Windows OS) is placed between each object in the path.
For example, looking at Figure 1, File_B1a2 could be described with:
/Directory_B/Directory_B1/Directory_B1a/File_B1a2
All major operating systems also provide users with a graphical user interface, or GUI (often pronounced "gooey"), which allows interaction with software and files through visual icons. If you are not already familiar with accessing files and directories through the command line, you are likely familiar with using a GUI file system. While not the recommended method for interacting with files while programming, the GUI file system can be a useful tool for visualizing a directory structure.
Figure 2 displays the GUI file system for a computer running MacOS. Though the GUI directory structure is visualized horizontally, the "root system" is still clearly visible. Using its complete path, the file "medication_data" should be described as:
/Users/<username>/Documents/project_a/data_files/medication_data
Instructions for installing Julia on macOS and Windows operating systems can be found here.
Package managers such as Homebrew (macOS and Linux) and Chocolatey (Windows) can be used to facilitate installation.
For most users, it is recommended to download the current stable release from https://julialang.org/downloads/.
Some developers might wish to use a different version, or to switch between versions. For this, the Juliaup version manager can be useful.
Julia is also available for use in Brown's Computing Environments:
Oscar (for high-performance computing)
Stronghold (for secure computing)
Julia is an open source dynamic programming language for high-level, high-performance numerical computing [1]. Julia provides ease and expressiveness (similar to R, MATLAB, and Python), but also supports general programming [2].
Development of Julia began in 2009, and the first version was released in February 2012. The current version of Julia is 1.11 (as of November 2024).
Learn X in Y Minutes: X=Julia
GitHub is a code hosting platform that allows developers to create, store, manage, and share their code. It uses Git software, providing the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project. Refer to GitHub Docs for additional GitHub documentation and tutorials.
Like other cloud platforms (e.g., Google Docs), GitHub allows users to work on projects together. Please note that code changes must be saved manually; GitHub does not automatically save your work. To save changes, open the Terminal application, navigate to the cloned repository, and run the following commands, replacing "INSERT PROGRESS NOTE" with a brief description of your changes.
git add -A: stages all of your code changes so that they are included in the next commit.
git commit -m "INSERT PROGRESS NOTE": records the staged changes as a commit with a note that you and your team can reference later. This note should be brief and informative, describing the purpose of your code changes.
git push: uploads your commits to the GitHub repository.
If multiple users are pushing code changes to your GitHub repository, make sure to retrieve or "pull" these edits before you begin making code changes. To do so, open the Terminal application, navigate to the cloned repository, and run the following command. If you have made any code changes, you will need to save them first for the pull to work.
When you are making code changes, you should git pull before making any edits. This helps keep your team from encountering "merge conflicts", which can become difficult to troubleshoot. To mitigate merge conflicts, make sure to communicate with your team. Inform your team whenever you push new code changes so that everyone is always working on the most up-to-date version of the code.
Merge conflicts happen when you attempt to merge code branches that have competing commits. They are often caused by users making code changes without pulling first. To resolve a merge conflict, work through the following steps:
Identify the location of the merge conflict.
Manually edit the conflicted file from a single machine, selecting the changes you want to keep in the final merge.
Push the selected changes to GitHub.
All team members should pull the corrected changes from GitHub before continuing to make code changes.
This page provides syntax for using numbers and mathematical operations in Julia. Each section includes an example to demonstrate the described syntax and operations.
Integer (positive and negative counting number) - e.g., -3, -2, -1, 0, 1, 2, and 3
Signed: Int8, Int16, Int32, Int64, and Int128
Unsigned: UInt8, UInt16, UInt32, UInt64, and UInt128
Boolean: Bool (0 = false and 1 = true)
Float (real or floating point numbers) - e.g., -2.14, 0.0, and 3.777
Float16, Float32, Float64
Use typeof() function to determine type
Input:
Output:
Addition
x + y
Subtraction
x - y
Multiplication
x * y
Division
x / y
Power (Exponent)
x ^ y
Remainder (Modulo)
x % y
Negation (for Bool)
!x
Input:
Output:
Input:
Equality
x == y or isequal(x, y)
Inequality
x != y or !isequal(x, y)
Less than
x < y
Less than or equal to
x <= y
Greater than
x > y
Greater than or equal to
x >= y
Output:
Create a Health Calculator Using Julia - Forthcoming!
Julia Documentation: Integers and Floating Point Numbers
Julia Documentation: Mathematical Operations and Elementary Functions
Julia Documentation: Numbers
Julia Documentation: Mathematics
Think Julia: Chapter 1 - The Way of the Program
Regular expressions (regex) are powerful tools for pattern matching and text processing. They are represented as a pattern that consists of a special set of characters to search for in a string str.
This page provides syntax for regular expressions in Julia. Each section includes an example to demonstrate the described methods.
Check if regex matches a string
occursin(r"pattern", str)
Capture regex matches
match(r"pattern", str)
Specify alternative regex
pattern1|pattern2
Character class specifies a list of characters to match ([...] where ... represents the list) or not match ([^...])
Character Class
...
Any lowercase vowel
[aeiou]
Any digit
[0-9]
Any lowercase letter
[a-z]
Any uppercase letter
[A-Z]
Any digit, lowercase letter, or uppercase letter
[a-zA-Z0-9]
Anything except a lowercase vowel
[^aeiou]
Anything except a digit
[^0-9]
Anything except a space
[^ ]
Any character
.
Any word character (equivalent to [a-zA-Z0-9_])
\w
Any non-word character (equivalent to [^a-zA-Z0-9_])
\W
A digit character (equivalent to [0-9])
\d
Any non-digit character (equivalent to [^0-9])
\D
Any whitespace character (equivalent to [ \t\r\n\f])
\s
Any non-whitespace character (equivalent to [^ \t\r\n\f])
\S
Anchors are special characters that can be used to match a pattern at a specified position
Beginning of line
^
End of line
$
Beginning of string
\A
End of string
\Z
Repetition or quantifier characters specify the number of times to match a particular character or set of characters
Zero or more times
*
One or more times
+
Zero or one time
?
Exactly n times
{n}
n or more times
{n,}
m or fewer times
{0,m}
At least n and at most m times
{n,m}
Input:
Output:
Julia Documentation: Manual - Strings (see Regular Expressions)
Think Julia: Chapter 8 - Strings
This is the typical first program for those new to a general purpose programming language like Julia. It can be used to test that the installation of Julia is working and also to introduce Julia's basic syntax, either in the REPL environment or by running code written in a text editor at the Unix command line.
Input:
Output:
Here are variations of the "Hello, World!" program using variables and different print statements.
Input:
Output:
In order to assign variables in Julia, you write the desired name for your variable, an = sign, and what the value of the variable should be.
Input:
Output:
We can write comments in our code, which do not run, to describe what certain lines or sections of code do.
These comments are just for the programmer; they will not appear anywhere in the output and are there only to explain what the code is doing or to provide helpful notes.
To make a comment in Julia, you can use the “#” symbol and then type your comment.
Sometimes you might want to write longer comments that span multiple lines. To do this, you can surround these comments with #= above the start as well as =# below the end.
Input:
Output:
Without using a print statement, Julia will only print out the most recent item that has an output. In order to print multiple things, we can use the print() or println() functions.
Input:
Output:
Use Julia in Brown Oscar Computing Environment - Forthcoming!
Use Julia in Brown Stronghold Computing Environment - Forthcoming!
Julia Documentation: Variables
Julia Documentation: Scope of Variables
Think Julia: Chapter 1 - The Way of the Program
Think Julia: Chapter 2 - Variables, Expressions and Statements
Julia comes with a full-featured interactive command-line REPL (read-eval-print loop) built into the julia executable. In addition to allowing quick and easy evaluation of Julia statements, it has a searchable history, tab-completion, many helpful keybindings, and dedicated help (?) and shell (;) modes. [1]
This page provides examples of using REPL on the command line.
Type julia in terminal to launch REPL
Type "?" to enter help pages within REPL
Type a function from Julia to read help pages (ex: println)
Julia Contributors. (n.d.). REPL - Standard Library - Julia Language. Retrieved May 1, 2024, from https://docs.julialang.org/en/v1/stdlib/REPL/
Julia Documentation: The Julia REPL
Julia Cheat Sheet (see REPL)
In computer programming, a package is a collection of modules or programs that are often published as tools for a range of common use cases, such as text processing and doing math. Programmers can install these packages and take advantage of their functionality within their own code.
This page provides instructions for installing, using, and troubleshooting packages in Julia.
Start Julia REPL by typing the following in Terminal or PowerShell (Note: you do not need to type $; it indicates the shell prompt)
Go into REPL mode for Pkg, Julia’s built in package manager, by pressing ]
Update package repository in Pkg REPL
Add packages in Pkg REPL
Check installation
Get back to the Julia REPL by pressing backspace or ^C, then exit.
To see REPL history
If you get an error like: ERROR: SystemError: opening file "C:\\Users\\User\\.julia\\registries\\General\\Registry.toml": No such file or directory
Delete C:\\Users\\User\\.julia\\registries where User is your computer’s username and try again
https://discourse.julialang.org/t/registry-toml-missing/24152
This page provides syntax for strings and characters in Julia as well as some of their associated functions. Each section includes an example to demonstrate the described syntax or function.
Char is a single character
String is a sequence of zero or more characters (index values start at 1)
Use typeof() function to determine type
Input:
Output:
Many Julia programs involve the input and output of files. When analyzing a dataset, that dataset file will need to be pulled into your program (input). If you want to see the results of your analysis, your program will need an output.
This section provides the syntax for inputting files (reading) and outputting results (writing) using base Julia (i.e., no packages such as CSV.jl).
Tabulate and report counts for sex in from the .
Dataset (example lines from adult.data)
Input (process_file.jl)
Output
Terminal
Analyze the MIMIC-IV Demo Files Using Julia - Forthcoming!
Analyze the SyntheticRI Demo Files Using Julia - Forthcoming!
In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated.
This page provides syntax for some of the common control flow methods in Julia. Each section includes an example to demonstrate the described methods.
Test if a specified expression is true or false
Short-circuit evaluation
Test if all of the conditions are true x && y
Test if any of the conditions are true x || y
Test if a condition is not true !z
Conditional evaluation
if statement
if-else
if-elseif-else
?: (ternary operator)
Input:
Output:
Repeat a block of code a specified number of times or until some condition is met.
while loop
for loop
Use break to terminate loop
Input:
Output:
Input:
Output:
In computer programming, a collection is a grouping of some variable number of data items (possibly zero) that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion.
This page provides syntax for different types of collections and data structures in Julia (arrays, sets, dictionaries, etc.). Each section includes an example to demonstrate the described methods.
Arrays are ordered collections of elements. In Julia, they are automatically indexed (consecutively numbered) by an integer starting with 1.
Input:
Output:
Sets are an unordered collection of unique elements.
Input:
Output:
Dictionaries are unordered collections of key-value pairs where the key serves as the index (“associative collection”). Similar to elements of a set, keys are always unique.
Input:
Output:
DataFrames.jl is a Julia package that provides a set of tools for working with tabular data in Julia. Its design and functionality are similar to those of (in Python) and
data.frame
, and (in R), making it a great general purpose data science tool.
This page provides examples of using DataFrames.jl, demonstrating the syntax and common functions within the package.
Install and Load DataFrames.jl Package
Create Dataframe
Display Dataframe
Input:
Output:
First two lines of dataframe:
Input:
Output:
Last two lines of dataframe:
Input:
Output:
Describe Dataframe
Dataframe size:
Input:
Output:
Dataframe column names:
Input:
Output:
Dataframe description:
Input:
Output:
Accessing DataFrames
Get "age" column (different ways to call the column)
Input:
Output:
Get row
Input:
Output:
Get element
Input:
Output:
Get subset (specific rows and all columns)
Input:
Output:
Get subset (all rows and specific columns)
Input:
Output:
Get subset (all rows meeting specified criteria - numbers)
Input:
Output:
Get subset (all rows meeting specified criteria - strings)
Input:
Output:
Get subset (all rows meeting specified criteria)
Input:
Output:
Add Column
New columns with specified values
Input:
Output:
New column with calculated value
Input:
Output:
Get counts/frequency
Input:
Output:
Transform DataFrame
sort
Input:
Output:
stack (reshape from wide to long format)
Input:
Output:
unstack (reshape from long to wide format)
Input:
Output:
Traversing DataFrame (for loops)
sort
Input:
Output:
Analyzing Health Datasets with DataFrames in Julia - Forthcoming!
and organizations (focused on Julia packages for health and life sciences)
Wikipedia contributors. (n.d.). Control flow. In Wikipedia. Retrieved May 1, 2024, from
Wikipedia contributors (n.d.). Collection. In Wikipedia. Retrieved May 1, 2024, from
JuliaData Contributors. (n.d.). DataFrames.jl - JuliaData. Retrieved May 1, 2024, from
Add element to end
push!(my_array, str)
Remove element from end
pop!(my_array)
Remove element from beginning
popfirst!(my_array)
Add element to beginning
pushfirst!(my_array, str)
Sort array (will not change array itself)
sort(my_array)
Sort array in place (will change array)
sort!(my_array)
Get unique elements in array
unique(my_array)
Intersection
intersect(my_array, your_array)
Union
union(my_array, your_array)
Convert array to string
join(collect(my_array), str)
New set (empty)
Set()
Specify type
Set{Int64}()
Set with values
Set([1, 2, 3, 4, 5])
Set with values
Set(["a1", "b2", "c3", "b2"])
Get length of set my_set
length(my_set)
Check if value is in set
in(str, my_set)
Add value
push!(my_set, str)
Intersection
intersect(my_set, your_set)
Union
union(my_set, your_set)
Difference
setdiff(my_set, your_set)
New dictionary (empty)
Dict()
Specify type
Dict{String, Int64}()
Dictionary with values
Dict("one" => 1 , "two" => 2, "three" => 3, "four" => 4)
Get value for key in dictionary my_dict
my_dict["one"]
Check if dictionary has key
haskey(my_dict, "one")
Check for key/value pair
in(("one" => 1), my_dict)
Get value and set default
get!(my_dict, "one", 5); get!(my_dict, "five", 5)
Add key/value pair
my_dict["five"] = 5
Delete key/value pair
delete!(my_dict, "four")
Get keys
keys(my_dict)
Get values
values(my_dict)
Convert keys to array
collect(keys(my_dict))
Convert values to array
collect(values(my_dict))
Sorting keys
sort(collect(keys(my_dict)))
Sorting values
sort(collect(values(my_dict)))
Sort by value (descending) with keys
sort(collect(zip(values(my_dict), keys(my_dict))), rev=true)
Sort by value (ascending) with keys
sort(collect(zip(values(my_dict), keys(my_dict))), rev=false)
Get top n by value (e.g., 3)
sort(collect(zip(values(my_dict), keys(my_dict))), rev=true)[1:3]
New array (empty)
[]
Specify type (integer)
Int64[]
Specify type (string)
String[]
Array with values
[1, 2, 3, 4, 5]
Array with values
["a1", "b2", "c3"]
Array of numbers
collect(1:10)
Split string str by delimiter into words (e.g., space)
split(str, " ")
Get length of array my_array
length(my_array)
Get first element of array my_array
my_array[1]
Get last element of array my_array
my_array[end]
Get nth element of array my_array (e.g., 2)
my_array[2]
Check if element is in array
in(str, my_array)
get word length
length(word)
extract nth character from word
word[n]
extract substring nth-mth character from word
word[n:m]
search for letter in word
findfirst(isequal(letter), word)
search for subword in word
occursin(subword, word)
remove record separator from word (e.g., \n)
chomp(word)
remove last character from word
chop(word)
List of exercises found across the different Julia pages.
Use Julia in Brown Oscar Computing Environment - Forthcoming!
Use Julia in Brown Stronghold Computing Environment - Forthcoming!
Create a Health Calculator Using Julia - Forthcoming!
Create a Pediatric Dosage Calculator Using Julia
Create a BMI Calculator Using Julia
Analyze Health Datasets Using Unix Commands - Forthcoming!
Analyze MIMIC-IV Demo Files Using Unix Commands
Analyze SyntheticRI Demo Files Using Unix
Analyze Health Datasets Using Julia - Forthcoming!
Analyze MIMIC-IV Demo Files Using Julia
Analyze SyntheticRI Demo Files Using Julia
Python is one of the many languages used by the data science community to perform data manipulation, statistical modeling, and machine learning. Its design philosophy emphasizes code readability. The Python community is huge, offering an enormous library of technical support documentation. If you don't know how to do something in Python, chances are that someone else has asked a similar question online and received a comprehensive answer.
This page provides syntax for using numbers and mathematical operations in Python. Each section includes an example to demonstrate the described syntax and operations.
Integer (positive and negative counting number) - e.g., -3, -2, -1, 0, 1, 2, and 3:
int - holds signed integers of unlimited length
long - holds long integers (exists in Python 2.X, removed in Python 3.X)
Float (real or floating point numbers) - e.g., -2.14, 0.0, and 3.777
float
Boolean: (0 = False and 1 = True)
bool
Use type() function to determine type
Input:
Output:
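As a minimal sketch of the type() function described above, the following assigns one value of each numeric type and prints its type. The variable names and values are hypothetical and chosen only for illustration.

```python
count = 3            # int
temperature = 98.6   # float
is_fasting = True    # bool

print(type(count))        # <class 'int'>
print(type(temperature))  # <class 'float'>
print(type(is_fasting))   # <class 'bool'>

# Booleans behave like the integers 0 and 1
print(True == 1, False == 0)  # True True
```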
Addition
x + y
Subtraction
x - y
Multiplication
x * y
Division
x / y
Floor Division
x // y
Power (Exponent)
x ** y
Remainder (Modulo)
x % y
Input:
Output:
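The operators listed above can be tried together in one short script. This is a minimal sketch; the values of x and y are arbitrary and chosen so that division and floor division give different results.

```python
x = 7
y = 2

results = {
    "x + y": x + y,    # addition
    "x - y": x - y,    # subtraction
    "x * y": x * y,    # multiplication
    "x / y": x / y,    # division (always returns a float)
    "x // y": x // y,  # floor division (rounds down)
    "x ** y": x ** y,  # power
    "x % y": x % y,    # remainder
}

for expression, value in results.items():
    print(expression, "=", value)
```

For these values, division gives 3.5 while floor division gives 3.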
Input:
Equality
x == y
Inequality
x != y
Less than
x < y
Less than or equal to
x <= y
Greater than
x > y
Greater than or equal to
x >= y
Output:
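A minimal sketch of the comparison operators in Python follows. Each comparison evaluates to a bool (True or False). The values of x and y are arbitrary.

```python
x = 3
y = 5

comparison_results = {
    "x == y": x == y,  # equality
    "x != y": x != y,  # inequality
    "x < y": x < y,    # less than
    "x <= y": x <= y,  # less than or equal to
    "x > y": x > y,    # greater than
    "x >= y": x >= y,  # greater than or equal to
}

for expression, value in comparison_results.items():
    print(expression, "is", value)
```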
Create a Health Calculator Using Python - Forthcoming!
W3 Schools: Python Data Types
W3 Schools: Python Arithmetic Operators
W3 Schools: Python Numbers
This is the typical first program for those new to a general purpose programming language like Python. It can be used to test that the installation of Python is working and also to introduce Python's basic syntax, either in the REPL environment or by running code written in a text editor at the Unix command line.
Input:
Output:
Here are variations of the "Hello, World!" program using variables and different print statements.
Input:
Output:
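One possible set of variations is sketched below. It is an illustration rather than the original example, and the variable name greeting is hypothetical. All three lines print the same text.

```python
# Store the text in a variable, then print the variable
greeting = "Hello, World!"
print(greeting)

# Join strings with + before printing
print("Hello," + " " + "World!")

# Pass several values to print(); it separates them with a space
print("Hello,", "World!")
```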
In order to assign variables in Python, you write the desired name for your variable, an “=” sign, and what the value of the variable should be.
Input:
Output:
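A minimal sketch of variable assignment follows. The names and values are hypothetical. A variable can be reassigned, and the new value can be computed from the old one.

```python
patient_name = "Jane Doe"  # a string value
heart_rate = 72            # an integer value

# Reassign the variable using its current value
heart_rate = heart_rate + 5

print(patient_name, heart_rate)  # Jane Doe 77
```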
We can write comments in our code, which do not run, to describe what certain lines or sections of code do.
These comments are just for the programmer; they will not appear anywhere in the output and are there only to explain what the code is doing or to provide helpful notes.
To make a comment in Python, you can use the “#” symbol and then type your comment.
Sometimes you might want to write longer comments that span multiple lines. To do this, you can surround these comments with three quotation marks (""" or ''') above the start as well as three quotation marks below the end.
Input:
Output:
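Both comment styles can be seen in the short sketch below. The BMI calculation and its values are hypothetical and serve only to give the comments something to describe. Strictly speaking, the triple-quoted text is a string that Python evaluates and discards, which is why it can act as a multi-line comment.

```python
# This is a single-line comment
weight_kg = 70   # a comment can also follow code on the same line
height_m = 1.75

"""
This text spans several lines.
It is not assigned to anything, so it has no effect on the program.
"""

bmi = weight_kg / height_m ** 2
print(round(bmi, 1))  # 22.9
```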
Without using a print statement, Python will only print out the most recent item that has an output. In order to print multiple things, we can use the print() function
Input:
Output:
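A minimal sketch of printing several items follows. The variable names and values are hypothetical. Each call to print() writes one line of output.

```python
systolic = 120
diastolic = 80

# Each print() call produces its own line
print(systolic)
print(diastolic)

# Several values can be combined into one line with an f-string
summary = f"Blood pressure: {systolic}/{diastolic}"
print(summary)  # Blood pressure: 120/80
```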
Python is sensitive to indentation. Indentation should only be used in hierarchical structures, such as a class, function, or loop. Indents in improper locations will cause an error.
Input:
Output:
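The sketch below shows indentation used correctly inside a function, an if statement, and a loop, followed by a deliberately mis-indented snippet. The function name and the 100.4 threshold are hypothetical. The bad snippet is kept in a string and passed to the built-in compile() so that the error can be shown without stopping the script.

```python
def classify_temperature(temp_f):
    # The function body is indented one level
    if temp_f >= 100.4:
        # The body of the if statement is indented a second level
        return "fever"
    return "no fever"


for reading in [98.6, 101.2]:
    # The loop body is indented one level
    print(reading, classify_temperature(reading))

# The second line of this snippet is indented for no reason
bad_code = "x = 1\n    y = 2\n"
indentation_error_raised = False
try:
    compile(bad_code, "<example>", "exec")
except IndentationError as error:
    indentation_error_raised = True
    print("IndentationError:", error.msg)
```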
Use Python in Brown Oscar Computing Environment - Forthcoming!
Use Python in Brown Stronghold Computing Environment - Forthcoming!
Instructions for installing Python on macOS and Windows operating systems can be found here.
For most users, it is recommended to download the current stable release from https://www.python.org/downloads/.
Some developers might wish to use a different version, or to switch between versions. For this, the Python version manager can be useful.
Python is also available for use in Brown's Computing Environments:
Oscar (for high-performance computing)
Stronghold (for secure computing)
The following instructions have been tested on computers running macOS 11 Big Sur. To check the macOS version running on your computer, click on the "apple" icon in the top left-hand corner of your screen and select "About This Mac." A window will pop up that includes a version number. Confirm you are running at least Version 11.X (where 'X' is any number). These instructions will likely work with earlier versions of macOS as well. If you are not running macOS 11.X Big Sur, you can upgrade for free following the instructions provided on Apple's website.
Download Python
Navigate to https://www.python.org/downloads/ and download the most recent version of Python for macOS.
Install Python
Open the downloaded file (e.g., python-3.12.3-macos11.pkg). A window will pop up with installation instructions. Progress through the prompts until Python has been installed in your Applications folder. Next, double click on the Python folder shortcut in your Applications folder to open it.
Run Python
Open Terminal, type python3, and hit return. Python should open. To quit Python, type quit() and hit return.
Troubleshooting
If you get a Permission denied error, rerun the command prepended with sudo. You will be prompted to enter your computer password.
The following instructions have been tested on computers running Windows 10. Confirm that you are running at least Windows 10. These instructions will likely work with earlier versions of Windows, however they have not been tested.
Download Python
Navigate to https://www.python.org/downloads/ and download the most recent version of Python for Windows (32-bit or 64-bit depending on the specifications of your device).
Install Python
Open the downloaded file (e.g., python-3.10.10-amd64.exe). A window will pop up with installation instructions. Progress through the prompts until Python has been installed on your device. When prompted with Advanced Options, make sure to check "Add Python to environment variables".
Run Python
Open Command Prompt, type py, and hit enter. Python should open. To quit Python, type quit() and hit enter.
Python comes with a full-featured interactive command-line REPL (read-eval-print loop) built into the python executable. In addition to allowing quick and easy evaluation of Python statements, it has a searchable history, tab-completion, many helpful keybindings, and a built-in help system (help()).
This page provides examples of using the REPL on the command line.
Type python in the terminal to launch the REPL
Type help() to enter the help pages within the REPL
Type the name of a Python function to read its help page (e.g., print)
Press q to quit the help page
In computer programming, a collection is a grouping of some variable number of data items (possibly zero) that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion.
This page provides syntax for different types of collections and data structures in Python (arrays, sets, dictionaries, etc.). Each section includes an example to demonstrate the described methods.
Arrays (called lists in Python) are ordered collections of elements. They are automatically indexed (consecutively numbered) by an integer starting with 0.
Input:
Output:
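A brief sketch of creating, indexing, and modifying an array (a Python list) follows. The name my_array and its contents are illustrative only.

```python
my_array = ["a1", "b2", "c3"]

# Indexing starts at 0; negative indexes count from the end.
print(my_array[0])    # first element
print(my_array[-1])   # last element
print(len(my_array))  # number of elements

# Add to the end and to the beginning, then remove from the end.
my_array.append("d4")
my_array.insert(0, "z0")
last = my_array.pop()

# sorted() returns a new list and leaves my_array unchanged.
print(sorted(my_array))
print(my_array)
```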
Sets are an unordered collection of unique elements.
Input:
Output:
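The sketch below illustrates uniqueness and the common set operations. The names my_set and your_set and their values are illustrative only.

```python
# The duplicate 3 is stored only once.
my_set = {1, 2, 3, 3, 4}
your_set = {3, 4, 5}

# An empty set is written set(); {} creates an empty dictionary instead.
empty_set = set()

my_set.add(6)
print(3 in my_set)  # membership test

common = my_set.intersection(your_set)   # values in both sets
combined = my_set.union(your_set)        # values in either set
only_mine = my_set.difference(your_set)  # values in my_set but not your_set

print(common, combined, only_mine)
```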
Dictionaries are collections of key-value pairs where the key serves as the index (“associative collection”). Since Python 3.7, dictionaries preserve the order in which keys were inserted. Similar to elements of a set, keys are always unique.
Input:
Output:
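A short sketch of common dictionary operations follows. The name my_dict and its contents are illustrative only.

```python
my_dict = {"one": 1, "two": 2, "three": 3, "four": 4}

# Look up a value by key, and test whether a key exists.
print(my_dict["one"])
print("one" in my_dict)

# Add a pair, then delete one (None is returned if the key is absent).
my_dict["five"] = 5
removed = my_dict.pop("four", None)

# get() returns a default instead of raising an error for a missing key.
missing = my_dict.get("six", 0)

# Sort the key/value pairs by value, largest first, and keep the top two.
top_two = sorted(my_dict.items(), key=lambda item: item[1], reverse=True)[:2]
print(top_two)
```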
In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated.
This page provides syntax for some of the common control flow methods in Python. Each section includes an example to demonstrate the described methods.
Test if a specified expression is true or false
Short-circuit evaluation
Test if all of the conditions are true x and y
Test if any of the conditions are true x or y
Test if a condition is not true not z
Conditional evaluation
if statement
if-else
if-elif-else
Ternary operator: true_value if condition else false_value
Input:
Output:
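The conditional forms listed above can be sketched together as follows. The variables and the age cut-offs are hypothetical and chosen only to exercise each branch.

```python
age = 70
has_consent = True
is_minor = age < 18

# if / elif / else: only the first branch whose test is true runs.
if age < 18:
    group = "child"
elif age < 65:
    group = "adult"
else:
    group = "older adult"

# Short-circuit evaluation: "and" stops at the first false operand,
# "or" stops at the first true operand, and "not" inverts a condition.
eligible = has_consent and not is_minor

# Ternary operator: true_value if condition else false_value
label = "eligible" if eligible else "not eligible"

print(group, label)
```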
Repeat a block of code a specified number of times or until some condition is met
while loop
for loop
Use break to terminate loop
Input:
Output:
Input:
Output:
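A minimal sketch of the three loop constructs above follows. The variable names and values are made up for illustration.

```python
# while loop: repeat until the condition becomes false.
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1

# for loop: repeat once for each element of a collection.
total = 0
for value in [4, 8, 15]:
    total += value
print(total)

# break: leave the loop as soon as the first match is found.
first_high = None
for reading in [98, 101, 140, 99]:
    if reading > 120:
        first_high = reading
        break
print(first_high)
```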
Many Python programs involve the input and output of files. When analyzing a dataset, that dataset file will need to be pulled into your program (input). If you want to see the results of your analysis, your program will need an output.
This section provides the syntax for inputting files (reading) and outputting results (writing) using base Python (i.e., no packages such as pandas).
Tabulate and report counts for sex from the dataset.
Dataset (example lines from adult.data)
Input (process_file.py)
Output
Terminal
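One way to sketch this task in base Python is shown below. It is not the original process_file.py: the sample file is generated with placeholder values, and the position of the sex field (the 10th comma-separated column) is an assumption about the layout of adult.data. The file names sample.data and sex_counts.txt are also hypothetical.

```python
import os
import tempfile

SEX_COLUMN = 9  # assumption: sex is the 10th comma-separated field


def count_sex(path):
    """Read a comma-separated file and tabulate the values in the sex column."""
    counts = {}
    with open(path) as infile:
        for line in infile:
            fields = [field.strip() for field in line.split(",")]
            if len(fields) <= SEX_COLUMN:
                continue  # skip blank or incomplete lines
            sex = fields[SEX_COLUMN]
            counts[sex] = counts.get(sex, 0) + 1
    return counts


def write_counts(counts, path):
    """Write one tab-separated 'value<TAB>count' line per category."""
    with open(path, "w") as outfile:
        for sex, count in sorted(counts.items()):
            outfile.write(f"{sex}\t{count}\n")


# Build a small placeholder file so the sketch runs without the real dataset.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "sample.data")
with open(data_path, "w") as sample:
    for sex in ["Male", "Female", "Female"]:
        placeholder = ["x"] * 15
        placeholder[SEX_COLUMN] = sex
        sample.write(", ".join(placeholder) + "\n")
    sample.write("\n")  # a trailing blank line, which count_sex skips

counts = count_sex(data_path)
out_path = os.path.join(workdir, "sex_counts.txt")
write_counts(counts, out_path)
print(counts)
```

To run it on a real file, pass that file's path to count_sex() instead of data_path.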
Analyze the MIMIC-IV Demo Files Using Julia - Forthcoming!
Analyze the SyntheticRI Demo Files Using Julia - Forthcoming
Regular expressions are powerful tools for pattern matching and text processing. They are represented as a pattern that consists of a special set of characters to search for in a string str. The re module needs to be imported before use.
This page provides syntax for regular expressions in Python. Each section includes an example to demonstrate the described methods.
Character class specifies a list of characters to match ([...] where ... represents the list) or not match ([^...])
Anchors are special characters that can be used to match a pattern at a specified position
Repetition or quantifier characters specify the number of times to match a particular character or set of characters
Input:
Output:
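The sketch below combines character classes, anchors, and quantifiers using the standard re module. The sample text is made up for illustration.

```python
import re

text = "Visit on 2024-05-01: BP 120/80, HR 72"

# Quantifiers: \d{4} matches exactly four digits.
date_match = re.search(r"\d{4}-\d{2}-\d{2}", text)
print(date_match.group())

# + matches one or more times; findall returns every match.
numbers = re.findall(r"\d+", text)
print(numbers)

# Anchor: ^ requires the pattern to start at the beginning of the line.
starts_with_visit = re.search(r"^Visit", text) is not None
print(starts_with_visit)

# Character class: [aeiou] matches any lowercase vowel.
vowel_free = re.sub(r"[aeiou]", "", "heart")
print(vowel_free)

# Parentheses capture parts of the match for later use.
bp = re.search(r"(\d+)/(\d+)", text)
print(bp.group(1), bp.group(2))
```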
This page provides syntax for different data types in Python as well as some of their associated functions. Each section includes an example to demonstrate the described syntax or function.
A string is a sequence of one or more characters (index values start at 0)
Input:
Output:
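A few common string operations are sketched below. The variable word and the other names are illustrative only.

```python
word = "abc"

length = len(word)            # number of characters
first = word[0]               # character at index 0
first_two = word[0:2]         # characters at index 0 and 1 (end is exclusive)
position = word.index("b")    # index of the first occurrence of "b"
has_ab = "ab" in word         # substring test
trimmed = " abc ".strip()     # remove white space from both ends
without_last = word[:-1]      # everything except the last character

print(length, first, first_two, position, has_ab, trimmed, without_last)
print(type(word))
```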
In computer programming, a package is a collection of modules or programs that are often published as tools for a range of common use cases, such as text processing and doing math. Programmers can install these packages and take advantage of their functionality within their own code.
This page provides instructions for installing, using, and troubleshooting packages in Python.
There is a two-step process for using an external package in Python. First, if it is your first time using the package, you must install the package. This only needs to be done once for the environment you are working in, even if you are using different documents or files. Then, you must load the package into your specific document. Let's look at an example using the NumPy package.
To install a package, we use the pip command as follows:
Again, note that this only needs to be done once. After you have installed a package you do not need to do so again; you can simply load it.
If we want to load an entire package (instead of just certain functions), we can use the import command as follows:
We import the package and give it a shorthand name (an alias) so that we do not need to type the whole package name every time we want to use a function from that package. To call a function from an imported package, we use the shorthand name followed by a dot followed by the name of the function. Here is an example:
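The alias pattern can be sketched as follows. So that it runs without installing anything, this sketch uses the standard-library statistics module as a stand-in for NumPy; the alias stats and the sample values are hypothetical.

```python
# Import a whole module under a shorthand name (alias).
import statistics as stats

ages = [34, 51, 47]

# Call a function as alias.function_name(...)
mean_age = stats.mean(ages)
print(mean_age)
```

The same pattern applies to an installed package, where the alias is whatever name follows the keyword as.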
Some packages will have many different parts, or modules, and we might not want to use all of these modules at once. Importing all of these modules when we don't need them can be an unnecessary waste of computing power, so instead we can only import the functions we need. Let's look at the scikit-learn package for example
We can install this package the same way as above; however, we will not import the whole package at once. Instead, we will only import the functions we need from the modules we need. Here is an example of how we can import the train_test_split() function from the model_selection module of scikit-learn (or sklearn for short):
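The import described above takes the form shown in the first comment below. Because scikit-learn may not be installed, the runnable part of this sketch applies the same from-import pattern to the standard-library statistics module as a stand-in; the sample values are hypothetical.

```python
# With scikit-learn installed, the import described above would be written:
#     from sklearn.model_selection import train_test_split

# The same pattern with a standard-library module (stand-in for illustration):
from statistics import median

# The imported function is called by its own name, with no package prefix.
middle_age = median([34, 51, 47])
print(middle_age)
```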
Wikipedia contributors (n.d.). Collection. In Wikipedia. Retrieved May 1, 2024, from
W3 Schools:
Data Quest:
Python Documentation:
Python Wiki:
W3 Schools:
W3 Schools:
Tutorials Point:
Data Science Central:
W3 Schools:
W3 Schools:
Add element to end
my_array.append(str)
Remove element from end
my_array.pop()
Remove element from beginning
my_array.pop(0)
Add element to beginning
my_array.insert(0, str)
Sort array (will not change array itself)
sorted(my_array)
Sort array in place (will change array)
my_array.sort()
Get unique elements in array
list(set(my_array))
Intersection
set(my_array).intersection(your_array)
Union
set(my_array).union(your_array)
New set (empty)
set()
Set with values
my_set = {1, 2, 3, 4, 5}
Set with values
my_set = {"a1", "b2", "c3"}
Get length of set my_set
len(my_set)
Check if value is in set
"str" in my_set
Add value
my_set.add("str")
Intersection
my_set.intersection(your_set)
Union
my_set.union(your_set)
Difference
my_set.difference(your_set)
New Dictionary (empty)
{}
Dictionary with values
{"one": 1, "two": 2, "three": 3, "four": 4}
Get value for key in dictionary my_dict
my_dict["one"]
Check if dictionary has key
"one" in my_dict
Check for key/value pair
("one", 1) in my_dict.items()
Get value and set default
my_dict.get("one", 5)
my_dict.setdefault("five", 5)
Add key/value pair
my_dict["five"] = 5
Delete key/value pair
my_dict.pop("four", None)
Get keys
my_dict.keys()
Get values
my_dict.values()
Convert keys to array
list(my_dict.keys())
Convert values to array
list(my_dict.values())
Sorting keys
sorted(my_dict.keys())
Sorting values
sorted(my_dict.values())
Sort by value (descending) with keys
sorted(my_dict.items(), key=lambda x: x[1], reverse=True)
Sort by value (ascending) with keys
sorted(my_dict.items(), key=lambda x: x[1])
Get top n by value (e.g., 3)
sorted(my_dict.items(), key=lambda x: x[1], reverse=True)[:3]
Equality
x == y
Inequality
x != y
Less than
x < y
Less than or equal to
x <= y
Greater than
x > y
Greater than or equal to
x >= y
Check if regex matches a string
re.search("pattern", string, flags=0)
Capture regex matches
re.match("pattern", string, flags=0)
Specify alternative regex
pattern1|pattern2
Character Class
...
Any lowercase vowel
[aeiou]
Any digit
[0-9]
Any lowercase letter
[a-z]
Any uppercase letter
[A-Z]
Any digit, lowercase letter, or uppercase letter
[a-zA-Z0-9]
Anything except a lowercase vowel
[^aeiou]
Anything except a digit
[^0-9]
Anything except a space
[^ ]
Any character
.
Any word character (equivalent to [a-zA-Z0-9_])
\w
Any non-word character (equivalent to [^a-zA-Z0-9_])
\W
A digit character (equivalent to [0-9])
\d
Any non-digit character (equivalent to [^0-9])
\D
Any whitespace character (equivalent to [ \t\n\r\f\v])
\s
Any non-whitespace character (equivalent to [^ \t\n\r\f\v])
\S
Beginning of line
^
End of line
$
Beginning of string
\A
End of string
\Z
Zero or more times
*
One or more times
+
Zero or one time
?
Exactly n times
{n}
n or more times
{n,}
m or fewer times
{,m}
At least n and at most m times
{n,m}
New array (empty)
[]
Array with values (integers)
[1, 2, 3, 4, 5]
Array with values (string)
["a1", "ab2", "c3"]
Array of numbers
list(range(1, 11))
Split string str by delimiter into words (e.g., space)
str.split(" ")
Get length of array my_array
len(my_array)
Get first element of array my_array
my_array[0]
Get last element of array my_array
my_array[-1]
Get nth element of array my_array (e.g., 2)
my_array[1]
Check if element is in array
str in my_array
get word length
len("abc")
extract nth character from word
"abc"[n]
extract substring nth-mth character from word
"abc"[n:m]
search for character in word
"abc".index("character")
search for subword in word
"ab" in "abc"
remove white space from the beginning and end of a word
"abc ".strip()
remove last character from word
"abc"[:-1]
determine data structure type
type("abc")
R is one of the many languages used by the data science community to perform data manipulation, statistical modeling and machine learning. R was designed by statisticians for statistical computing.
For most users, it is recommended to download the current stable release from https://cloud.r-project.org/.
Some developers might wish to use a different version, or to switch between versions. For this, the rvenv package can be useful.
R is also available for use in Brown's Computing Environments:
Oscar (for high-performance computing)
Stronghold (for secure computing)
Download and install the latest version of The R Project for Statistical computing for macOS here.
For an integrated development environment (IDE) / graphical interface, you can also download and install R Studio from here.
This page provides examples of using the pandas package in Python, demonstrating the syntax and common functions within the package.
Install and Load Pandas
Create Dataframe
Display Dataframe
Input:
Output:
First two lines of dataframe:
Input:
Output:
Last two lines of dataframe:
Input:
Output:
Describe Dataframe
Dataframe size:
Input:
Output:
Dataframe column names:
Input:
Output:
Dataframe description:
Input:
Output:
Accessing DataFrames
Get "age" column (different ways to call the column)
Input:
Output:
Get row
Input:
Output:
Get element
Input:
Output:
Get subset (specific rows and all columns)
Input:
Output:
Get subset (all rows and specific columns)
Input:
Output:
Get subset (all rows meeting specified criteria - numbers)
Input:
Output:
Get subset (all rows meeting specified criteria - strings)
Input:
Output:
Get subset (all rows meeting specified criteria)
Input:
Output:
Add Column
New columns with specified values
Input:
Output:
New column with calculated value
Input:
Output:
Get counts/frequency
Input:
Output:
Transform DataFrame
sort
Input:
Output:
stack (reshape from wide to long format)
Input:
Output:
unstack (reshape from long to wide format)
Input:
Output:
Traversing DataFrame (for loops)
sort
Input:
Output:
Analyzing Health Datasets with Pandas in Python- Forthcoming!
Python Pandas: Pandas Documentation
W3 Schools: Pandas Tutorial
Geeks for Geeks: Pandas Introduction
This is the typical first program for those new to a programming language. It can be used to test that the Installation of R is working and also introduce R's basic syntax using the REPL environment or running code written using a Text Editor at the Unix command line.
<- or = or <<-
Left Assignment
x <- 7, x = 7, x <<- 7
-> or ->>
Right Assignment
7 -> x, 7 ->> x
Logical
TRUE, FALSE
Numeric
1, 55, 999
Integer
1L, 32L, 0L
Complex
2 + 3i
Character
"great", "23.4"
Unlike other languages, R does not require the use of print statements to output code, but it does allow them. To print, you can simply write code, or include the code you want to be printed in a print() statement.
We can write comments in our code, which do not run, to describe what certain lines or sections of code do. These comments are just for the programmer; they will not appear anywhere in the output and simply explain what the code is doing or provide helpful notes.
To comment in R, use the “#” symbol and type your comment on the same line.
R has no syntax for multi-line comments, so each line that is commented out needs a "#" symbol at the beginning.
R Documentation: Vectors and Assignment
R Documentation: Comments
Lists in R are ordered collections of data that can be of different classes.
New list (empty)
listname <- list()
New list (misc)
listname <- list(1L, "abc", 10.3)
Access an element
list[[position]]
Change a value
list[position] <- newvalue
See number of values in a list
length(list)
See if item is present in a list
item %in% list
Add item to a list
append(list, item)
Add item to a list at a specific position
append(list, item, after=index number)
Remove item from list
newlist <- list[-index number]
New matrix (empty)
matrixname <- matrix()
New matrix (numbers)
matrixname <- matrix(data, nrow=, ncol=)
New matrix (strings)
matrixname <- matrix(data, nrow=, ncol=)
Access a matrix element
matrix[row position, column position]
Access an entire row
matrix[row position,]
Access an entire column
matrix[,column position]
Create an additional row
rbind(matrix, values for new row)
Create an additional column
cbind(matrix, values for new column)
New array (empty)
arrayname <- array()
New array (numbers)
arrayname <- array(data, dim = c(nrow, ncol, ndim))
New array (strings)
arrayname <- array(data, dim = c(nrow, ncol, ndim))
Access an array element
array[row position, column position, dimension]
Check if an item exists
value %in% array
Sort array increasing
sort(array)
Sort array decreasing
sort(array, decreasing = TRUE)
Search for a substring within a string
grep(substring/value, string)
Replace a single value within a string
sub(pattern, replacement, string)
Replace all instances within a string
gsub(pattern, replacement, string)
Find matches for exact string
grepl(pattern, string)
DataCamp: Regular Expression
Get string length
nchar(string)
Combine two strings (str_c() is from the stringr package)
str_c(string1, string2)
Sort a vector of strings
sort(c(string1, string2, string3))
R for Data Science: String Functions
Addition
+
Subtraction
-
Multiplication
*
Division
/
Power (Exponent)
^ or **
Remainder (Modulo)
%%
Negation (for Bool)
!x
>
Greater than
<
Less than
>=
Greater than or equal
<=
Less than or equal
==
Exactly equal
!=
Not equal to
&
Element-wise and
R Documentation: Arithmetic
R Documentation: Logical Operators
Used to test if a specific case is true or false
Short-circuit evaluation:
Test if all conditions are true
Test if any conditions are true
Test if a condition is not true
If statement: run code if this statement is true
Only used at the beginning of a conditional statement
Else if statement: if previous statements aren't true, try this
Can be used an unlimited number of times in an if statement
Else statement: catch-all for anything outside of prior statements
Only used to end a conditional statement
Repeats a block of code a specified number of times or until some condition is met
While loop
For loop
Use break to terminate loop
>
Greater than
<
Less than
>=
Greater than or equal
<=
Less than or equal
==
Exactly equal
!=
Not equal to
&
Element-wise and
Input:
Output:
R Documentation: Conditional Execution
R Documentation: Repetitive Execution
R comes with a full-featured interactive command-line REPL (read-eval-print loop) built into the R executable. In addition to allowing quick and easy evaluation of R statements, it has a searchable history, tab-completion, many helpful keybindings, and dedicated help (?).
This page provides examples of using REPL on the command line.
Type "module load r" in terminal to load the R module, then on a new line type "R" to launch R
At the R prompt, q() quits R
Type ?function or help(function) to open help pages within R's REPL
For example, to ask for help with linear models in R, use help(lm) (output shown below)
When coding in R, you will often need to input datasets to work with. The easiest ways to do so are from either a .csv file or a .txt file. To do this, you can use the read.csv() and read.table() functions, respectively. The following demonstrates these functions using a hypothetical "hospital_data" dataset.
To output a file from R, use the syntax sink("FileName.FileType").
In computer programming, a package is a collection of modules or programs that are often published as tools for a range of common use cases, such as text processing and doing math. Programmers can install these packages and take advantage of their functionality within their own code.
This page includes instructions for installing packages in R and a description of some of R's most frequently used packages.
To install a package in R, you can either:
Use the install.packages("PackageName") function, which downloads the package from a repository (CRAN by default) and installs it on your machine
Or, if you are using RStudio, you can use Tools > Install Packages, enter the package name, and click Install
Once you install the package, you have to load it into your session using the library(PackageName) function.
In R, tidyverse is one of the most popular packages, as it contains an assortment of packages used for data science, such as:
, used to create graphics and data visualization
, contains functions used for data manipulation, like mutate() and filter()
, used for data organization and cleaning
, an optimized dataframe visualizer
, can be used to input Excel files in .xlsx format into R
data.frame
, and the package provide a set of tools for working with tabular data in R. Their design and functionality are similar to those of DataFrames.jl (in Julia) and (in Python), making them great general purpose data science tools.
This page provides examples of using data.frame, data.table, and dplyr, demonstrating the syntax and common functions within the tools.
Installing data.frame, data.table, and dplyr in R.
The data.frame package comes preloaded into R, and the dplyr package is part of the tidyverse package (see section for tidyverse installation instructions). To install data.table, use install.packages('data.table').
This example uses data.frame, as it does not require additional packages. See the resources at the bottom of this page for additional information on data.table and dplyr.
Create DataFrame
Display DataFrame
Input:
Output:
Print first two lines of DataFrame
Input:
Output:
Print last two lines of DataFrame
Input:
Output:
Describe DataFrame
DataFrame size:
Input:
Output:
DataFrame column names:
Input:
Output:
DataFrame description:
Input:
Output:
Accessing DataFrames
Get "age" column (different ways to call the column)
Input:
Output:
Get row
Input:
Output:
Get element
Input:
Output:
Get subset (specific rows and all columns)
Input:
Output:
Get subset (all rows and specific columns)
Input:
Output:
Get subset (all rows meeting specified criteria - numbers)
Input:
Output:
Get subset (all rows meeting specified criteria - strings)
Input:
Output:
Get subset (all rows meeting specified criteria)
Input:
Output:
Add Column
New columns with specified values
Input:
Output:
New column with calculated value
Input:
Output:
Get counts/frequency
Input:
Output:
Transform DataFrame
sort
Input:
Output:
stack (reshape from wide to long format)
Input:
Output:
unstack (reshape from long to wide format)
Input:
Output:
Traversing DataFrame (for loops)
sort
Input:
Output:
When performing operations such as sorting or transformation, using a package like data.table or dplyr will typically be easier than using base R (data.frame), as those packages include commands designed for DataFrame manipulation. This guide uses base R for the sake of continuity.
This page will go over much of the same content as the R page, but using tidyverse's dplyr and tidyr packages rather than base R. You may notice that pipes (%>%) are used more often here. Pipes are functionally the same as other elements like summary() or $, but tend to be the predominant syntax for more advanced uses of R, particularly in the tidyverse, as they can help chain multiple operations in the same line of code.
In order to use the tidyverse modules, they first have to be installed. Ensure that the following code is at the top of your coding environment:
Input:
Output:
Input:
Output:
Input:
Output:
Input:
Output:
Input:
Output:
Input:
Output:
R Documentation:
More read.csv resources
R Documentation:
R Documentation:
R Documentation:
R Documentation:
Tidyverse: