File Input/Output

Many Python programs read from and write to files. To analyze a dataset, your program must first read the dataset file (input). To save or share the results of the analysis, your program must write them out (output).

This section provides the syntax for reading files (input) and writing results (output) using base Python (i.e., without packages such as pandas).
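
As a minimal sketch of both directions, the block below writes a small text file and then reads it back using only built-in functions. The file name example.txt and its contents are hypothetical, chosen only for illustration. The `with` statement closes each file automatically when its block ends.

```python
# Minimal sketch of file output (writing) and input (reading) in base Python.
# The file name and contents below are hypothetical examples.
import os
import tempfile

# Use a temporary directory so the example does not touch existing files.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")

# Output: open the file in write mode ("w") and write two lines.
with open(path, "w") as output_file:
    output_file.write("first line\n")
    output_file.write("second line\n")

# Input: open the file in read mode ("r") and iterate over its lines.
lines_read = []
with open(path, "r") as input_file:
    for line in input_file:
        lines_read.append(line.strip())  # strip() removes the trailing newline

print(lines_read)
```

Opening a file in "w" mode creates it if it does not exist and overwrites it if it does, so choose output file names with care.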

UC Irvine Machine Learning Repository: Adult Data Set

Tabulate and report counts for sex in the Adult Data Set from the UC Irvine Machine Learning Repository.

Dataset (example lines from adult.data)

39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
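
To see how one record maps to fields, the sketch below splits the first example line on the comma-and-space delimiter. The variable names are illustrative. The point is that sex is the 10th field in each record, which Python addresses as index 9.

```python
# Split one example record from adult.data into its fields.
sample_line = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
)

# strip() removes the trailing newline; split(", ") breaks on comma + space
fields = sample_line.strip().split(", ")

print(len(fields))  # number of fields in the record
print(fields[9])    # the sex field (10th field, index 9)
```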

Input (process_file.py)

# process_file.py
# Tabulate and report counts for sex in Adult Data Set
# https://archive.ics.uci.edu/ml/datasets/adult

# relative path of file
data_file = open("_data/adult/adult.data", "r")

# absolute path of file
# data_file = open("/Users/user/data/adult/adult.data", "r")

# initialize collection (dictionary for tabulating counts)
gender_dict = {}

# read each line, extract sex, and keep track of counts
for line in data_file:

    # skip empty lines
    if not line.strip():
        continue

    # split line into a list of fields, based on delimiter (comma and space)
    line_array = line.strip().split(", ")

    # tabulate the counts for sex (the 10th field in each record)
    gender = line_array[9]  # index 9 because Python lists are 0-indexed
    if gender in gender_dict:
        gender_dict[gender] += 1
    else:
        gender_dict[gender] = 1

# close the input file
data_file.close()

# report total counts by key, in alphabetical order
print("Sort by key (alphabetical):")
for gender in sorted(gender_dict.keys()):
    print(f"  {gender} = {gender_dict[gender]}")

# report total counts by key, in reverse order
print("Sort by key (reverse alphabetical):")
for gender in sorted(gender_dict.keys(), reverse=True):
    print(f"  {gender} = {gender_dict[gender]}")

# report total counts by value, in reverse order (print and also write to file)
with open("process_file_output.txt", "w") as output_file:
    print("Sort by value (reverse numerical):")
    for gender, count in sorted(gender_dict.items(), key=lambda item: item[1], reverse=True):
        print(f"  {gender} = {count}")
        output_file.write(f"{gender} = {count}\n")
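
The final loop in the script calls `print` and `write` separately for each count. One possible alternative, sketched here under assumptions, is the `file` parameter of `print`, which sends the same formatted line to any open text stream. The helper name `report` is hypothetical, the counts are those reported for this dataset, and `io.StringIO` is an in-memory stand-in for the opened output file so the block runs on its own.

```python
import io
import sys

# Counts for sex reported for the Adult Data Set in this section
gender_dict = {"Male": 21790, "Female": 10771}


def report(counts, stream):
    """Write counts to `stream`, sorted by value in descending order."""
    for gender, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        # print() appends the newline itself, so no "\n" is needed
        print(f"{gender} = {count}", file=stream)


# Send the report to the terminal
report(gender_dict, sys.stdout)

# Send the same report to a file-like object (stand-in for an output file)
buffer = io.StringIO()
report(gender_dict, buffer)
file_text = buffer.getvalue()
```

With a real file, the object returned by `open("process_file_output.txt", "w")` inside a `with` statement could be passed as `stream` in place of the buffer.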

Output

Sort by key (alphabetical):
  Female = 10771
  Male = 21790
Sort by key (reverse alphabetical):
  Male = 21790
  Female = 10771
Sort by value (reverse numerical):
  Male = 21790
  Female = 10771

Terminal

$ python process_file.py
Sort by key (alphabetical):
  Female = 10771
  Male = 21790
Sort by key (reverse alphabetical):
  Male = 21790
  Female = 10771
Sort by value (reverse numerical):
  Male = 21790
  Female = 10771

$ ls -1
process_file.py
process_file_output.txt

$ cat process_file_output.txt
Male = 21790
Female = 10771
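
The same tabulation can be sketched more compactly with the standard library's `csv` and `collections` modules. This is an alternative under stated assumptions, not the script shown earlier: the helper name `count_field` is hypothetical, and the five example records from adult.data stand in for the full file (wrapped in `io.StringIO` so the block runs without the dataset).

```python
import csv
import io
from collections import Counter


def count_field(lines, index):
    """Count the values found at position `index` in comma-delimited lines."""
    counts = Counter()
    # skipinitialspace=True drops the space that follows each comma
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        counts[row[index]] += 1
    return counts


# The five example records from adult.data, plus a trailing blank line
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
    "38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K\n"
    "53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K\n"
    "28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K\n"
    "\n"
)

sex_counts = count_field(sample, 9)

# most_common() returns (value, count) pairs sorted by count, largest first
for sex, count in sex_counts.most_common():
    print(f"  {sex} = {count}")
```

`Counter` removes the need for the if/else check when a key is seen for the first time, and `most_common()` replaces the explicit sort by value. To run this against the real dataset, a file opened with `open("_data/adult/adult.data", "r", newline="")` inside a `with` statement could be passed to `count_field` in place of `sample`.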

Exercises

Analyze the MIMIC-IV Demo Files Using Julia - Forthcoming
Analyze the SyntheticRI Demo Files Using Julia - Forthcoming

Resources

Tutorials Point: Python - Files I/O
Data Science Central: Python File Input/Output

