File Input/Output
UC Irvine Machine Learning Repository: Adult Data Set
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K

# process_file.py
# Tabulate and report counts for sex in Adult Data Set
# https://archive.ics.uci.edu/ml/datasets/adult
# relative path of file
data_file = open("_data/adult/adult.data", "r")
# absolute path of file
# data_file = open("/Users/user/data/adult/adult.data", "r")
# initialize collection (dictionary for tabulating counts)
gender_dict = {}
# read each line, extract sex, and keep track of counts
for line in data_file:
    # skip empty lines
    if not line.strip():
        continue
    # split line into a list of fields, based on delimiter (comma and space)
    line_array = line.strip().split(", ")
    # tabulate the counts for sex, the 10th field (index 9, since Python is 0-indexed)
    gender = line_array[9]
    if gender in gender_dict:
        gender_dict[gender] += 1
    else:
        gender_dict[gender] = 1
# close the input file
data_file.close()
# report total counts by key, in alphabetical order
print("Sort by key (alphabetical):")
for gender in sorted(gender_dict.keys()):
    print(f" {gender} = {gender_dict[gender]}")
# report total counts by key, in reverse order
print("Sort by key (reverse alphabetical):")
for gender in sorted(gender_dict.keys(), reverse=True):
    print(f" {gender} = {gender_dict[gender]}")
# report total counts by value, in reverse order (print to screen and write to file)
with open("process_file_output.txt", "w") as output_file:
    print("Sort by value (reverse numerical):")
    for gender, count in sorted(gender_dict.items(), key=lambda item: item[1], reverse=True):
        print(f" {gender} = {count}")
        output_file.write(f"{gender} = {count}\n")
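
The same tabulation can be written more compactly with collections.Counter from the standard library. The sketch below is one possible variant, not part of the original script: the helper name count_field and its parameters are hypothetical, and it runs on the five sample rows shown at the top of this page rather than on the full data file, so it works without any download.

```python
from collections import Counter


def count_field(lines, index, delimiter=", "):
    """Count the values found at position `index` in each delimited line.

    `count_field` is a hypothetical helper for illustration only.
    `lines` can be any iterable of strings, including an open file.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        # skip empty lines
        if not line:
            continue
        fields = line.split(delimiter)
        counts[fields[index]] += 1
    return counts


# the five sample rows from the top of this page, plus one empty line
sample_lines = [
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K",
    "38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K",
    "53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K",
    "28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K",
    "",
]

# sex is the 10th field, so its index is 9
sex_counts = count_field(sample_lines, 9)

# most_common() returns (value, count) pairs sorted by count, largest first
for sex, count in sex_counts.most_common():
    print(f" {sex} = {count}")
```

On the five sample rows this prints Male = 4 followed by Female = 1. To run it against the real file, an open file object can be passed in place of sample_lines, for example inside a block such as with open("_data/adult/adult.data", "r") as data_file, which also closes the file automatically so no explicit close() call is needed.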
Exercises
Resources
