Many Python programs read input from files and write output to files. To analyze a dataset, your program must first read the dataset file (input). To keep or share the results of the analysis, your program must write them out (output).
This section shows the syntax for reading files (input) and writing results (output) using base Python (i.e., no packages such as Pandas).
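As a minimal sketch of the two operations, the snippet below writes a few lines to a file and then reads them back. The file name example.txt and its contents are made up for illustration, and the file is placed in the system temporary directory so the snippet can run anywhere. The with statement closes each file automatically when its block ends.

```python
import os
import tempfile

# Hypothetical file name and contents, used only for illustration
path = os.path.join(tempfile.gettempdir(), "example.txt")

# Writing: mode "w" creates the file (or overwrites an existing one)
with open(path, "w") as out_file:
    out_file.write("first line\n")
    out_file.write("second line\n")

# Reading: mode "r" opens an existing file; iterating yields one line at a time
lines = []
with open(path, "r") as in_file:
    for line in in_file:
        lines.append(line.strip())  # strip() removes the trailing newline

print(lines)
```

Mode "a" (append) would add to the end of an existing file instead of overwriting it.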
UC Irvine Machine Learning Repository: Adult Data Set
# process_file.py
# Tabulate and report counts for sex in Adult Data Set
# https://archive.ics.uci.edu/ml/datasets/adult

# relative path of file
data_file = open("_data/adult/adult.data", "r")

# absolute path of file
# data_file = open("/Users/user/data/adult/adult.data", "r")

# initialize collection (dictionary for tabulating counts)
gender_dict = {}

# read each line, extract sex, and keep track of counts
for line in data_file:
    # skip empty lines
    if not line.strip():
        continue

    # split line into array, based on delimiter (comma and space)
    line_array = line.strip().split(", ")

    # tabulate the counts for gender
    gender = line_array[9]  # index 9 (Python is 0-indexed)
    if gender in gender_dict:
        gender_dict[gender] += 1
    else:
        gender_dict[gender] = 1

# close the input file
data_file.close()

# report total counts
print("Sort by key (alphabetical):")
for gender in sorted(gender_dict.keys()):
    print(f" {gender} = {gender_dict[gender]}")

# report total counts by key, in reverse order
print("Sort by key (reverse alphabetical):")
for gender in sorted(gender_dict.keys(), reverse=True):
    print(f" {gender} = {gender_dict[gender]}")

# report total counts by value, in reverse order (send output to file)
with open("process_file_output.txt", "w") as output_file:
    print("Sort by value (reverse numerical):")
    for gender, count in sorted(gender_dict.items(), key=lambda item: item[1], reverse=True):
        print(f" {gender} = {count}")
        output_file.write(f"{gender} = {count}\n")
Output
Sort by key (alphabetical):
Female = 10771
Male = 21790
Sort by key (reverse alphabetical):
Male = 21790
Female = 10771
Sort by value (reverse numerical):
Male = 21790
Female = 10771
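The same tabulation can be written more compactly with collections.Counter from the standard library, which is still base Python. The sketch below is one possible variant, not a replacement for the script above: the helper name count_field and the two-column sample rows are hypothetical, invented so the snippet runs without downloading the dataset. For the real file, the call would use the path to adult.data and index 9.

```python
import os
import tempfile
from collections import Counter


def count_field(path, index, delimiter=", "):
    """Count the values found at position `index` on each non-empty line."""
    counts = Counter()
    # "with" closes the file automatically, even if an error occurs
    with open(path, "r") as data_file:
        for line in data_file:
            line = line.strip()
            if not line:  # skip empty lines
                continue
            counts[line.split(delimiter)[index]] += 1
    return counts


# Made-up sample rows (not real Adult Data Set records), two fields each
sample_path = os.path.join(tempfile.gettempdir(), "sample_rows.txt")
with open(sample_path, "w") as sample_file:
    sample_file.write("39, Male\n50, Male\n\n28, Female\n")

counts = count_field(sample_path, 1)

# most_common() returns (value, count) pairs sorted by count, largest first
for value, count in counts.most_common():
    print(f"{value} = {count}")
```

A Counter starts every missing key at zero, so the if/else check used with the plain dictionary is no longer needed.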