# File Input/Output

Many Python programs read from and write to files. To analyze a dataset, your program must first read the dataset file (input). To see or save the results of that analysis, your program must then produce output, for example by printing to the terminal or writing to a file.

This section shows the syntax for reading files (input) and writing results (output) using base Python (i.e., no packages such as Pandas).
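As a minimal sketch of the core syntax, the example below writes two lines to a file, appends a third, and then reads all three back. The file name `example.txt` is hypothetical, and the file is created in a temporary directory so the example does not touch your working directory.

```python
import os
import tempfile

# Work in a throwaway directory; "example.txt" is a hypothetical file name
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # "w" creates the file (or truncates an existing one) for writing
    with open(path, "w") as f:
        f.write("first line\n")
        f.write("second line\n")

    # "a" opens the file for appending to the end
    with open(path, "a") as f:
        f.write("third line\n")

    # "r" opens the file for reading; iterating over the
    # file object yields one line at a time
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)  # ['first line', 'second line', 'third line']
```

The `with` statement closes each file automatically when its block ends, even if an error occurs, so no explicit `close()` call is needed.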

## UC Irvine Machine Learning Repository: Adult Data Set

* Tabulate and report counts for sex in [Adult Data Set](https://archive.ics.uci.edu/dataset/2/adult) from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/).

***Dataset*** (example lines from `adult.data`)

```
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
```
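To see why the script below splits on `", "` and reads index 9, here is a sketch that parses one of the sample records. It also shows the standard-library `csv` module as an alternative parser (still base Python, no third-party packages).

```python
import csv
import io

# One record copied from the sample above
line = "28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K\n"

# Fields are separated by a comma followed by a space
fields = line.strip().split(", ")
print(len(fields))  # 15
print(fields[9])    # Female (the 10th field, at index 9)

# csv.reader gives the same fields; skipinitialspace=True drops
# the space that follows each comma
row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(row == fields)  # True
```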

***Input*** (`process_file.py`)

```python
# process_file.py
# Tabulate and report counts for sex in Adult Data Set
# https://archive.ics.uci.edu/dataset/2/adult

# relative path of file (resolved from the current working directory)
data_path = "_data/adult/adult.data"

# absolute path of file
# data_path = "/Users/user/data/adult/adult.data"

# initialize collection (dictionary for tabulating counts)
sex_dict = {}

# open the input file for reading; the with block closes it automatically
with open(data_path, "r") as data_file:

    # read each line, extract sex, and keep track of counts
    for line in data_file:

        # skip empty lines
        if not line.strip():
            continue

        # split line into a list of fields, based on delimiter (comma and space)
        fields = line.strip().split(", ")

        # sex is the 10th field, i.e., index 9 (Python is 0-indexed)
        sex = fields[9]
        if sex in sex_dict:
            sex_dict[sex] += 1
        else:
            sex_dict[sex] = 1

# report total counts by key
print("Sort by key (alphabetical):")
for sex in sorted(sex_dict):
    print(f"  {sex} = {sex_dict[sex]}")

# report total counts by key, in reverse order
print("Sort by key (reverse alphabetical):")
for sex in sorted(sex_dict, reverse=True):
    print(f"  {sex} = {sex_dict[sex]}")

# report total counts by value, in reverse order (also write them to a file)
print("Sort by value (reverse numerical):")
with open("process_file_output.txt", "w") as output_file:
    for sex, count in sorted(sex_dict.items(), key=lambda item: item[1], reverse=True):
        print(f"  {sex} = {count}")
        output_file.write(f"{sex} = {count}\n")
```
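The script opens the data file by a relative path, with an absolute path shown as a commented-out alternative. A relative path is resolved against the current working directory (the directory you run `python` from), not against the folder that contains the script, which is a common cause of `FileNotFoundError`. The sketch below uses standard-library `os.path` functions to show where a relative path actually points; the existence check is an illustrative addition, not part of the original script.

```python
import os

relative_path = "_data/adult/adult.data"

# isabs() is False for a relative path like this one
print(os.path.isabs(relative_path))

# abspath() joins the path onto the current working directory,
# which shows exactly where Python will look for the file
print(os.path.abspath(relative_path))

# Checking first gives a clearer message than a FileNotFoundError traceback
if not os.path.exists(relative_path):
    print("Not found; run the script from the directory that contains _data/")
```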

***Output***

```
Sort by key (alphabetical):
  Female = 10771
  Male = 21790
Sort by key (reverse alphabetical):
  Male = 21790
  Female = 10771
Sort by value (reverse numerical):
  Male = 21790
  Female = 10771

```
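The dictionary bookkeeping in the script's loop (check whether the key exists, then increment or initialize it) is a common pattern that the standard library's `collections.Counter` handles directly, and `Counter.most_common()` produces the same "reverse numerical" ordering as `sorted(..., key=lambda item: item[1], reverse=True)`. The sketch below is an alternative, not the approach used above; it uses `io.StringIO` holding the five sample records as a stand-in for the open `adult.data` file, so it runs without downloading the dataset.

```python
import io
from collections import Counter

# Stand-in for the open adult.data file: the five sample records,
# plus a blank line that must be skipped
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
    "38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K\n"
    "53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K\n"
    "28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K\n"
    "\n"
)

# Counter tallies each value produced by the generator expression
sex_counts = Counter(
    line.strip().split(", ")[9]  # sex is the field at index 9
    for line in sample
    if line.strip()  # skip empty lines
)

# most_common() lists (value, count) pairs from highest to lowest count
print(sex_counts.most_common())  # [('Male', 4), ('Female', 1)]
```

With the real dataset, pass the file object from `open("_data/adult/adult.data", "r")` (inside a `with` block) in place of `sample`.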

***Terminal***

```bash
$ python process_file.py
Sort by key (alphabetical):
  Female = 10771
  Male = 21790
Sort by key (reverse alphabetical):
  Male = 21790
  Female = 10771
Sort by value (reverse numerical):
  Male = 21790
  Female = 10771

$ ls -1
process_file.py
process_file_output.txt

$ cat process_file_output.txt
Male = 21790
Female = 10771
```
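Because `process_file_output.txt` is itself a plain-text file, it can be read back in with the same techniques. The sketch below recreates the two lines shown by `cat` in a temporary file (standing in for the real output file) and parses each `key = value` line back into a dictionary of integers.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Recreate the output file shown above in a temporary directory
    path = os.path.join(tmp_dir, "process_file_output.txt")
    with open(path, "w") as f:
        f.write("Male = 21790\nFemale = 10771\n")

    # Read it back, splitting each "key = value" line on " = "
    counts = {}
    with open(path, "r") as f:
        for line in f:
            if not line.strip():
                continue
            sex, count = line.strip().split(" = ")
            counts[sex] = int(count)

print(counts)                # {'Male': 21790, 'Female': 10771}
print(sum(counts.values()))  # 32561
```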

## Exercises <a href="#documentation" id="documentation"></a>

* Analyze the MIMIC-IV Demo Files Using Python - *<mark style="color:yellow;">Forthcoming!</mark>*
* Analyze the SyntheticRI Demo Files Using Python - *<mark style="color:yellow;">Forthcoming!</mark>*

## Resources

* Tutorials Point: [Python - Files I/O](https://www.tutorialspoint.com/python/python_files_io.htm)
* Data Science Central: [Python File Input/Output](https://www.datasciencecentral.com/python-file-input-output-read-write-files-in-python/)
