CS 4395 Intro to NLP
Caution: All course work is run through plagiarism detection software comparing
students’ work as well as work from previous semesters and other sources.
Description
- Download the csv file from the GitHub and place it in a folder named data within the same
folder as your Python program. Here is a screen shot of the folder structure to make this
clearer. Also, TAs from previous semesters told me they would prefer that your uploads to
eLearning include your netid. You can name the Python file in your GitHub any name you wish.
- The user needs to specify the relative path ‘data/data.csv’ in a sysarg (a command-line
argument). If the user does not specify a sysarg, print an error message and end the program.
Read the file, making sure your program will work on either Windows or Mac/Unix. See the
Paths Demo in the Xtra folder of the GitHub: https://github.com/kjmazidi/NLP
- Define a Person class with fields: last, first, mi, id, and phone. In addition to the init method,
create a display() method to output fields as shown in the sample run below.
- Create a function to process the input file. Get rid of the first line which is just the heading line.
For the remaining lines:
  - split on comma to get the fields as text variables
  - modify last name and first name to be in Capital Case, if necessary
  - modify middle initial to be a single upper case letter, if necessary. Use ‘X’ as a middle
    initial if one is missing.
  - modify id if necessary, using regex. The id should be 2 letters followed by 4 digits. If an
    id is not in the correct format, output an error message, and allow the user to re-enter a
    valid ID. See the sample run below for data corrections.
  - modify phone number, if necessary, to be in form 999-999-9999. Use regex.
  - Once the data for a person is correct, create a Person object and save the object
    to a dict of persons, where id is the key. Check for duplicate id and print an error
    message if an ID is repeated in the input file.
- Return the dict of persons to the main function.
- In the main function, save the dictionary as a pickle file. Open the pickle file for reading, and
print each person using the Person display() method to verify that the pickle was unpickled
correctly. There is a sample pickle notebook in the Xtras folder in the GitHub.
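
The sysarg and cross-platform path requirements above could be approached roughly as follows. This is only a minimal sketch, not the reference solution: the function names `read_lines` and `main` and the wording of the error message are assumptions, and `pathlib` is just one way to build a path that works on both Windows and Mac/Unix.

```python
import sys
from pathlib import Path


def read_lines(rel_path):
    """Read a text file and return its lines without trailing newlines.

    Hypothetical helper: the path is resolved against the current working
    directory, so 'data/data.csv' works on Windows and on Mac/Unix.
    """
    path = Path.cwd().joinpath(rel_path)
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


def main(argv):
    """Return the file's lines, or None if no sysarg was given."""
    if len(argv) < 2:
        # Assumed wording; any clear error message will do.
        print("Please enter a filename as a system arg")
        return None
    return read_lines(argv[1])
```

In a real program you would call `main(sys.argv)` from the `if __name__ == '__main__':` block and end the program when it returns `None`.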
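
The Person class and the field clean-up steps described above might be sketched as below. Treat everything here as an assumption rather than the expected answer: the column order (last, first, mi, id, phone), the helper names, the output layout of `display()`, and the choices to keep the first record when an id is repeated and to leave a phone number untouched when it does not contain exactly 10 digits. The `ask` parameter is a hypothetical hook that defaults to `input` so the re-entry loop can be exercised without typing.

```python
import re

# Assumed pattern: exactly 2 letters followed by 4 digits, e.g. "WH1234".
ID_PATTERN = re.compile(r"[A-Za-z]{2}\d{4}")


class Person:
    """One record from the input file."""

    def __init__(self, last, first, mi, id, phone):
        self.last = last
        self.first = first
        self.mi = mi
        self.id = id
        self.phone = phone

    def display(self):
        # Layout is a guess; adjust it to match the sample run.
        print(f"Employee id: {self.id}")
        print(f"\t{self.first} {self.mi} {self.last}")
        print(f"\t{self.phone}")


def fix_name(name):
    """First letter upper case, the rest lower case."""
    return name.strip().capitalize()


def fix_mi(mi):
    """Single upper case letter, or 'X' when the initial is missing."""
    mi = mi.strip()
    return mi[0].upper() if mi else "X"


def fix_phone(phone):
    """Reformat any 10-digit phone number as 999-999-9999."""
    digits = re.sub(r"\D", "", phone)  # drop everything except digits
    if len(digits) != 10:
        return phone.strip()  # assumption: leave unusual numbers unchanged
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def get_valid_id(text, ask=input):
    """Keep asking until the id is 2 letters followed by 4 digits."""
    text = text.strip()
    while ID_PATTERN.fullmatch(text) is None:
        print(f"ID invalid: {text}")
        print("ID is two letters followed by 4 digits")
        text = ask("Please enter a valid id: ").strip()
    return text


def process_lines(lines, ask=input):
    """Build a dict of Person objects keyed by id from the file's lines."""
    persons = {}
    for line in lines[1:]:  # skip the heading line
        if not line.strip():
            continue  # ignore blank lines
        last, first, mi, pid, phone = [field.strip() for field in line.split(",")]
        pid = get_valid_id(pid, ask)
        if pid in persons:
            print(f"Error: id {pid} is repeated in the input file")
            continue  # assumption: keep the first record with this id
        persons[pid] = Person(fix_name(last), fix_name(first), fix_mi(mi),
                              pid, fix_phone(phone))
    return persons
```

Note that `capitalize()` lower-cases everything after the first letter, so a name such as "McDonald" would become "Mcdonald"; whether that matters depends on the data file.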
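
The pickle round trip in the last step could look something like this. It is a sketch under assumptions: the file name `persons.pickle` and the three helper names are made up for illustration, and `display_all` assumes each value in the dict has a `display()` method like the Person class described above.

```python
import pickle


def save_persons(persons, filename="persons.pickle"):
    """Write the dict to a pickle file (binary mode is required)."""
    with open(filename, "wb") as f:
        pickle.dump(persons, f)


def load_persons(filename="persons.pickle"):
    """Read the dict back from the pickle file."""
    with open(filename, "rb") as f:
        return pickle.load(f)


def display_all(persons):
    """Call display() on every stored object to check the round trip."""
    for person in persons.values():
        person.display()
```

Keep in mind that unpickling objects of your own class only works if the class definition is available to the program doing the loading, which is the case when saving and loading happen in the same file.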






