Life Between the Lines: Week 2 is a wrap

The 'capstone' project for week two is to create a miniature version of an indexed search engine for text files. The basic version should:

1. scan through a file structure and make a list of files and associated path names.
2. read all the text files to create an array of words
3. create a dictionary of {'word1': {file1: path, file2: path}, 'word2': {}}

The list of files is produced using the recursive tool built earlier in the week.
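For readers who want to see the shape of it, here is a minimal sketch of the three steps, assuming Python. It is not the actual project code: the function names (`list_text_files`, `read_words`, `build_index`) are made up for illustration, `os.walk` stands in for the recursive tool from earlier in the week, and lowercasing the words is an extra assumption.

```python
import os
from collections import defaultdict


def list_text_files(root):
    """Step 1: walk the tree and return {file name: full path} for .txt files."""
    found = {}
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            # Assumption: match on the extension rather than '.txt' anywhere in the name.
            if name.endswith(".txt"):
                found[name] = os.path.join(folder, name)
    return found


def read_words(path):
    """Step 2: return the file's contents as a list of lowercase words."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read().lower().split()


def build_index(root):
    """Step 3: map each word to the {file name: path} pairs that contain it."""
    index = defaultdict(dict)
    for name, path in list_text_files(root).items():
        for word in read_words(path):
            index[word][name] = path
    return dict(index)
```

One limitation of keying on the bare file name: two files with the same name in different folders would overwrite each other, so keying on the full path may be the safer choice.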

At this point, I have a working version of the project that accepts only files with '.txt' in the name.

To do, in rough order of importance:

1. exception handling for words not in the dictionary that a user tries to look up.
2. remove punctuation and formatting from text before reading it into the dictionary.
3. validation on user input (did they type a word, or put in random characters?)
4. ask user if they wish to open one of the files in a text editor
5. open and index other file types (e.g. csv)

All of that, plus general edits for readability and cleanliness.
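The first three to-do items could look something like the sketch below, again assuming Python. The names `clean_word`, `is_plausible_word`, and `look_up` are hypothetical, and the rule for what counts as a "word" (letters, with apostrophes and hyphens allowed) is just one reasonable guess.

```python
import string


def clean_word(raw):
    """Strip punctuation and whitespace from both ends, then lowercase."""
    return raw.strip(string.punctuation + string.whitespace).lower()


def is_plausible_word(text):
    """Accept letters only, ignoring apostrophes and hyphens inside the word."""
    stripped = text.replace("'", "").replace("-", "")
    return stripped.isalpha()


def look_up(index, raw_query):
    """Return {file name: path} for the query, or {} when the word is not indexed."""
    query = clean_word(raw_query)
    if not is_plausible_word(query):
        # Random characters are rejected before touching the index.
        raise ValueError(f"not a word: {raw_query!r}")
    # dict.get avoids a KeyError for words that were never indexed.
    return index.get(query, {})
```

Running the same `clean_word` over the text while building the index keeps "hello," in a file and "Hello" typed by the user pointing at the same entry.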

I might be further along by now, but I took some time this morning (it was an unstructured work day) to refactor my budget app's file-open function. After the refactor, it appends the data from an opened csv to the in-script dictionary (and gets the keys right). Opening a csv file creates a dictionary whose key names are the column headers in the csv.
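A rough sketch of that idea, assuming Python and its standard `csv` module. This is not the budget app's real function: `append_csv` and the shape of the dictionary (one list of values per column header) are assumptions made for illustration.

```python
import csv


def append_csv(path, table):
    """Append each column of the CSV at `path` to the matching list in `table`."""
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first row of the file as the key names.
        for row in csv.DictReader(handle):
            for header, value in row.items():
                table.setdefault(header, []).append(value)
    return table
```

Calling it with an empty dictionary creates the keys from the headers. Calling it again with the same dictionary appends to the existing lists. Values stay as strings, so amounts would still need converting before any arithmetic.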

Also, I built a script to make temporary files in my practice file tree. Each file is filled with a random number of words organized into lines. Since the function uses the recursive file tree generated above, pseudo-text-files are scattered throughout the sample directory tree.
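Something along these lines would do the job, assuming Python. The word list, the file names, the line and word counts, and the functions `write_random_file` and `scatter_files` are all invented for this sketch, and `os.walk` again stands in for the recursive tool.

```python
import os
import random

# Hypothetical vocabulary; any list of words would do.
SAMPLE_WORDS = ["apple", "river", "stone", "cloud", "lantern", "maple"]


def write_random_file(folder, name, rng):
    """Write a file with a random number of lines of random words; return its path."""
    path = os.path.join(folder, name)
    with open(path, "w", encoding="utf-8") as handle:
        for _ in range(rng.randint(1, 10)):
            words = rng.choices(SAMPLE_WORDS, k=rng.randint(3, 8))
            handle.write(" ".join(words) + "\n")
    return path


def scatter_files(root, per_folder=2, seed=None):
    """Drop `per_folder` pseudo-text files into every folder under `root`."""
    rng = random.Random(seed)
    created = []
    for folder, _subfolders, _names in os.walk(root):
        for number in range(per_folder):
            created.append(write_random_file(folder, f"sample_{number}.txt", rng))
    return created
```

Passing a fixed `seed` makes the generated files repeatable, which is handy when checking that the index finds the words it should.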

And now it's after midnight on a day that started with the dawn. More later.

Friday, January 22, 2016

Week 2 is a wrap - a mini google
