Friday, January 22, 2016

Week 2 is a wrap - a mini Google

The 'capstone' project for week two is to build a miniature indexed search engine for text files. The basic version should:

1. scan through a file structure and make a list of files and their path names.
2. read all the text files to create an array of words.
3. create a dictionary of the form {'word1': {file1: path, file2: path}, 'word2': {}}
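Steps 2 and 3 can be sketched roughly like this, assuming step 1 has already produced a dictionary mapping each file name to its full path. The function name `build_index` is hypothetical, and lowercasing every word is my own choice so that "The" and "the" land under one key.

```python
def build_index(files):
    """Build {word: {filename: path}} from a mapping of filename -> full path."""
    index = {}
    for name, path in files.items():
        # Step 2: read the file and split its text into a list of words.
        with open(path, encoding="utf-8") as handle:
            words = handle.read().split()
        # Step 3: note that each word appears in this file.
        for word in words:
            index.setdefault(word.lower(), {})[name] = path
    return index
```

Because the inner dictionary is keyed by file name, two files with the same name in different folders would overwrite each other; keying by full path would avoid that.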

The list of files is produced using the recursive tool built earlier in the week.

At this point, I have a working version of the project that accepts only files with '.txt' in the name.
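A recursive file lister that keeps only text files might look something like this sketch. It is not my actual tool from earlier in the week; the name `list_text_files` is hypothetical, and it checks that the name *ends* with '.txt' rather than merely containing it.

```python
import os

def list_text_files(directory, files=None):
    """Recursively collect {filename: full path} for every '.txt' file under directory."""
    if files is None:
        files = {}
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isdir(path):
            list_text_files(path, files)  # descend into the subdirectory
        elif entry.endswith(".txt"):
            files[entry] = path
    return files
```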

To do, in rough order of importance:

1. exception handling for words a user tries to look up that aren't in the dictionary.
2. remove punctuation and formatting from the text before reading it into the dictionary.
3. validation of user input (did they type a word, or random characters?)
4. ask the user if they wish to open one of the files in a text editor.
5. open and index other file types (e.g. csv).
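Items 1 to 3 could be handled together along the lines of this sketch. The helper names `clean_words` and `look_up` are hypothetical. The punctuation stripping relies on Python's `string.punctuation`, so "it's" becomes "its", and "a word" is assumed to mean a single run of letters.

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)

def clean_words(text):
    """Lowercase the text, strip punctuation, and split it into words."""
    return text.lower().translate(STRIP_PUNCTUATION).split()

def look_up(index, query):
    """Return {filename: path} for a one-word query, or {} if it isn't indexed."""
    words = clean_words(query)
    # Validation: reject empty input, several words, or digits.
    if len(words) != 1 or not words[0].isalpha():
        raise ValueError(f"please enter a single word, not {query!r}")
    try:
        return index[words[0]]
    except KeyError:
        return {}  # the word is valid but appears in no file
```

Running `clean_words` over the file text as well as the query keeps both sides of the lookup normalized the same way.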

All of that, plus general edits for readability and cleanliness.

I might be further along by now, but I took some time this morning (it was an unstructured work day) to refactor my budget app's file-open function. After the refactor, it appends the data from an opened csv to the in-script dictionary (and gets the keys right): opening a csv file creates a dictionary whose key names are the column headers in the csv.
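One way to get that behavior is `csv.DictReader`, which uses the header row as the keys for each row. This sketch assumes the budget data is kept as a dictionary of column header -> list of values; that layout and the name `append_csv` are hypothetical, not necessarily how my app stores it.

```python
import csv

def append_csv(path, data):
    """Append each column of the csv at path to data, keyed by its column header."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):  # header row supplies the keys
            for header, value in row.items():
                # Existing columns grow; unseen headers get a fresh list.
                data.setdefault(header, []).append(value)
    return data
```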

Also, I built a script to make temporary files in my practice file tree. Each file is filled with a random number of words organized into lines. Since the script uses the recursive file-tree tool from earlier in the week, the pseudo-text files are scattered throughout the sample directory tree.
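A generator like that might look roughly as follows. This sketch swaps in the standard library's `os.walk` for my own recursive tool. The word list, `make_practice_files`, and its parameters are all hypothetical. `tempfile.NamedTemporaryFile` with `delete=False` gives each file a unique name that survives after it is closed.

```python
import os
import random
import tempfile

WORDS = ["apple", "river", "stone", "cloud", "lamp", "garden", "copper", "violet"]

def make_practice_files(directory, per_dir=2, max_words=30, words_per_line=8, rng=random):
    """Write per_dir random-text '.txt' files into directory and each subdirectory."""
    created = []
    for root, _dirs, _files in os.walk(directory):
        for _ in range(per_dir):
            count = rng.randint(1, max_words)  # random number of words per file
            chosen = [rng.choice(WORDS) for _ in range(count)]
            # Break the words into lines of at most words_per_line words.
            lines = [" ".join(chosen[i:i + words_per_line])
                     for i in range(0, count, words_per_line)]
            with tempfile.NamedTemporaryFile("w", dir=root, prefix="practice_",
                                             suffix=".txt", delete=False,
                                             encoding="utf-8") as handle:
                handle.write("\n".join(lines) + "\n")
            created.append(handle.name)
    return created
```

Passing `rng=random.Random(seed)` makes the generated tree repeatable, which helps when checking the index against known files.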

And now it's after midnight on a day that started with the dawn.  More later.

