The 'capstone' project for week two is to create a miniature version of an indexed search engine for text files. The basic version should:
1. scan through a file structure and make a list of files and associated path names.
2. read all the text files to create an array of words.
3. create a dictionary of the form {'word1': {file1: path, file2: path}, 'word2': {}}.
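Steps 2 and 3 can be sketched roughly like this. It's a minimal sketch, not the project code: it assumes the file list arrives as (file_name, path) pairs and splits text on whitespace only, so punctuation stays attached to words. `build_index` is a hypothetical name.

```python
import os
import tempfile


def build_index(files):
    """Build {word: {file_name: path}} from a list of (file_name, path) pairs.

    Words are split on whitespace only; punctuation stays attached.
    """
    index = {}
    for name, path in files:
        with open(path, encoding="utf-8") as handle:
            words = handle.read().split()  # step 2: this file's words
        for word in words:
            # step 3: setdefault creates the inner {file: path} dict on first sight
            index.setdefault(word, {})[name] = path
    return index


# Usage sketch with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name, text in [("a.txt", "apple banana\ncherry"), ("b.txt", "banana date")]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        files.append((name, path))
    index = build_index(files)

print(sorted(index["banana"]))  # ['a.txt', 'b.txt']
```

One catch with keying the inner dictionary on file name: two files with the same name in different folders overwrite each other. Keying on the full path would avoid that.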
The list of files is produced using the recursive tool built earlier in the week.
At this point, I have a working version of the project that accepts only files with '.txt' in the name.
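The recursive tool isn't shown in this post, so here is a hypothetical stand-in (`list_text_files`) that walks a folder with `os.listdir` and applies the same '.txt'-in-the-name rule. A sketch under those assumptions only.

```python
import os
import tempfile


def list_text_files(folder, found=None):
    """Recursively collect (file_name, full_path) pairs for '.txt' files.

    Hypothetical stand-in for the recursive tool; not the original code.
    """
    if found is None:
        found = []
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if os.path.isdir(path):
            list_text_files(path, found)  # recurse into the subfolder
        elif ".txt" in entry:  # '.txt' anywhere in the name, as described
            found.append((entry, path))
    return found


# Usage sketch: a small tree with one csv and one '.txt.bak' file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "docs", "old"))
    for rel in ("top.txt", "docs/notes.txt", "docs/old/log.txt.bak", "docs/data.csv"):
        with open(os.path.join(tmp, *rel.split("/")), "w", encoding="utf-8") as handle:
            handle.write("sample")
    names = [name for name, _ in list_text_files(tmp)]

print(names)  # ['notes.txt', 'log.txt.bak', 'top.txt']
```

Because the check is `'.txt' in entry`, `log.txt.bak` also gets through; `entry.endswith('.txt')` would be the stricter test.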
To do, in rough order of importance:
1. exception handling for words not in the dictionary that a user tries to look up.
2. remove punctuation and formatting from the text before reading it into the library.
3. validation of user input (did they type a word, or just random characters?)
4. ask user if they wish to open one of the files in a text editor
5. open and index other file types (e.g. csv)
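The first three to-do items might look something like this. A rough sketch, assuming the index is a plain dict shaped like the one described at the top, and that both the indexed text and the query are lowercased (an assumption; the current version may be case-sensitive). The function names are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def clean_words(text):
    """Item 2: lowercase the text, drop punctuation, split into words."""
    return text.lower().translate(STRIP_PUNCTUATION).split()


def is_valid_query(query):
    """Item 3: accept a single word made only of letters."""
    return query.isalpha()


def look_up(index, query):
    """Item 1: return {file_name: path} for a word, or {} if it is missing."""
    word = query.strip().lower()
    if not is_valid_query(word):
        raise ValueError(f"{query!r} is not a single word")
    try:
        return index[word]
    except KeyError:
        return {}  # the word never appeared in any indexed file


# Usage sketch; the paths are placeholders.
index = {}
for name, text in [("a.txt", "Hello, world!"), ("b.txt", "World: hello again.")]:
    for word in clean_words(text):
        index.setdefault(word, {})[name] = "/hypothetical/path/" + name

print(sorted(look_up(index, "World")))  # ['a.txt', 'b.txt']
print(look_up(index, "missing"))  # {}
```

`index.get(word, {})` would do the same job as the try/except; the explicit `KeyError` handler just makes item 1 visible. Apostrophes count as punctuation, so "don't" is indexed as "dont", while a query typed as "don't" fails the `isalpha` check.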
All of that and general edits for readability and cleanliness.
I might be further along by now, but I took some time this morning (it was an unstructured work day) to refactor the file-open function in my budget app. It now appends the data from an opened csv file to the in-script dictionary (and gets the keys right): opening a csv file creates a dictionary whose keys are the column headers in the csv.
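The budget app's code isn't in this post, so this is only a minimal sketch of the idea using the standard library's `csv.DictReader`, which takes its keys from the header row. The shape of the in-script dictionary (column header mapped to a list of values) and the name `append_csv` are assumptions.

```python
import csv
import io


def append_csv(handle, table):
    """Append each column of a CSV to `table`, keyed by the header row.

    `table` maps column header -> list of values; new headers get a new list.
    """
    for row in csv.DictReader(handle):
        for header, value in row.items():
            table.setdefault(header, []).append(value)
    return table


# Usage sketch with made-up rows; two "files" appended to the same dict.
budget = {}
append_csv(io.StringIO("date,item,amount\n2024-01-02,coffee,3.50\n"), budget)
append_csv(io.StringIO("date,item,amount\n2024-01-03,bus,2.75\n"), budget)
print(budget["amount"])  # ['3.50', '2.75']
```

With a real file, pass `open(path, newline="")`, as the `csv` module documentation recommends; `io.StringIO` just keeps the example self-contained. Every value comes back as a string, so amounts need `float()` before any arithmetic.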
Also, I built a script to make temporary files in my practice file tree. Each file is filled with a random number of words organized into lines. Since the script uses the recursive file tree generated earlier, pseudo-text files are scattered throughout the sample directory tree.
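A hypothetical version of that sample-file generator: `os.walk` visits every folder under a root, and each folder gets a couple of files containing a random number of words split into lines. The word pool, file names, and parameters are made up for illustration.

```python
import os
import random
import tempfile

# Hypothetical word pool; any list of strings would do.
WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf"]


def fill_tree(root, files_per_folder=2, max_words=20, words_per_line=5, seed=None):
    """Write pseudo-text files into every folder under `root`.

    Each file gets 1..max_words random words, split into lines of up to
    `words_per_line` words. Returns the list of paths written.
    """
    rng = random.Random(seed)
    written = []
    for folder, _subfolders, _files in os.walk(root):
        for n in range(files_per_folder):
            words = rng.choices(WORDS, k=rng.randint(1, max_words))
            lines = [" ".join(words[i:i + words_per_line])
                     for i in range(0, len(words), words_per_line)]
            path = os.path.join(folder, f"sample_{n}.txt")
            with open(path, "w", encoding="utf-8") as handle:
                handle.write("\n".join(lines) + "\n")
            written.append(path)
    return written


# Usage sketch: four folders (root, a, a/b, c), two files each.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    os.makedirs(os.path.join(tmp, "c"))
    paths = fill_tree(tmp, seed=1)
    with open(paths[0], encoding="utf-8") as handle:
        first_lines = handle.read().splitlines()

print(len(paths))  # 8
```

Passing a `seed` makes the generated tree reproducible, which helps when checking search results against known file contents.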
And now it's after midnight on a day that started with the dawn. More later.