First: I was able to wrap up the week two project, a text file search script. It assembles a list of the text files in an arbitrary directory tree using a recursive search, then indexes the words in those files so that a user can discover which files contain a word of interest.
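For anyone curious what that looks like, here is a minimal sketch of the search-and-index idea. It is not the actual script: the function names, the `TEXT_EXTENSIONS` tuple, and the word-matching regex are all assumptions made for illustration, and `os.walk` stands in for a hand-written recursive function.

```python
import os
import re
from collections import defaultdict

# Assumption: only these two file types count as "text files".
TEXT_EXTENSIONS = (".txt", ".csv")
# Assumption: a "word" is a run of letters and apostrophes.
WORD_PATTERN = re.compile(r"[a-z']+")


def find_text_files(root):
    """Walk the tree under root and return the paths of matching files."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(TEXT_EXTENSIONS):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def build_index(paths, ignore=frozenset()):
    """Map each lowercase word to the set of files that contain it."""
    index = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                for word in WORD_PATTERN.findall(line.lower()):
                    if word not in ignore:
                        index[word].add(path)
    return index
```

With something like this, `build_index(find_text_files("some/folder"))["whale"]` would give the set of files that mention "whale". The optional `ignore` argument is a hook for skipping common words.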
I managed to complete my own objectives, and added a list of words to ignore (e.g. 'the', 'a', pronouns, etc.). Courtesy of Stack Overflow, I was able to add a function that opens each file with its default program. Since I'm working with only .txt and .csv files, that's not a huge range of options. The script checks the operating system before opening a file, since macOS and Windows use different commands. Net result: I learned a bit about the subprocess module, and more about the os module.
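Here is a rough sketch of how the "open with the default program" step can work. This is an illustration and not the code from the script: the function names are made up, detecting the system with `platform.system()` is an assumption, and the `xdg-open` fallback for Linux is an extra that the script itself may not have.

```python
import platform
import subprocess


def default_open_command(path, system=None):
    """Return the command that opens path with the default program."""
    system = system or platform.system()
    if system == "Darwin":  # macOS
        return ["open", path]
    if system == "Windows":
        # 'start' is a cmd builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    # Assumption: anything else is a Linux desktop that has xdg-open.
    return ["xdg-open", path]


def open_with_default_app(path):
    """Launch the default program for path using the subprocess module."""
    subprocess.run(default_open_command(path), check=False)
```

Keeping the command lookup separate from the launch makes it possible to check the per-system commands without actually opening anything.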
I also was able to re-use my entry validation module to ensure users were selecting valid options.
My stretch goal: index large files (e.g. whole books as text files, courtesy of Project Gutenberg). The aim is to collect the line numbers where a word appears in a file, so that words are reasonably findable. The script should also have a function that previews the context around a word.
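Since this part isn't written yet, what follows is only a sketch of one way the stretch goal might go. The names `index_lines` and `preview`, the `radius` parameter, and the word regex are all hypothetical, and it assumes the whole file fits in memory as a list of lines, which should hold for a single book.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters and apostrophes.
WORD_PATTERN = re.compile(r"[a-z']+")


def index_lines(lines):
    """Map each lowercase word to the 1-based line numbers where it appears."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # set() records a line once even if the word repeats on it.
        for word in set(WORD_PATTERN.findall(line.lower())):
            positions[word].append(number)
    return positions


def preview(lines, line_number, radius=1):
    """Return the numbered lines within radius of line_number."""
    start = max(line_number - 1 - radius, 0)
    end = min(line_number + radius, len(lines))
    return [f"{n}: {lines[n - 1].rstrip()}" for n in range(start + 1, end + 1)]
```

A user could look up a word with `index_lines(lines)["sea"]`, pick one of the line numbers, and pass it to `preview` to see the lines on either side.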