Monday, January 25, 2016

Week 3, day 1 - recursive file indexer wrap-up

First, I wrapped up the week two project: a text file search script. It uses a recursive search to assemble a list of the text files in an arbitrary directory tree, then indexes the words in those files so that a user can find out which files contain a word of interest.
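
A minimal sketch of that two-step idea (recursive search, then a word-to-files index), not the actual project script. The function names `find_text_files` and `build_index`, the UTF-8 encoding, and the word-matching regex are all assumptions made for illustration.

```python
import os
import re


def find_text_files(root, extensions=(".txt", ".csv")):
    """Recursively collect paths under root whose names end with one of the extensions."""
    found = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recurse into the subdirectory and keep whatever it finds.
                found.extend(find_text_files(entry.path, extensions))
            elif entry.is_file() and entry.name.lower().endswith(extensions):
                found.append(entry.path)
    return sorted(found)


def build_index(paths):
    """Map each lowercased word to the set of files that contain it."""
    index = {}
    for path in paths:
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                for word in re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower()):
                    index.setdefault(word, set()).add(path)
    return index
```

With this shape, looking up a word is a dictionary access: `build_index(find_text_files("some_folder")).get("whale", set())` gives the set of files containing "whale", or an empty set if none do.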

I completed my own objectives and added a list of words to ignore (e.g. 'the', 'a', pronouns, etc.). Courtesy of Stack Overflow, I also added a function that opens each file in its default program. Since I'm working with only .txt and .csv files, that's not a huge range of options. The script checks the operating system before opening a file, since Mac and Windows use different commands. Net result: I learned a bit about the subprocess module, and more about the os module.
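
One common way to do the open-with-default-program step looks roughly like the sketch below. It is not the code from the post: the function names are made up for illustration, and the `xdg-open` fallback for Linux is an added assumption, since the post only mentions Mac and Windows.

```python
import os
import subprocess
import sys


def default_open_command(path, platform=sys.platform):
    """Return the command used to open path, or None on Windows (os.startfile is used there)."""
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        # macOS ships an `open` command that launches the default application.
        return ["open", path]
    # Assumption: other platforms are Linux desktops that provide xdg-open.
    return ["xdg-open", path]


def open_with_default_app(path):
    """Open path in whatever program the operating system associates with it."""
    command = default_open_command(path)
    if command is None:
        os.startfile(path)  # only exists on Windows
    else:
        subprocess.call(command)
```

Keeping the platform check in its own function means the decision can be checked without actually launching anything.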

I was also able to reuse my entry validation module to make sure users select valid options.

My stretch goal: index large files (e.g. whole books as text files, courtesy of Project Gutenberg). The aim is to collect the line numbers where a word appears in a file, so that words are reasonably findable. My script should also have a function that previews the context around a word.
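
A rough sketch of how the stretch goal might be approached, under assumptions: the names `index_lines` and `preview`, the tiny `IGNORED_WORDS` set, and the one-line-either-side default are all hypothetical, and the file is assumed small enough to hold in memory as a list of lines.

```python
import re
from collections import defaultdict

# Hypothetical ignore list; a real one would be longer.
IGNORED_WORDS = {"the", "a", "an", "and", "of"}


def index_lines(lines):
    """Map each word to the 1-based line numbers on which it appears."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # set() so a word repeated on one line is recorded once for that line.
        for word in set(re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower())):
            if word not in IGNORED_WORDS:
                index[word].append(number)
    return index


def preview(lines, line_number, radius=1):
    """Return the numbered lines within radius lines of line_number."""
    start = max(line_number - 1 - radius, 0)
    end = min(line_number + radius, len(lines))
    return [f"{n}: {lines[n - 1].rstrip()}" for n in range(start + 1, end + 1)]
```

For a book, `lines` could come from `open(path, encoding="utf-8").readlines()`; each line number in the index can then be passed to `preview` to show the surrounding text.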

