Tuesday, February 2, 2016

w4d2 - API, SQL and python

I love how we're integrating different elements now.  The project today was to use python and the Open Weather API to get a 16-day forecast for a city, then take the resulting JSON-formatted data and push it into an sqlite database we initialized.  We're still at the super basic level and dealing with canned scenarios, but I can see the pieces interacting.
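The whole pipeline fits in a surprisingly small sketch.  Here the web request is replaced by a canned sample (the field names are modeled on the shape of Open Weather's daily-forecast response, so treat them as assumptions), and the database is an in-memory one:

```python
import json
import sqlite3

# A canned sample shaped like Open Weather's daily forecast response;
# in the real script this JSON would come back from an HTTP request.
sample = json.loads("""
{"city": {"name": "Seattle"},
 "list": [
   {"dt": 1454457600, "temp": {"min": 4.2, "max": 9.8}},
   {"dt": 1454544000, "temp": {"min": 3.1, "max": 8.5}}
 ]}
""")

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""CREATE TABLE forecast
                (city TEXT, dt INTEGER, temp_min REAL, temp_max REAL)""")

# Flatten the nested JSON into rows the table can hold.
city = sample["city"]["name"]
rows = [(city, day["dt"], day["temp"]["min"], day["temp"]["max"])
        for day in sample["list"]]
conn.executemany("INSERT INTO forecast VALUES (?, ?, ?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM forecast").fetchone()[0])  # 2
```

Swap the canned sample for a real request and the rest of the pipeline stays the same.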

It's also really fun (is that the word I want here?  I'm sticking with it) to be able to have python interact with the world beyond my computer.  Almost everything I've done so far has run off text files or output to the terminal.  Now I can request information directly from the web, and push to a local database.

The 'extension' project was to build a python script that would ask a user for a city and use their answer to query Open Weather for a forecast, then put that into the sql database.  Not too challenging by itself.  A step deeper, though, and I was looking through the documentation for how queries in the Open Weather api are structured.  It's a whole mini-language hiding in almost plain sight in the url!
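That mini-language is just the URL's query string, and python's standard library can both build it and take it apart.  The parameter names below (`q`, `cnt`, `units`, `appid`) are modeled on how Open Weather's docs structure forecast requests, and the key is a placeholder:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Build a forecast query the same way the documentation structures them.
base = "http://api.openweathermap.org/data/2.5/forecast/daily"
params = {"q": "Paris,FR", "cnt": 16, "units": "metric", "appid": "YOUR_KEY"}
url = base + "?" + urlencode(params)
print(url)

# The mini-language runs in reverse, too: the query string parses
# right back into a dictionary (note the comma was percent-encoded).
parsed = parse_qs(urlparse(url).query)
print(parsed["q"])    # ['Paris,FR']
print(parsed["cnt"])  # ['16']
```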

I'd always known the extra long URLs that look like random gibberish must hold some meaning, but I'm getting an idea of just how much information is packed into URLs.  Yesterday's toy server essentially replicated a directory tree in how web pages were stored and accessed.  Every word or character in the url had a counterpart in the directory tree the information was pulled from.
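The toy server's mapping can be sketched in a few lines; this is my own illustration of the idea, not the actual server code:

```python
from pathlib import Path
import tempfile

# A tiny stand-in for the server's document root: one directory, one page.
root = Path(tempfile.mkdtemp())
(root / "pages").mkdir()
(root / "pages" / "about.html").write_text("<h1>About</h1>")

def resolve(url_path):
    # Each piece of the URL path maps one-to-one onto a directory entry.
    # (A real server would also guard against "../" escaping the root.)
    return root / url_path.lstrip("/")

page = resolve("/pages/about.html")
print(page.read_text())
```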

As part of my own investigations, I found a 20 MB file mapping each weather station in Open Weather's network to the numeric code they use to represent it.  It's 209,000 lines of text!  Small by many measures, but huge for me.  Opening it and using it required finding the json module for python, then porting the info to my sqlite database.  Not too difficult, once I found all the pieces I needed.
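The json-module-to-sqlite step looks roughly like this.  The three entries stand in for the 209,000-line file, and their shape (`id`, `name`, `country`) is an assumption based on what the station list contains:

```python
import json
import sqlite3

# A few entries shaped like the big station/city list.
raw = """[
  {"id": 2988507, "name": "Paris", "country": "FR"},
  {"id": 4717560, "name": "Paris", "country": "US"},
  {"id": 5809844, "name": "Seattle", "country": "US"}
]"""
records = json.loads(raw)  # for the real file: json.load(open(path))

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE stations
                (id INTEGER PRIMARY KEY, name TEXT, country TEXT)""")

# Named placeholders pull fields straight out of each dict.
conn.executemany("INSERT INTO stations VALUES (:id, :name, :country)", records)

print(conn.execute("SELECT COUNT(*) FROM stations").fetchone()[0])  # 3
```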

The harder realization was that there are multiple entries for cities.  Some entries refer to different weather stations in the same city, while others refer to cities of the same name in disparate places. (Paris, Texas!).  There are country codes in the data set to narrow the options some, but the codes are simple two-letter abbreviations.
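The country code is what makes the duplicates queryable at all.  A small self-contained example (the station IDs here are made up for illustration):

```python
import sqlite3

# Two stations named "Paris", told apart only by country code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (id INTEGER, name TEXT, country TEXT)")
conn.executemany("INSERT INTO stations VALUES (?, ?, ?)",
                 [(2988507, "Paris", "FR"),
                  (4717560, "Paris", "US")])

# Name alone is ambiguous...
all_paris = conn.execute(
    "SELECT id FROM stations WHERE name = ?", ("Paris",)).fetchall()
print(len(all_paris))  # 2

# ...but name plus country pins down one station (Paris, Texas!).
texas_paris = conn.execute(
    "SELECT id FROM stations WHERE name = ? AND country = ?",
    ("Paris", "US")).fetchone()
print(texas_paris[0])  # 4717560
```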

Next step: a lookup table for code/country name.  Getting the codes into python was pretty easy, but getting the data into SQL was impossible due to utf-8 conflicts with the module (I think).  StackOverflow searches took me to some info on converting utf-8 to ASCII.  I can make it work, but information is lost as non-ASCII characters in country names are stripped. Eventually, I decided to just let it rest for now.  I can add unicode to the long list of things to investigate further.
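The stripping trick from those searches boils down to decompose-then-drop, and the example below shows exactly where the information gets lost:

```python
import unicodedata

def to_ascii(text):
    # Decompose accented characters (e.g. "ô" -> "o" + combining accent),
    # then drop anything that still can't be encoded as ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# The accents don't survive the round trip -- that's the lost information.
print(to_ascii("Côte d'Ivoire"))  # Cote d'Ivoire
print(to_ascii("Åland Islands"))  # Aland Islands
```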

I did manage to get the project working, and now it's once again 11:30 PM and I'm almost cross-eyed with exhaustion.
