Details About IllinoisTrack.us

IllinoisTrack.us is a hobby project that I recently embarked on to create an "open government" website for my home state of Illinois. I am far from a political expert, but the goal of IllinoisTrack is precisely to help people like me - politically interested non-experts - understand what our representatives down in Springfield are up to.

In the following few paragraphs, I'd like to describe how I made this site and what data I used to do it. I think the most interesting thing I can contribute is to relate my experience of working with the data sources that the State of Illinois provides - their good points, their bad points, their funny quirks, and the handful of (in my opinion) poor policy choices contained within them. I think this information could be helpful both to people like me who would like to create sites like IllinoisTrack in other states and also, perhaps, to the designers of government websites, so that they can know what people like me would like to see in government data sources.

First, a few words on coding IllinoisTrack: The website as it is now is the product of about three months of on-and-off work. All of the code was entirely written from scratch by me, except for the MVC (model-view-controller) framework, which is from the open-source project CakePHP. I originally considered starting from GovTrack's source or from that of OpenCongress, but neither is written in the web scripting language I am most familiar with (PHP) and I thought there would be too many unique things about the Illinois legislature and the kind of features I wanted to provide to justify starting from another project's code (I don't know whether this was a good assumption or not, but it's what I chose).

I think the two best decisions I made with regard to the code were using CakePHP for all the website code and separating my website code from my data harvesting code. CakePHP (which I'm not trying to push, by the way; I think any MVC framework would have worked fine) made writing the website exceptionally simple, and I spent very little time on it. Much more time was spent on the data harvesting code, which consists of a separate set of PHP scripts that gather data from the Illinois websites, parse it using regular expressions, and save the data to my database. These scripts are run by the cron utility on my webserver and harvest at a rate of about 50 bills per minute (this moderate rate is on purpose, because I didn't want to hit the Illinois Legislature's webservers too hard).
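To make the throttling idea concrete, here is a minimal sketch of a rate-limited harvest loop. It is written in Python rather than PHP and is not the actual IllinoisTrack code. The function name `harvest_throttled`, the `fetch` callback, and the bill identifiers are all hypothetical. The only figure taken from the description above is the target of roughly 50 bills per minute, which works out to one request about every 1.2 seconds.

```python
import time
from typing import Callable, Iterable, List


def harvest_throttled(
    bill_ids: Iterable[str],
    fetch: Callable[[str], str],
    per_minute: int = 50,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Call fetch() for each bill, starting at most `per_minute` calls per minute.

    `fetch` is a stand-in for whatever downloads and parses one bill page.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    min_interval = 60.0 / per_minute  # 1.2 seconds at 50 bills per minute
    results = []
    last_start = None
    for bill_id in bill_ids:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last request began.
            elapsed = clock() - last_start
            if elapsed < min_interval:
                sleep(min_interval - elapsed)
        last_start = clock()
        results.append(fetch(bill_id))
    return results
```

Spacing the requests evenly, rather than fetching 50 pages in a burst and then idling, keeps the load on the remote server flat. A cron job could call a loop like this once per run with the next batch of bill identifiers.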

Now, on to the data: All of the data I currently harvest is from http://www.ilga.gov, which is the website of the Illinois General Assembly. I want to start off by saying I am grateful to the State Legislature and to the designers of ilga.gov, because without them I could not have made IllinoisTrack. The ilga.gov site is well organized, comprehensive, and up-to-date. It even provides some data in XML, which is great. Nevertheless, I have some criticisms of this site, which I hope its designers will take as constructive.

The first thing I noticed on ilga.gov was that it has a link to an FTP site, which contains XML files that have the status and metadata for all the bills. I thought this would make the job of harvesting trivial, but unfortunately I soon discovered that this XML data would not be very useful for me (ultimately I did not use it at all). The first problem is that the XML data is not updated as rapidly as the website. While the website is updated frequently during the day, the XML data is only updated once per day (at midnight). This is not very helpful for bill status information, which changes constantly. Also, the XML data is not complete. It does not provide any member information (the member names, districts, etc.), committee information (committee memberships, bills in committees), or roll-call votes. The XML only has bill status information, and even this is not complete in the sense that it is missing certain information found on the website (for example, on the website, certain bill actions are bolded to indicate they are "major actions", but this fact is not indicated in the XML). Finally, the XML has a few quirks in it. For example, the bill sponsors are all put into one XML tag and are separated variously by dashes, commas, and the word 'and'. This is fine and not hard to parse, but a little annoying.

After I gave up on the XML, I decided to just parse the website directly using regular expressions. This was not terribly difficult since, as I said before, the website is well organized. Of course, any page scraping like this is inherently fragile, since even tiny changes to the pages can break my code. Also, there are a few problems with how the data is presented. The bill actions are a little confusing because their format is not consistent. Some action descriptions have only one link, some more, some none, and you have to view quite a few before you can see the pattern. It would be nice if the description were broken up into something like "action type", "related committee", "related member", "related roll-call vote", etc. Also, it took me a while to understand the formatting of the sponsorship information (which I described above). I eventually found a PDF document, meant for incoming members to read, which explains that the first member listed is the Chief Sponsor, members separated by dashes are Chief Co-Sponsors, and all other members are regular Co-Sponsors. There's no reason why this information couldn't have been presented in this way on the ilga website (as it is on my website).
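The sponsorship rule from that PDF can be sketched in a few lines. This is an illustration in Python, not the PHP code behind IllinoisTrack, and the exact string layout is an assumption: names separated by " - " (with spaces around the dash), by commas, or by the word "and". The function name `classify_sponsors` and the legislator names in the usage example are made up.

```python
import re
from typing import Dict, List, Optional, Union

# Assumed separators: " - " (spaces required, so hyphenated surnames survive),
# a comma (optionally followed by "and"), or the standalone word "and".
# The outer group is capturing so that re.split() keeps each separator.
_SEPARATOR = re.compile(r"(\s+-\s+|\s*,\s*(?:and\s+)?|\s+and\s+)")

SponsorRoles = Dict[str, Union[Optional[str], List[str]]]


def classify_sponsors(raw: str) -> SponsorRoles:
    """Assign a role to each name in a combined sponsor string.

    Rule as described in the text: the first member is the Chief Sponsor,
    members introduced by a dash are Chief Co-Sponsors, and everyone else
    is a regular Co-Sponsor.
    """
    roles: SponsorRoles = {
        "chief_sponsor": None,
        "chief_co_sponsors": [],
        "co_sponsors": [],
    }
    tokens = _SEPARATOR.split(raw.strip())
    names = tokens[0::2]                # even positions hold the names
    preceding = [""] + tokens[1::2]     # separator that came before each name
    for name, sep in zip(names, preceding):
        name = name.strip()
        if not name:
            continue
        if roles["chief_sponsor"] is None:
            roles["chief_sponsor"] = name
        elif "-" in sep:
            roles["chief_co_sponsors"].append(name)
        else:
            roles["co_sponsors"].append(name)
    return roles
```

For a made-up string such as `"Rep. Ann Lee - Bob Ray - Cy Day, Di Fox and Ed Gray"`, this puts Ann Lee first as Chief Sponsor, Bob Ray and Cy Day as Chief Co-Sponsors, and Di Fox and Ed Gray as Co-Sponsors. Titles such as "Rep." are left attached to the name, and a real harvester would probably want to strip them separately.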

The most frustrating aspect of all has to be the roll-call data. Currently, the roll-call votes are all presented exclusively as PDF documents. I really can't understand any justification for this, and its main effect is to make it significantly more difficult for me to parse. I emailed the webmaster at ilga.gov to ask for an explanation, but did not receive a satisfactory reply. Right now, the roll-call vote data is the only major data that I do not have on IllinoisTrack, but I hope to get it soon.

To conclude, I just want to say a few words about my future plans for IllinoisTrack. In the immediate future, I want to finish collecting all the data I am still missing. The first priority is to get the roll-call data out of its PDF prison. After that, I'll get some amendment data and other less vital information, and finally I'll collect the data from past years. In the long run, I'm trying to think of ways to further the main mission of my site: to help people like me better understand what's going on in the state legislature. There are literally thousands of bills written per term, and it's hard for a layman to understand which are important and which are not. Statistical analyses and ranking can help a little, but ultimately what I'd like to know is what major *policy* debates are going on in the legislature, and which bills relate to them. This is something that is hard to get from the data alone, which is why I started a "featured legislator" section. I'm trying to use this to get various legislators to fill out a short questionnaire telling me what bills they think are important and why. I don't know whether this will be successful or not, but time will tell. Beyond that, I'd like to try to incorporate news stories from Google News to see if I can match them to the specific bills they refer to.