Democratizing Transit Data with Open Source Software

Democratizing government data will help change how government operates—and give citizens the ability to participate in making government services more effective, accessible, and transparent.

Peter Orszag, OMB Director

This post is a continuation in a series on building a transit data application using GTFS data recently released by the State of Delaware.

If you missed my first post, go back and check it out. You can get a MySQL database loaded up with all of the Delaware GTFS data in just a couple of minutes. Once you do that, you’ll be ready to follow along.
MySQL Database
Continuing our work from the last post, in this post we’ll finish building out our database and set up an environment to run a web application – for the purposes of the demo app I’m building for this series, I’ll assume you have a standard LAMP set up to work with.

Finish the Database Setup

In the last post, we downloaded the GTFS data from the State of Delaware, unzipped it and loaded it into a MySQL database. Now, we need to set up some stored procedures so that we can extract data from our MySQL database and present it to an end user.

You can see the stored procedures I created for this demo application on GitHub. To load them into our shiny new database, simply run:

  ~$ wget
  ~$ mysql -u user_name -p transitdata < dartfirststate_de_us_procs.sql

Thats it!

If you look at these procedures, you’ll see that they are set up to answer two different questions from users. The first one – getDepartureTimesAndRoutesByStopID – will query our database and get a set of routes and departure times by the ID of a transit stop. The other – GetClosestStopsByLocation – accepts a lat/lon and returns the stop ID and name of the transit stops closest the the requesting location.

In practice, you can see these two procedures working in tandem – the later procedure would be used by someone wishing to find the transit stop closest to their present location. The former would provide information on the next buses to reach that stop, the routes they serve and the scheduled departure time from that location.

There are certainly many more potential queries that could be used to extract valuable information from the GTFS data in our database, but these two should suffice for our demo application. Also, both are pretty well suited for use from a text messaging (SMS) application, which is what we’ll build in the last post in this series.

Setting up the Application Environment

I assume for this series of posts that you have access to a LAMP server. This server should be hosted somewhere where it can receive HTTP posts from a third party platform (this is required in order to build an SMS application).

While it is not a requirement that you code your transit application in PHP, I will do so in this series. Feel free to use the development language of your choice in building your own application – just about every web development language can work with MySQL.

Before we start writing code, lets finish a few last items. First, lets create a user for our web application – remember to give this user only the privileges they need. For our demo application, the web app user only needs to EXECUTE stored procedures. So, we want to do this at the MySQL shell:

mysql> GRANT EXECUTE ON transitdata.* TO username@'localhost' IDENTIFIED BY 'password'; 

Be sure to replace the ‘username’ and ‘password’ above with values of your choosing. Now, let’s put our database access credentials in a safe and convenient place.

When writing a web application, I prefer not to store this inforamtion in my code (as a config item or declared constant). Instead, I like to keep this information in my Apache configuration.

If you’re using Apache on Ubuntu, you can typically just store this inforamtion in your VirtualHost file (located in /etc/apache2/sites-available/). Use the Apache SetEnv directive to set the values you want to store:

SetEnv TRANSIT_DB_HOST localhost
SetEnv TRANSIT_DB_USER username
SetEnv TRANSIT_DB_PASS password
SetEnv TRANSIT_DB_NAME transitdata

Again, be sure to replace the ‘username’ and ‘password’ above with the values used when creating your MySQL user. Once you have entered these values into your VirtualHost file, save it and reload Apache:

 ~$ sudo /etc/init.d/apache2 reload

Now we’re all set to start writing code!

In the next post we’ll build a simple, yet powerful PHP-based SMS application that anyone with a cell phone can use to find a transit location nearest to them in the State of Delaware, and find out the departure times / routes from that location.

Stay tuned!

Developer Survey on Open Government Data

In the last post, I started a series on building open government applications with transit data. If you read that post, and followed along, you’ve now got all of the GTFS data released by the Delaware Transit Corporation imported into a MySQL database.
If you’re someone who is interested in this series, and cares about open government data, then you should take a quick detour over to the Socrata website to complete a short survey on open government data.

This survey is part of a larger study being undertaken by several organizations prominent in the open government data movement. Various components of the study are focused on the importance of open government data from the perspective of citizens, government employees and civic application developers.

If you like building civic applications with open government data, take a few minutes and complete this survey.

We’ll get back to building the new hotness in transit apps shortly.

How to Build an Open Transit Data Application

Earlier this year, I had the chance to work with one of my state’s Senators to draft and pass a bill requiring the state’s transit agency to publish all of it’s route, schedule and fare information in an open format for use by third parties.

This bill was signed into law by the Governor a few months ago, and the data is now available (in GTFS format) on the Delaware Transit Agency’s web site.

My primary goal in working to get this law enacted was to raise awareness within my state about the potential for open government data to spur civic coding and the development of useful applications at little or no cost to the government. Now that my state actually publishes some open data (Hells to the yeah!), I think the next step for me is to provide some guidance on how to get started using it to build civic applications.

Hopefully, this will show others how easy it is and get them to try their hand at building a civic application.

(Note, transit data is an especially rich source for developing civic applications. For some background and more detail on this, see this post.)

In the next several posts, I’ll document one process for developing an open source transit data application using GTFS data from the Delaware Transit Agency. I’ll be sharing code and some examples that will help you get started if you feel like trying your hand at building a civic application.

Let’s get started!

Getting the Data

Now that the Delaware Transit Agency has published all of their route and schedule information, anyone that wants to use it can simply download it.

This zip file contains a collection of text files that conform to the GTFS specification – for a detailed description of file contents, go here. If you want to build a transit app with GTFS data, I recommend spending a little time becoming familiar with the layout of these files, and getting a sense of what the data represents.

Setting up a Database

In order to use this data as part of an application, we’re probably going to need to get it into a database so that we can manipulate it and run queries against it. An easy way to do this is to import it into a MySQL database instance.

MySQL is a powerful open source database that is used in scores of different web applications and its a solid choice for building a transit data application. In addition, the MySQL LOAD DATA INFILE statement is a powerful and easy way to populate a database with information from a text file (or multiple files).

I’ve created a SQL script to load Delaware transit data into a MySQL database. You can get this script from GitHub – it’s pretty simple, and you should feel free to modify it as your own personal preferences or requirements dictate. Just fork the Gist.

Combining this script with a couple of minutes on the command line will give you a MySQL database with all of the transit data loaded up and ready to use. The steps below assume that you have MySQL installed and running.

To install MySQL:
~$ sudo apt-get install mysql-server

To see if MySQL is running:
~$ pgrep mysql

Create a temporary location for the GTFS files:
~$ mkdir /tmp/dartfirst_de_us

Download the GTFS files from the Delaware Transit Agency website:
~$ wget

Unzip the individual text files to our temporary location:
~$ unzip -d /tmp/dartfirst_de_us/

Get the SQL script for loading GTFS files into MySQL from GitHub:
~$ wget

Invoke MySQL and pass in the SQL script (make sure you change ‘user_name’ to a valid MySQL user name):
~$ mysql -u user_name -p < dartfirststate_de_us.sql

That’s it!

Now, all of the data from the original text files has been loaded into a MySQL database called transitdata. You can start to construct queries to retrieve information from these tables to support the functionality for your application.

In the next post, I’ll walk through a few basic queries that can extract useful information from these tables. We’ll also lay the groundwork for a really cool mobile application that I will deploy for use by the public when this series of posts is complete.

Stay tuned!