Turning Government Web Content Into APIs

Inspired by a recent Open City project that repurposes data on sewage in the Chicago River, I wanted to work through a quick example of turning a web page that houses useful information on water quality in Philly into an API.

Here is a quick screencast showing how easy it is to take information rendered as part of a standard HTML page and turn it into a useful JSON API.

It goes without saying that as more and more data producers in government learn about how data can be repurposed by others – both inside and outside of government – we’ll see more and more valuable information exposed as structured data and APIs. Viva #opendata!

Until then, there are some very capable tools available for turning web content into powerful, reusable APIs.


5 Essential Open Data Tools

Every data wrangler has their own list of favorites – the go-to tools that they use when they need to work with data.

If you need to clean, transform, or mash up data – or if you are working with a data set that will form the basis for an application – here is a list of tools that can make life easier for you.

  • OpenRefine – I don’t think there is a better tool for cleaning messy data than OpenRefine. One of my favorite features is the ability to add new columns to a data set based on data in an external web service.
  • jq – I see a lot of JSON in my job, and a tool like jq makes working with JSON data exceptionally easy. For example, here is a simple jq recipe for extracting a list of licensed pawn shops in Philadelphia to a CSV file (there’s a similar sketch just after this list).
  • csvkit – CSV is another format I see almost every day, and csvkit makes working with it simple. My favorite utility – though I don’t use it often – is csvsql. Use this handy utility to generate SQL insert statements and easily create a relational database from a CSV file.
  • Unix shell – jq and csvkit are both command line tools, and the Unix shell is the place where I spend a lot of time working with data. Without getting into a Windows vs. *nix war, there is simply no better collection of utilities for working with text files than those that can be accessed via the shell. Tools like curl, grep, sed, awk, cut and a host of others are enormously useful on their own, or in combination with tools like jq and csvkit.
  • CartoDB – pretty much the easiest way to create a web-based map from an open data set. There’s even an API for building apps on top of the data you have in your CartoDB account. Enough said.
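
Here’s a rough sketch of the kind of jq + csvkit pipeline I’m describing above – the endpoint URL and field names are placeholders, so substitute whatever open data source you’re actually working with:

  # Pull a JSON array of records from a (hypothetical) open data endpoint and
  # flatten two fields into CSV rows with jq.
  ~$ curl -s "https://data.example.gov/resource/pawn-shops.json" \
      | jq -r '.[] | [.business_name, .address] | @csv' > pawn_shops.csv

  # Add a header row so csvkit knows the column names (GNU sed syntax).
  ~$ sed -i '1i business_name,address' pawn_shops.csv

  # Use csvsql to create a table and insert the rows into a SQLite database.
  ~$ csvsql --db sqlite:///pawn_shops.db --insert pawn_shops.csv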

Note: my background is in software development, so the list of favorites above probably reflects my own professional biases. Someone who works primarily as a data scientist might have a completely different list of favorite tools.

What’s your favorite tool for working with data?


Onboarding Civic Hackers

Earlier this week, I had the pleasure of attending a civic hacking event jointly organized by Code for Philly and Girl Develop It Philly. The event had a tremendously good turnout – over 50 people by my count – making it one of the larger events Code for Philly has organized in recent months.


The mission of Girl Develop It is to empower women to learn software development, and as a result there were a good number of people at the event being introduced to civic hacking for the first time. This got me thinking about ways to onboard people new to civic hacking (and people new to coding) into civic technology projects.

None of these is new, but here are five ideas I came up with after the event:

Data Liberation – the foundation of a civic hacking project is open data, and far too much of the data civic hackers need is locked up in broken websites and unusable formats. Helping to break some of this data free can be a tremendous benefit to open data users and civic hacking projects.

Documentation – far too many open source and civic hacking projects go without proper documentation to help other developers contribute and to support end users. Helping to create or expand documentation for a project can be critical to helping it succeed.

User Testing – Organizing and conducting end user testing for civic technology projects is sadly rare. There are some efforts underway to change this, but in order for civic hacking projects to improve and succeed we need real feedback from mainstream users.

Outreach – One legitimate criticism of civic apps is that too few people know they exist. There are efforts working to change this, like Apps for Philly (still in its infancy) – a site that lists a host of different civic technology apps that are available for users. Adding new projects to this listing (and others like it) will both help these projects succeed and give the person doing it a much clearer sense of the civic technology landscape.

Helper Libraries – a great way to get comfortable writing code and to help out a civic technology project is to write helper libraries for projects with APIs. At the Apps for Philly Transit Hackathon, one project utilized recently released data from the City of Philadelphia on bicycle thefts. The lead developer created a new API for this data to enable other projects to use it. Building new client libraries in a range of different languages would be a great way to support other developers that want to incorporate bike theft data into their projects, and to get some hands-on experience writing code.

There are so many ways to contribute to open source projects and to help support civic hacking efforts – these are just a few.

We need more great events like the one organized by Code for Philly and Girl Develop It Philly to bring together all of the talented people we have in our city to work on these important projects.


The Lesson of PennApps

A couple of weeks ago, I attended the most recent PennApps hackathon – a biannual college hackathon in Philadelphia that has grown from somewhat humble beginnings a few years ago to one of the largest college hackathons in the world.

[Image: PennApps logo]

Attendance at the event has swelled to over 1,000 participants from colleges across the country, as well as several international teams. I’ve been to PennApps 4-5 times in the past few years and it has been remarkable to see it grow. For the last several I have attended, I brought along colleagues from city government – some of whom had never been to a hackathon before.

What hasn’t changed over the many installments of PennApps is the presentation of sponsor APIs at the kickoff event. After a brief introduction by the organizers, the many technology companies that sponsor the event show their wares to participants. This almost always involves a short description and demo of an API or SDK that participants can use at the event to build something.

These presentations are witty, engaging and fun – they have to be. There are dozens of sponsors for the event and each is angling to encourage developers to use their tool or platform to build a working prototype by the end of the weekend. This is how companies built around APIs raise brand awareness. Increasingly, the process of building software has come to revolve around leveraging third-party platforms and APIs. This has changed the way that software development – particularly web development – happens, as well as the expectations of developers.

[Photo: Venmo makes their pitch at PennApps]

Most of the sponsors at an event like this one will offer some sort of additional incentive for developers that use their services – free credits, t-shirts, swag, etc. At a minimum, though, their services are easy to use and well documented. Those that are brave enough may even try a “live coding” demo – building a working application using their API or platform in just the few short minutes allotted to each sponsor presentation. When done successfully, this can help drive home the point to prospective users that a platform or API is easy to use.

Every time I attend this event with my colleagues from city government I say – “This is what governments need to do. We need to present ourselves to prospective users as if we were an API company. The same standards of quality should apply.”

It is increasingly common – and encouraging – to see governments publish developer portals. A number of different federal agencies do this, as do large cities like New York, Chicago and Philadelphia.

The truth is – if we’re being honest with ourselves – that most of them are not very good, particularly when compared with the offerings of private companies. Many don’t have common elements like an API console, code samples & tutorials, helper libraries or a discussion forum.

If we want to encourage developers to use open government data we need to be realistic with ourselves about what developers have come to expect in terms of an API offering. Events like PennApps have raised the bar for anyone that wants to encourage developers to build useful and interesting applications with their data and APIs – governments included.

We must enhance the quality of government developer portals, and we must work harder (and faster) to develop shared standards for government data and APIs. Most importantly, we have to do more to share tips, tricks and best practices between governments. There are some tools out there to get governments started down the road of building a developer center that is impactful and engaging, but we must do more.

Developer centers are not just a mandate or requirement that we need to check off our “to do” list. Government developer portals should be the hubs around which we engage and communicate with developers, technologists and the broader data community.

In a subsequent post, I’ll share a checklist that I’m putting together with a list of basic elements that every government developer portal should have.

Stay tuned!


An SMS-Enabled Polling Locator

This is a great weekend for civic hacking.

Daylight Saving Time has given us an extra hour, advances in telephony application development have made it dead simple to build text messaging applications, and Google has given us the Civic Information API.

With an election on Tuesday, I wanted to build a quick application that demonstrated the ease with which SMS apps can be built and the power of Google’s API.

The address of a polling place is both valuable and succinct – it’s the ideal kind of information to deliver through multiple communication channels. Text messaging (SMS) is a fairly ubiquitous communication channel, and in some cities – like Philadelphia – it’s an important way to engage with citizens that may face barriers to digital access.

The screencast above demonstrates how to use the script I developed using the Google Civic Information API and the Tropo telephony platform.
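
If you’re curious about the lookup behind the script, here’s a rough sketch of a voter info request against the Google Civic Information API. The endpoint shown is the current v2 form (the script itself may target an older version), and the API key and election ID are placeholders – 2000 is the test election ID in Google’s documentation:

  # Look up polling place information for an address. The response includes a
  # pollingLocations array with the address of the voter's polling place.
  ~$ curl -s "https://www.googleapis.com/civicinfo/v2/voterinfo?key=your-api-key&electionId=2000&address=1234+Market+St+Philadelphia+PA" \
      | jq '.pollingLocations[0].address'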

There are many ways to do this, and there are a large number of text messaging platforms and services to choose from, so if you want to use your extra hour this weekend to help people find their polling location, pick the one you like best and get cracking.

It’s never been easier to build useful communication and messaging apps – in fact it’s getting easier every day. And with the richness of information available through APIs like Google’s Civic Information API, it’s never been easier to build an app that will help people get to their polling location.

Election day is just around the corner. Use your extra hour this weekend wisely…


“Phind It For Me” Live in Philly

Really excited to launch a new OpenGov project in Philadelphia – Phind It For Me.

The service is built on PHLAPI and the point data sets it houses. As such, it’s easy to understand why I’d be interested in enhancing the data sets currently in PHLAPI.

I’m really excited about this project – source code available on GitHub – and would love to see if there is interest in launching it in other cities with CouchDB-based geospatial data repositories, like Baltimore.

It’s built on the awesome new SMSified platform from Voxeo (disclaimer, I work there) and uses a Node.js module I built for working with the SMSified API.

As always, dear readers, any comments or feedback are welcome.

Do head on over to the project website and check it out!


Experiments in Open Data: Baltimore Edition

A lot of my open gov energy of late has been focused on replicating a technique pioneered by Max Ogden (creator of PDXAPI) to convert geographic information in shapefile format into an easy-to-use format for developers.

Specifically, Max has pioneered a technique for converting shapefiles into documents in an instance of GeoCouch (the geospatially enabled version of CouchDB).

I was thrilled recently to come across some data for the City of Baltimore and since I know there are some open government developments in the works there, I decided to put together a quick screencast showing how open data – when provided in an easily used format – can form the basis for some pretty useful civic applications.

The screencast below walks through a quick demonstration of an application I wrote in PHP to run on the Tropo platform – it currently supports SMS, IM and Twitter use.

Just send an address in the City of Baltimore to one of the following user accounts along with a hashtag for the type of location you are looking for:

  • SMS: (410) 205-4503
  • Jabber / Gtalk: bmorelocal@tropo.im
  • Twitter: @baltimoreAPI

This demo application interacts with a GeoCouch instance I have running in Amazon EC2 – you can take a look at the data I populated it with by going to baltapi.com and accessing the standard CouchDB user interface. I haven’t really locked this instance down all that tightly, but there really isn’t anything in it that I can’t replace.
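
If you’d rather poke at the data from the command line than through the web UI, plain HTTP requests work too – the database and spatial view names below are hypothetical, so start with _all_dbs to see what’s actually there:

  # List the databases on the CouchDB instance.
  ~$ curl -s http://baltapi.com/_all_dbs

  # Run a GeoCouch bounding-box query against a (hypothetical) spatial view to
  # find points within a small slice of downtown Baltimore.
  ~$ curl -s "http://baltapi.com/some_database/_design/geo/_spatial/points?bbox=-76.62,39.28,-76.60,39.30"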

[Image: Locate places in Baltimore via SMS]

Besides, one of the nice things about this technique is how easy it is to convert data from shapefile format and populate a GeoCouch instance. Hopefully others with GIS datasets will look at this approach as a viable one for providing data to developers. (If anyone has some shapefiles for the City of Baltimore and wants to share them, let me know and I’ll load them into baltapi.com.)
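
Max’s tooling takes care of the conversion for you, but as a rough sketch of what’s involved under the hood, you can get most of the way there with GDAL’s ogr2ogr and CouchDB’s bulk document API – the file and database names here are placeholders:

  # Convert a shapefile to GeoJSON in WGS84 (lat/lon) coordinates.
  ~$ ogr2ogr -f GeoJSON -t_srs EPSG:4326 libraries.json libraries.shp

  # Create a (hypothetical) database on a local CouchDB/GeoCouch instance.
  ~$ curl -s -X PUT http://localhost:5984/baltimore_libraries

  # Wrap the GeoJSON features as CouchDB documents and POST them in bulk.
  ~$ jq '{docs: .features}' libraries.json \
      | curl -s -X POST -H "Content-Type: application/json" \
        -d @- http://localhost:5984/baltimore_libraries/_bulk_docs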

There are a number of people in Baltimore pushing for an open data program from their city government, and I have heard that there are some really cool things in the pipeline. I can’t wait to see how things develop there, and I want to do anything I can to help.

Hopefully, this simple demo will be useful in illustrating both the ease with which data can be shared with developers and the potential benefit that applications built on top of open data can hold for municipalities.

UPDATE (4/18/2011): I’ve actually replicated all of the Baltimore data from the EC2 instance discussed in this blog post to the new Iris Couch instance. Iris Couch is by far the easiest way to get started using CouchDB, and Couch’s replication feature makes it easy to move data into an Iris Couch instance.


Building Multichannel Transit Apps with Tropo

This post is the third in a series about building an open source transit data application using GTFS data from the Delaware Transit Corporation.

In the first post, I described how to download the State of Delaware’s transit data and populate a MySQL database with it.

In the previous post, I walked through a process of setting up stored procedures for querying the transit data and setting up a LAMP application environment.

Now we’re ready to write code for our transit app!

Choosing a Platform

One of the most underappreciated developments that has accompanied the increasing amount of government data available in open formats is the vast array of new tools now available for developers to use. I’ve talked about this a lot in the past but it bears repeating – it has never been easier to build sophisticated, multi-channel communication applications than it is now.
The number of options open to developers is truly exciting, but there are some platforms that rise above the rest in terms of ease of use and in what they enable developers to do. For this project, I will use the Tropo WebAPI platform.

The Tropo WebAPI has a number of advantages that will come in handy for our transit app project (and any other projects you’ve got in the works). You can write a Tropo app in one of several popular scripting and web development languages – Ruby, Python, PHP, C# and JavaScript (Node.js). There are libraries available for each language that make it easy to build Tropo apps and to integrate with the Tropo API. (Disclaimer – I’ve worked on several of these libraries.)

In addition, the real magic that Tropo brings to the table is the ability to serve users on multiple communication channels (phone, IM, SMS, Twitter) from a single code base. This is especially important for an application meant to serve transit riders. These users may not have the luxury of sitting in front of a desktop computer in order to look up information on a bus route or schedule. They are much more likely to be traveling and using some sort of phone or mobile device. The Tropo WebAPI is perfect for our needs.

Vivek Kundra, the former CIO of the District of Columbia and current CIO of the United States, has described the effort by governments to release data in open formats as “the democratization of data” – these efforts make previously hard to get, or hard to use data available for everyone.

I like to describe platforms like Tropo and the various libraries that are available to use with it as “the democratization of application development” – these tools make building powerful communication apps simple for anyone who understands web development.

Building our Transit App

Before we can build our application, we need to decide what it will do.

For our purposes, this has already been determined by the stored procedures we built in the last post. Our transitdata database has two stored procedures – one to return the nearest bus stops to a specific address or location, and one to return the next bus departure times from a specific bus stop.

However, this series of posts is meant to inspire readers to build their own applications – now that you have transit data in a powerful relational database like MySQL you can query it any way you like. In addition, the SQL scripts and steps developed for this series of posts can certainly be used with the data from any other transit agency that uses the GTFS format. There are lots. Use your imagination – build whatever you find useful.

So now that we have some idea of what we want our application to do, we need to select a development language. It will probably come as no surprise that for this example I’m going to use the PHP scripting language and the PHP Library for the Tropo WebAPI. PHP is a good match for Linux, Apache and MySQL – all technologies we used in the previous entries in this series of blog posts.

If you want some more detailed information on building PHP applications that run on the Tropo WebAPI platform, you can review a separate series of blog posts on this issue here.

To get the PHP Library for the Tropo WebAPI, you can download it and unpack it on your web server, or simply clone the GitHub repo.

Once you do that, you can grab the code for our demo application from GitHub as well.

In order to test this application, you’ll need to sign up for a free Tropo account – you can do that here. Once you are signed up, go to the Applications section in your Tropo account and set up a new WebAPI application that points to the location of our PHP script on your web server. You can see more detailed information on setting up a Tropo account here.


Note – You’ll also need an API key from Google Maps for geocoding addresses – get one here. Change the following line in the application to include your Google API key:

define("MAPS_API_KEY", "your-api-key-goes-here");

Once your Tropo account and application are set up, you can add as many different contact methods as you like – your Tropo application is automatically provisioned a Skype number, a SIP number and an iNUM.

To illustrate how our transit app will work, I’ve gone ahead and assigned a Jabber IM name to my app – findthebus@tropo.im. Add this to your friends/user list in Google chat and you can use the app I’ve set up. Here’s what it looks like in my IM client:

[Screenshot: the transit app conversation in my IM client]

As you can see, my first IM to findthebus@tropo.im sends the address of a building in Downtown Wilmington (actually, a building I used to work in). The app responds with the three closest bus stops and the distance (in miles) to each.

I then send the number of the bus stop I am interested in. The app responds with the next three buses to leave that stop, the route served by each and the number of minutes before each departs.

How cool is that?

I could very easily make this application more sophisticated, so that it delivers content tailored to specific channels (i.e., IM vs. phone) but I want to keep things simple for now.

In the next blog post of this series, we will introduce some additional tools, including Google Maps and the new hotness in cloud telephony – Phono.

Stay tuned!


Democratizing Transit Data with Open Source Software

Democratizing government data will help change how government operates—and give citizens the ability to participate in making government services more effective, accessible, and transparent.

Peter Orszag, OMB Director

This post is a continuation in a series on building a transit data application using GTFS data recently released by the State of Delaware.

If you missed my first post, go back and check it out. You can get a MySQL database loaded up with all of the Delaware GTFS data in just a couple of minutes. Once you do that, you’ll be ready to follow along.
Continuing our work from the last post, in this post we’ll finish building out our database and set up an environment to run a web application – for the purposes of the demo app I’m building for this series, I’ll assume you have a standard LAMP setup to work with.

Finish the Database Setup

In the last post, we downloaded the GTFS data from the State of Delaware, unzipped it and loaded it into a MySQL database. Now, we need to set up some stored procedures so that we can extract data from our MySQL database and present it to an end user.

You can see the stored procedures I created for this demo application on GitHub. To load them into our shiny new database, simply run:

  ~$ wget http://gist.github.com/raw/632306/9860651ba2a61cd5af1c18529cdbab5f8c6f8e97/dartfirststate_de_us_procs.sql
  ~$ mysql -u user_name -p transitdata < dartfirststate_de_us_procs.sql

That’s it!

If you look at these procedures, you’ll see that they are set up to answer two different questions from users. The first one – getDepartureTimesAndRoutesByStopID – will query our database and get a set of routes and departure times by the ID of a transit stop. The other – GetClosestStopsByLocation – accepts a lat/lon and returns the stop ID and name of the transit stops closest to the requesting location.

In practice, you can see these two procedures working in tandem – the latter procedure would be used by someone wishing to find the transit stop closest to their present location. The former would provide information on the next buses to reach that stop, the routes they serve and the scheduled departure time from that location.

There are certainly many more potential queries that could be used to extract valuable information from the GTFS data in our database, but these two should suffice for our demo application. Also, both are pretty well suited for use from a text messaging (SMS) application, which is what we’ll build in the last post in this series.
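
Before wiring these procedures into an application, it’s worth exercising them from the mysql client to make sure they return what you expect. The argument lists below are guesses – a lat/lon pair for one, a stop ID for the other – so check the procedure definitions in the Gist for the real signatures:

  # Find the stops nearest a point in downtown Wilmington (hypothetical
  # argument list).
  ~$ mysql -u user_name -p transitdata -e "CALL GetClosestStopsByLocation(39.7447, -75.5484);"

  # List upcoming departures from one of the stops returned above
  # (hypothetical stop ID).
  ~$ mysql -u user_name -p transitdata -e "CALL getDepartureTimesAndRoutesByStopID(1234);"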

Setting up the Application Environment

I assume for this series of posts that you have access to a LAMP server. This server should be hosted somewhere it can receive HTTP POSTs from a third-party platform (this is required in order to build an SMS application).

While it is not a requirement that you code your transit application in PHP, I will do so in this series. Feel free to use the development language of your choice in building your own application – just about every web development language can work with MySQL.

Before we start writing code, let’s finish a few last items. First, let’s create a user for our web application – remember to give this user only the privileges they need. For our demo application, the web app user only needs to EXECUTE stored procedures. So, we want to do this at the MySQL shell:

mysql> GRANT EXECUTE ON transitdata.* TO username@'localhost' IDENTIFIED BY 'password'; 
mysql> FLUSH PRIVILEGES;

Be sure to replace the ‘username’ and ‘password’ above with values of your choosing. Now, let’s put our database access credentials in a safe and convenient place.

When writing a web application, I prefer not to store this information in my code (as a config item or declared constant). Instead, I like to keep this information in my Apache configuration.

If you’re using Apache on Ubuntu, you can typically just store this information in your VirtualHost file (located in /etc/apache2/sites-available/). Use the Apache SetEnv directive to set the values you want to store:

SetEnv TRANSIT_DB_HOST localhost
SetEnv TRANSIT_DB_USER username
SetEnv TRANSIT_DB_PASS password
SetEnv TRANSIT_DB_NAME transitdata
SetEnv TRANSIT_DB_PORT 3306

Again, be sure to replace the ‘username’ and ‘password’ above with the values used when creating your MySQL user. Once you have entered these values into your VirtualHost file, save it and reload Apache:

 ~$ sudo /etc/init.d/apache2 reload

Now we’re all set to start writing code!

In the next post we’ll build a simple, yet powerful PHP-based SMS application that anyone with a cell phone can use to find a transit location nearest to them in the State of Delaware, and find out the departure times / routes from that location.

Stay tuned!


How to Build an Open Transit Data Application

Earlier this year, I had the chance to work with one of my state’s Senators to draft and pass a bill requiring the state’s transit agency to publish all of its route, schedule and fare information in an open format for use by third parties.

This bill was signed into law by the Governor a few months ago, and the data is now available (in GTFS format) on the Delaware Transit Agency’s web site.

My primary goal in working to get this law enacted was to raise awareness within my state about the potential for open government data to spur civic coding and the development of useful applications at little or no cost to the government. Now that my state actually publishes some open data (Hells to the yeah!), I think the next step for me is to provide some guidance on how to get started using it to build civic applications.

Hopefully, this will show others how easy it is and get them to try their hand at building a civic application.

(Note, transit data is an especially rich source for developing civic applications. For some background and more detail on this, see this post.)

In the next several posts, I’ll document one process for developing an open source transit data application using GTFS data from the Delaware Transit Agency. I’ll be sharing code and some examples that will help you get started if you feel like trying your hand at building a civic application.

Let’s get started!

Getting the Data

Now that the Delaware Transit Agency has published all of their route and schedule information, anyone that wants to use it can simply download it.

This zip file contains a collection of text files that conform to the GTFS specification – for a detailed description of file contents, go here. If you want to build a transit app with GTFS data, I recommend spending a little time becoming familiar with the layout of these files, and getting a sense of what the data represents.

Setting up a Database

In order to use this data as part of an application, we’re probably going to need to get it into a database so that we can manipulate it and run queries against it. An easy way to do this is to import it into a MySQL database instance.

MySQL is a powerful open source database that is used in scores of different web applications, and it’s a solid choice for building a transit data application. In addition, the MySQL LOAD DATA INFILE statement is a powerful and easy way to populate a database with information from a text file (or multiple files).
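
As a rough sketch of what LOAD DATA INFILE looks like for a single GTFS file – assuming a stops table whose columns mirror stops.txt; the SQL script described below handles the actual table creation and loading for every file – a statement run from the shell might look like this:

  # Load the GTFS stops file into a (hypothetical) stops table, skipping the
  # header row in the file.
  ~$ mysql -u user_name -p --local-infile=1 transitdata -e "
      LOAD DATA LOCAL INFILE '/tmp/dartfirst_de_us/stops.txt'
      INTO TABLE stops
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
      LINES TERMINATED BY '\n'
      IGNORE 1 LINES;"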

I’ve created a SQL script to load Delaware transit data into a MySQL database. You can get this script from GitHub – it’s pretty simple, and you should feel free to modify it as your own personal preferences or requirements dictate. Just fork the Gist.

Combining this script with a couple of minutes on the command line will give you a MySQL database with all of the transit data loaded up and ready to use. The steps below assume that you have MySQL installed and running.

To install MySQL:
~$ sudo apt-get install mysql-server

To see if MySQL is running:
~$ pgrep mysql

Create a temporary location for the GTFS files:
~$ mkdir /tmp/dartfirst_de_us

Download the GTFS files from the Delaware Transit Agency website:
~$ wget http://www.dartfirststate.com/information/routes/gtfs_data/dartfirststate_de_us.zip

Unzip the individual text files to our temporary location:
~$ unzip dartfirststate_de_us.zip -d /tmp/dartfirst_de_us/

Get the SQL script for loading GTFS files into MySQL from GitHub:
~$ wget http://gist.github.com/raw/615470/7f62e8354d680011f7eea5f9afcfd0ae93a6fedb/dartfirststate_de_us.sql

Invoke MySQL and pass in the SQL script (make sure you change ‘user_name’ to a valid MySQL user name):
~$ mysql -u user_name -p < dartfirststate_de_us.sql

That’s it!

Now, all of the data from the original text files has been loaded into a MySQL database called transitdata. You can start to construct queries to retrieve information from these tables to support the functionality for your application.
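
As a taste of what those queries look like, here’s a rough sketch that pulls the next few scheduled departures for a stop. The table and column names follow the GTFS file layout, the stop ID is a placeholder, and a real query would also filter by service day using the calendar data:

  # Show the next three scheduled departures (route and time) for a
  # hypothetical stop ID.
  ~$ mysql -u user_name -p transitdata -e "
      SELECT r.route_short_name, st.departure_time
      FROM stop_times st
      JOIN trips t ON t.trip_id = st.trip_id
      JOIN routes r ON r.route_id = t.route_id
      WHERE st.stop_id = '1234'
        AND st.departure_time >= CURTIME()
      ORDER BY st.departure_time
      LIMIT 3;"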

In the next post, I’ll walk through a few basic queries that can extract useful information from these tables. We’ll also lay the groundwork for a really cool mobile application that I will deploy for use by the public when this series of posts is complete.

Stay tuned!
