Building the Government Data Toolkit


Image courtesy of Flickr user bitterbuick

We live in a time when people outside of government have better tools to build things with and extract insights from government data than governments themselves.

These tools are more plentiful, more powerful, more flexible, and less expensive than pretty much everything government employees currently have at their disposal. Governments may have existing relationships with huge tech companies like Microsoft, IBM, Esri and others that offer an array of different data tools — it doesn’t really matter.

In the race for better data tools, the general public isn’t just beating out the public sector — it has already won the race and is taking a Jenner-esque victory lap.

This isn’t a new trend.


Thinking Small on Civic Tech

Designing simple systems is one of the great challenges of Government 2.0. It means the end of grand, feature-filled programs, and their replacement by minimal services extensible by others.

— Tim O’Reilly, Open Government

The original idea of Government as a Platform is now almost a decade old. In the world of technology, that’s a long time.

In that time, people working inside and outside of government to implement this idea have learned a lot about what works well, and what does not. In addition, we’ve seen some significant changes in the world of technology over the past decade or so, and the way we develop solutions (both in the world of civic tech and outside of it) has changed fairly dramatically.

The power of the original idea for Government as a Platform continues to echo in the world of civic tech and open data. I have no doubt that it will for a long time to come.

But in 2015, what does Government as a Platform actually look like, and what should it look like going forward? What are its component parts? How does it manifest in terms of actual infrastructure, both inside and outside government?

And, most importantly, who controls this infrastructure and has a say in how it is shaped and used?


Turning Government Web Content Into APIs

Inspired by a recent Open City project that repurposes data on sewage in the Chicago River, I wanted to work through a quick example of turning a web page that houses useful information on water quality in Philly into an API.

Here is a quick screencast showing how easy it is to take information rendered as part of a standard HTML page and turn it into a useful JSON API.

It goes without saying that as more and more data producers in government learn about how data can be repurposed by others – both inside and outside of government – we’ll see more and more valuable information exposed as structured data and APIs. Viva #opendata!

Until then, there are some very powerful tools available for turning web content into reusable APIs.
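To make the scraping step concrete, here is a rough sketch using only the Python standard library. The page structure and column names below are hypothetical stand-ins — the real water quality page will differ — but the pattern of parsing an HTML table into JSON is the same:

```python
import json
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

def page_to_json(html, columns):
    """Turn the rows of an HTML table into a JSON array of objects."""
    scraper = TableScraper()
    scraper.feed(html)
    return json.dumps([dict(zip(columns, row)) for row in scraper.rows])
```

Serve the string `page_to_json` returns from any web endpoint and you have a minimal JSON API over content that previously existed only as HTML.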

5 Essential Open Data Tools

Every data wrangler has their own list of favorites – the go-to tools that they use when they need to work with data.

If you need to clean, transform, or mash up data, or if you are working with a data set that will form the basis for an application, here is a list of tools that can make life easier for you.

  • OpenRefine – I don’t think there is a better tool for cleaning messy data than OpenRefine. One of my favorite features is the ability to add new columns to a data set based on data in an external web service.
  • jq – I see a lot of JSON in my job, and a tool like this one makes working with JSON data exceptionally easy. For example, here is a simple jq recipe for extracting a list of licensed pawn shops in Philadelphia to a CSV file.
  • csvkit – CSV is another format I see almost every day, and csvkit makes working with it simple. My favorite utility – though I don’t use it often – is csvsql. Use this handy utility to generate SQL insert statements and easily create a relational database from a CSV file.
  • Unix shell – jq and csvkit are both command line tools, and the Unix shell is the place where I spend a lot of time working with data. Without getting into a Windows vs. *nix war, there is simply no better collection of utilities for working with text files than those that can be accessed via the shell. Tools like curl, grep, sed, awk, cut and a host of others are enormously useful on their own, or in combination with tools like jq and csvkit.
  • CartoDB – pretty much the easiest way to create a web-based map from an open data set. There’s even an API for building apps on top of the data you have in your CartoDB account. Enough said.
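The JSON-to-CSV extraction that the jq recipe above performs is roughly equivalent to this small Python sketch (the field names are illustrative, not the actual schema of Philadelphia's license data):

```python
import csv
import io
import json

def json_to_csv(json_text, fields):
    """Flatten a JSON array of objects into CSV, keeping only `fields`."""
    records = json.loads(json_text)
    out = io.StringIO()
    # extrasaction="ignore" drops any keys not listed in `fields`
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

The same few lines work for any JSON feed whose top level is an array of flat objects; jq earns its keep when the structure is more deeply nested.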

Note: my background is in software development, so the list of favorites above probably reflects my own professional biases. Someone who works primarily as a data scientist might have a completely different list of favorite tools.

What’s your favorite tool for working with data?

Onboarding Civic Hackers

Earlier this week, I had the pleasure of attending a civic hacking event jointly organized by Code for Philly and Girl Develop It Philly. The event had a tremendously good turnout – over 50 people by my count – making it one of the larger events Code for Philly has organized in recent months.


The mission of Girl Develop It is to empower women to learn software development, and as a result there were a good number of people at the event being introduced to civic hacking for the first time. This got me thinking about ways to onboard people new to civic hacking (and people new to coding) into civic technology projects.

None of these is new, but here are five ideas I came up with after the event:

Data Liberation – the foundation of any civic hacking project is open data, and far too much of the data civic hackers need is locked up in broken websites and unusable formats. Helping to break some of this data free can be a tremendous benefit to open data users and civic hacking projects.

Documentation – far too many open source and civic hacking projects go without proper documentation to help other developers contribute and to support end users. Helping to create or expand documentation for a project can be critical to helping it succeed.

User Testing – Organizing and conducting end user testing for civic technology projects is sadly rare. There are some efforts underway to change this but in order for civic hacking projects to improve and succeed we need real feedback from mainstream users.

Outreach – One legitimate criticism of civic apps is that too few people know they exist. There are efforts working to change this, like Apps for Philly (still in its infancy) – a site that lists a host of different civic technology apps that are available for users. Adding new projects to this listing (and others like it) will both help these projects succeed and give the person doing it a much clearer sense of the civic technology landscape.

Helper Libraries – a great way to get comfortable writing code and to help out a civic technology project is to write helper libraries for projects with APIs. At the Apps for Philly Transit Hackathon, one project utilized recently released data from the City of Philadelphia on bicycle thefts. The lead developer created a new API for this data to enable other projects to use it. Building new client libraries in a range of different languages would be a great way to support other developers who want to incorporate bike theft data into their projects, and to get some hands-on experience writing code.

There are so many ways to contribute to open source projects and to help support civic hacking efforts – these are just a few.

We need more great events like the one organized by Code for Philly and Girl Develop It Philly to bring together all of the talented people we have in our city to work on these important projects.

The Lesson of PennApps

A couple of weeks ago, I attended the most recent PennApps hackathon – a biannual college hackathon in Philadelphia that has grown from somewhat humble beginnings a few years ago to one of the largest college hackathons in the world.

Penn Apps logo

Attendance at the event has swelled to over 1,000 participants from colleges across the country, as well as several international teams. I’ve been to PennApps 4-5 times in the past few years and it has been remarkable to see it grow. The last several I have attended, I brought with me colleagues from city government – some of whom had never been to a hackathon before.

What hasn’t changed over the many installments of PennApps is the presentation of sponsor APIs at the kickoff event. After a brief introduction by the organizers, the many technology companies that sponsor the event show their wares to participants. This almost always involves a short description and demo of an API or SDK that participants can use at the event to build something.

These presentations are witty, engaging and fun – they have to be. There are dozens of sponsors for the event and each is angling to encourage developers to use their tool or platform to build a working prototype by the end of the weekend. This is how companies built around APIs raise brand awareness. Increasingly, the process of building software has come to revolve around leveraging third-party platforms and APIs. This has changed the way that software development – particularly web development – happens, as well as the expectations of developers.

Venmo makes their pitch at PennApps

Most of the sponsors at an event like this one will offer some sort of additional incentive for developers that use their services – free credits, t-shirts, swag, etc. At a minimum, though, their services are easy to use and well documented. Those that are brave enough may even try a “live coding” demo – building a working application using their API or platform in just the few short minutes allotted to each sponsor presentation. When done successfully, this can help drive home the point to prospective users that a platform or API is easy to use.

Every time I attend this event with my colleagues from city government I say – “This is what governments need to do. We need to present ourselves to prospective users as if we were an API company. The same standards of quality should apply.”

It is increasingly common – and encouraging – to see governments publish developer portals. A number of different federal agencies do this, as do large cities like New York, Chicago and Philadelphia.

The truth is – if we’re being honest with ourselves – that most of them are not very good, particularly when compared with the offerings of private companies. Many don’t have common elements like an API console, code samples & tutorials, helper libraries or a discussion forum.

If we want to encourage developers to use open government data we need to be realistic with ourselves about what developers have come to expect in terms of an API offering. Events like PennApps have raised the bar for anyone that wants to encourage developers to build useful and interesting applications with their data and APIs – governments included.

We must enhance the quality of government developer portals, and we must work harder (and faster) to develop shared standards for government data and APIs. Most importantly, we have to do more to share tips, tricks and best practices between governments. There are some tools out there to get governments started down the road of building a developer center that is impactful and engaging, but we must do more.

Developer centers are not just a mandate or requirement that we need to check off our “to do” list. Government developer portals should be the hubs around which we engage and communicate with developers, technologists and the broader data community.

In a subsequent post, I’ll share a checklist that I’m putting together with a list of basic elements that every government developer portal should have.

Stay tuned!

An SMS-Enabled Polling Locator

This is a great weekend for civic hacking.

Daylight Saving Time has given us an extra hour, advances in telephony application development have made it dead simple to build text messaging applications, and Google has given us the Civic Information API.

With an election on Tuesday, I wanted to build a quick application that demonstrated the ease with which SMS apps can be built and the power of Google’s API.

The address of a polling place is both valuable and succinct – it’s the ideal kind of information to deliver through multiple communication channels. Text messaging (SMS) is a fairly ubiquitous communication channel, and in some cities – like Philadelphia – it’s an important way to engage with citizens that may face barriers to digital access.

The screencast above demonstrates how to use the script I developed using the Google Civic Information API and the Tropo telephony platform.
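At its core, a script like this does one lookup and formats the answer for SMS. Here is a simplified sketch of that formatting step — the response shape below is a stand-in modeled on the kind of `pollingLocations` data the Civic Information API returns, not its exact schema:

```python
import json

def polling_place_message(voterinfo_json):
    """Build a short, SMS-sized reply from a voterinfo-style API response."""
    data = json.loads(voterinfo_json)
    locations = data.get("pollingLocations", [])
    if not locations:
        return "Sorry, no polling place found for that address."
    addr = locations[0]["address"]
    return "Your polling place: {locationName}, {line1}, {city}".format(**addr)
```

A telephony platform like Tropo handles the SMS transport; your code only has to turn an incoming address into a string like the one above.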

There are many ways to do this, and there are a large number of text messaging platforms and services to choose from, so if you want to use your extra hour this weekend to help people find their polling location, pick the one you like best and get cracking.

It’s never been easier to build useful communication and messaging apps – in fact it’s getting easier every day. And with the richness of information available through APIs like Google’s Civic Information API, it’s never been easier to build an app that will help people get to their polling location.

Election day is just around the corner. Use your extra hour this weekend wisely…

“Phind It For Me” Live in Philly

Really excited to launch a new OpenGov project in Philadelphia – Phind It For Me.

The service is built on PHLAPI and the point data sets it houses. As such, one could understand why I’d be interested in enhancing the data sets currently in PHLAPI.

I’m really excited about this project – source code available on GitHub – and would love to see if there is an interest in launching in other cities with CouchDB-based geospatial data repositories, like Baltimore.

It’s built on the awesome new SMSified platform from Voxeo (disclaimer, I work there) and uses a Node.js module I built for working with the SMSified API.

As always, dear readers, any comments or feedback is welcomed.

Do head on over to the project website and check it out!

Experiments in Open Data: Baltimore Edition

A lot of my open gov energy of late has been focused on replicating a technique pioneered by Max Ogden (creator of PDXAPI) to convert geographic information in shapefile format into an easy-to-use format for developers.

Specifically, the technique converts shapefiles into documents in an instance of GeoCouch (the geographically enabled version of CouchDB).
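In spirit, each shapefile feature becomes a GeoCouch document whose geometry is stored as GeoJSON. A minimal sketch of that document-building step (the field names here are illustrative — this is not Max's actual conversion script):

```python
import json

def feature_to_couch_doc(feature, doc_id):
    """Wrap a GeoJSON feature as a CouchDB/GeoCouch document.

    GeoCouch spatial views index a document's geometry, so we keep the
    GeoJSON `geometry` member intact and flatten the feature's
    attributes alongside it as ordinary document fields.
    """
    doc = {"_id": doc_id, "geometry": feature["geometry"]}
    doc.update(feature.get("properties", {}))
    return doc
```

Each document produced this way can then be `PUT` to the CouchDB instance over plain HTTP, which is a big part of why the resulting data store is so approachable for web developers.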

I was thrilled recently to come across some data for the City of Baltimore, and since I know there are some open government developments in the works there, I decided to put together a quick screencast showing how open data – when provided in an easily used format – can form the basis for some pretty useful civic applications.

The screencast below walks through a quick demonstration of an application I wrote in PHP to run on the Tropo platform – it currently supports SMS, IM and Twitter use.

Just send an address in the City of Baltimore to one of the following user accounts along with a hashtag for the type of location you are looking for:

  • SMS: (410) 205-4503
  • Jabber / Gtalk:
  • Twitter: @baltimoreAPI

This demo application interacts with a GeoCouch instance I have running in Amazon EC2 – you can take a look at the data I populated it with by going to and accessing the standard CouchDB user interface. I haven’t really locked this instance down all that tightly, but there really isn’t anything in it that I can’t replace.

Locate places in Baltimore via SMS

Besides, one of the nice things about this technique is how easy it is to convert data from shapefile format and populate a GeoCouch instance. Hopefully others with GIS datasets will look at this approach as a viable one for providing data to developers. (If anyone has some shapefiles for the City of Baltimore and you want to share them, let me know and I’ll load them into

There are a number of people in Baltimore pushing for an open data program from their city government, and I have heard that there are some really cool things in the pipeline. I can’t wait to see how things develop there, and I want to do anything I can to help.

Hopefully, this simple demo will be useful in illustrating both the ease with which data can be shared with developers and the potential benefit that applications built on top of open data can hold for municipalities.

UPDATE (4/18/2011): I’ve actually replicated all of the Baltimore data from the EC2 instance discussed in this blog post to the new Iris Couch instance. Iris Couch is by far the easiest way to get started using CouchDB, and Couch’s replication feature makes it easy to move data into an Iris Couch instance.

Building Multichannel Transit Apps with Tropo

This post is the third in a series about building an open source transit data application using GTFS data from the Delaware Transit Corporation.

In the first post, I described how to download the State of Delaware’s transit data and populate a MySQL database with it.

In the previous post, I walked through a process of setting up stored procedures for querying the transit data and setting up a LAMP application environment.

Now we’re ready to write code for our transit app!

Choosing a Platform

One of the most underappreciated developments that has accompanied the increasing amount of government data available in open formats is the vast array of new tools now available for developers to use. I’ve talked about this a lot in the past, but it bears repeating – it has never been easier to build sophisticated, multi-channel communication applications than it is now.

The number of options open to developers is truly exciting, but there are some platforms that rise above the rest in terms of ease of use and in what they enable developers to do. For this project, I will use the Tropo WebAPI platform.

The Tropo WebAPI has a number of advantages that will come in handy for our transit app project (and any other projects you’ve got in the works). You can write a Tropo app in one of several popular scripting and web development languages – Ruby, Python, PHP, C# and JavaScript (Node.js). There are libraries available for each language that make it easy to build Tropo apps and to integrate with the Tropo API. (Disclaimer – I’ve worked on several of these libraries.)

In addition, the real magic that Tropo brings to the table is the ability to serve users on multiple communication channels (phone, IM, SMS, Twitter) from a single code base. This is especially important for an application meant to serve transit riders. These users may not have the luxury of sitting in front of a desktop computer in order to look up information on a bus route or schedule. They are much more likely to be traveling and using some sort of phone or mobile device. The Tropo WebAPI is perfect for our needs.

Vivek Kundra, the former CIO of the District of Columbia and current CIO of the United States, has described the effort by governments to release data in open formats as “the democratization of data” – these efforts make previously hard to get, or hard to use data available for everyone.

I like to describe platforms like Tropo and the various libraries that are available to use with it as “the democratization of application development” – these tools make building powerful communication apps simple for anyone who understands web development.
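The multi-channel trick works because a Tropo WebAPI application is just a web app that receives JSON describing an incoming session and replies with JSON instructions. As a simplified sketch (the real Tropo payloads carry more fields — see the Tropo library used later in this series for the full structure), a single "say" instruction is rendered as speech on a phone call and as text over IM, SMS or Twitter:

```python
import json

def tropo_say(message):
    """Build a minimal Tropo WebAPI-style JSON response.

    Tropo posts a JSON session to your web app and expects an array of
    instructions back. One "say" works across every channel: text-to-
    speech on a voice call, a plain message on IM/SMS/Twitter, which is
    what lets one code base serve all of them.
    """
    return json.dumps({"tropo": [{"say": {"value": message}}]})
```

Your application logic stays channel-agnostic; the platform decides how to deliver the message.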

Building our Transit App

Before we can build our application, we need to decide what it will do.

For our purposes, this has already been determined by the stored procedures we built in the last post. Our transitdata database has 2 stored procedures – one to return the nearest bus stops to a specific address or location, and one to return the next bus departure times from a specific bus stop.

However, this series of posts is meant to inspire readers to build their own applications – now that you have transit data in a powerful relational database like MySQL you can query it any way you like. In addition, the SQL scripts and steps developed for this series of posts can certainly be used with the data from any other transit agency that uses the GTFS format. There are lots. Use your imagination – build whatever you find useful.
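The heart of the nearest-stop stored procedure is a great-circle distance calculation over the GTFS stops table. Outside of MySQL, the same query might look like this sketch in Python (a stand-in for, not a copy of, the SQL from the previous post):

```python
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3959 * asin(sqrt(a))  # 3959 = Earth's radius in miles

def nearest_stops(lat, lon, stops, n=3):
    """Return the n stops closest to (lat, lon).

    `stops` mimics rows from the GTFS stops table:
    (stop_id, stop_name, stop_lat, stop_lon).
    """
    return sorted(
        stops,
        key=lambda s: haversine_miles(lat, lon, s[2], s[3]),
    )[:n]
```

Geocode the caller's address to a lat/lon pair, run it through `nearest_stops`, and you have the first half of the app; the departure-times lookup is a simple join against the GTFS `stop_times` table.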

So now that we have some idea of what we want our application to do, we need to select a development language. It will probably come as no surprise that for this example I’m going to use the PHP scripting language and the PHP Library for the Tropo WebAPI. PHP is a good match for Linux, Apache and MySQL – all technologies we used in the previous entries in this series of blog posts.

If you want some more detailed information on building PHP applications that run on the Tropo WebAPI platform, you can review a separate series of blog posts on this issue here.

To get the PHP Library for the Tropo WebAPI, you can download it and unpack it on your web server, or simply clone the GitHub repo.

Once you do that, you can grab the code for our demo application from GitHub as well.

In order to test this application, you’ll need to sign up for a free Tropo account – you can do that here. Once you are signed up, go to the Applications section in your Tropo account and set up a new WebAPI application that points to the location of our PHP script on your web server. You can see more detailed information on setting up a Tropo account here.


Note – You’ll also need an API key from Google Maps for geocoding addresses – get one here. Change the following line in the application to include your Google API key:

define("MAPS_API_KEY", "your-api-key-goes-here");

Once your Tropo account and application are set up, you can add as many different contact methods as you like – your Tropo application is automatically provisioned a Skype number, a SIP number and an iNUM.

To illustrate how our transit app will work, I’ve gone ahead and assigned a Jabber IM name to my app. Add this to your friends/user list in Google chat and you can use the app I’ve set up. Here’s what it looks like in my IM client:


As you can see, my first IM to the app sends the address of a building in Downtown Wilmington (actually, a building I used to work in). The app responds with the three closest bus stops and the distance (in miles) to each.

I then send the number of the bus stop I am interested in. The app responds with the next three buses to leave that stop, the route served by each and the number of minutes before each departs.

How cool is that?

I could very easily make this application more sophisticated, so that it delivers content tailored to specific channels (i.e., IM vs. phone), but I want to keep things simple for now.

In the next blog post of this series, we will introduce some additional tools, including Google Maps and the new hotness in cloud telephony – Phono.

Stay tuned!