Towards Ethical Algorithms

Old tools & new challenges for governments

There is a common misconception that data-driven decision making and the use of complex algorithms are a relatively recent phenomenon in the public sector. In fact, making use of (relatively) large data sets and complex algorithms has been fairly common in government for at least the past few decades.

As we begin constructing ethical frameworks for how data and algorithms are used, it is important that we understand how governments have traditionally employed these tools. By doing so, we can more fully understand the challenges governments face when using larger data sets and more sophisticated algorithms and design ethical and governance frameworks accordingly.

Building the Government Data Toolkit


Flickr image courtesy of Flickr user bitterbuick

We live in a time when people outside of government have better tools to build things with and extract insights from government data than governments themselves.

These tools are more plentiful, more powerful, more flexible, and less expensive than pretty much everything government employees currently have at their disposal. Governments may have existing relationships with huge tech companies like Microsoft, IBM, Esri and others that have an array of different data tools, but it doesn’t really matter.

In the race for better data tools, the general public isn’t just beating out the public sector, it’s already won the race and is taking a Jenner-esque victory lap.

This isn’t a new trend.

Thinking Small on Civic Tech

Designing simple systems is one of the great challenges of Government 2.0. It means the end of grand, feature-filled programs, and their replacement by minimal services extensible by others.

— Tim O’Reilly, Open Government

The original idea of Government as a Platform is now almost a decade old. In the world of technology, that’s a long time.

In that time, people working inside and outside of government to implement this idea have learned a lot about what works well, and what does not. In addition, we’ve seen some significant changes in the world of technology over the past decade or so, and the way we develop solutions (both in the world of civic tech, and outside of it) has changed fairly dramatically.

The power of the original idea for Government as a Platform continues to echo in the world of civic tech and open data. I have no doubt that it will for a long time to come.

But in 2015, what does Government as a Platform actually look like, and what should it look like going forward? What are its component parts? How does it manifest in terms of actual infrastructure, both inside and outside government?

And, most importantly, who controls this infrastructure and has a say in how it is shaped and used?

Turning Government Web Content Into APIs

Inspired by a recent Open City project that repurposes data on sewage in the Chicago River, I wanted to work through a quick example of turning a web page that houses useful information on water quality in Philly into an API.

Here is a quick screencast showing how easy it is to take information rendered as part of a standard HTML page and turn it into a useful JSON API.

It goes without saying that as more and more data producers in government learn about how data can be repurposed by others – both inside and outside of government – we’ll see more and more valuable information exposed as structured data and APIs. Viva #opendata!

Until then, there are some very capable tools available for turning web content into powerful, reusable APIs.
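
The screencast is not reproduced here, so what follows is only a rough sketch of the general idea, not the tool it demonstrates: read the rows of an HTML table and re-emit them as JSON records. It uses Python's standard library, and the sample page, column names and readings are made-up placeholders rather than real water quality data.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first table row as the header and return one dict per later row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Placeholder markup standing in for a real government web page.
SAMPLE_PAGE = """
<table>
  <tr><th>site</th><th>reading</th></tr>
  <tr><td>Station A</td><td>7.1</td></tr>
  <tr><td>Station B</td><td>6.8</td></tr>
</table>
"""

if __name__ == "__main__":
    print(json.dumps(table_to_records(SAMPLE_PAGE), indent=2))
```

Serving the result of `table_to_records` from any small web endpoint is what turns the page into an API; a real page would also need handling for nested tables and for cells that span rows or columns.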

5 Essential Open Data Tools

Every data wrangler has their own list of favorites – the go-to tools that they use when they need to work with data.

If you need to clean, transform, or mashup data or if you are working with a data set that will form the basis for an application, here is a list of tools that can make life easier for you.

  • OpenRefine – I don’t think there is a better tool for cleaning messy data than OpenRefine. One of my favorite features is the ability to add new columns to a data set based on data in an external web service.
  • jq – I see a lot of JSON in my job, and it’s exceptionally easy to use JSON data with a tool like this one. For example, here is a simple jq recipe for extracting a list of licensed pawn shops in Philadelphia to a CSV file.
  • csvkit – CSV is another format I see almost every day, and using csvkit makes it simple. My favorite utility – though I don’t use it often – is csvsql. Use this handy utility to generate SQL insert statements and easily create a relational database from a CSV file.
  • Unix shell – jq and csvkit are both command line tools, and the Unix shell is the place where I spend a lot of time working with data. Without getting into a Windows vs. *nix war, there is simply no better collection of utilities for working with text files than those that can be accessed via the shell. Tools like curl, grep, sed, awk, cut and a host of others are enormously useful on their own, or in combination with tools like jq and csvkit.
  • CartoDB – pretty much the easiest way to create a web-based map from an open data set. There’s even an API for building apps on top of the data you have in your CartoDB account. Enough said.
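
The jq recipe mentioned above is not included on this page. As a stand-in, here is a minimal Python sketch of the same kind of JSON-to-CSV extraction using only the standard library. The field names and sample records are hypothetical and are not the actual Philadelphia license data.

```python
import csv
import io
import json


def json_to_csv(json_text, fields):
    """Return CSV text with one row per JSON object, limited to `fields`."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells; keys not in `fields` are dropped.
        writer.writerow({name: record.get(name, "") for name in fields})
    return out.getvalue()


# Hypothetical records, only to show the shape of the input.
SAMPLE_JSON = """
[
  {"name": "Shop One", "address": "100 Example St, Unit 2", "extra": 1},
  {"name": "Shop Two"}
]
"""

if __name__ == "__main__":
    print(json_to_csv(SAMPLE_JSON, ["name", "address"]), end="")
```

The `csv` module takes care of quoting values that contain commas, which is the detail most often missed when CSV is assembled by hand.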

Note, my background is in software development so the list of favorites above probably reflects my own professional biases. Someone who works primarily as a data scientist might have a completely different list of favorite tools.

What’s your favorite tool for working with data?

Onboarding Civic Hackers

Earlier this week, I had the pleasure of attending a civic hacking event jointly organized by Code for Philly and Girl Develop It Philly. The event had a tremendously good turnout – over 50 people by my count – making it one of the larger events Code for Philly has organized in recent months.


The mission of Girl Develop It is to empower women to learn software development, and as a result there were a good number of people at the event being introduced to civic hacking for the first time. This got me thinking about ways to onboard people new to civic hacking (and people new to coding) into civic technology projects.

None of these is new, but here are five ideas I came up with after the event:

Data Liberation – the foundation of civic hacking projects is open data, and far too much of the data civic hackers need is locked up in broken websites and unusable formats. Helping to break some of this data free can be a tremendous benefit to open data users and civic hacking projects.

Documentation – far too many open source and civic hacking projects go without proper documentation to help other developers contribute and to support end users. Helping to create or expand documentation for a project can be critical to helping it succeed.

User Testing – Organizing and conducting end user testing for civic technology projects is sadly rare. There are some efforts underway to change this but in order for civic hacking projects to improve and succeed we need real feedback from mainstream users.

Outreach – One legitimate criticism of civic apps is that too few people know they exist. There are efforts working to change this, like Apps for Philly (still in its infancy) – a site that lists a host of different civic technology apps that are available for users. Adding new projects to this listing (and others like it) will both help these projects succeed and give the person doing it a much clearer sense of the civic technology landscape.

Helper Libraries – a great way to get comfortable writing code and to help out a civic technology project is to write helper libraries for projects with APIs. At the Apps for Philly Transit Hackathon, one project utilized recently released data from the City of Philadelphia on bicycle thefts. The lead developer created a new API for this data to enable other projects to use it. Building new client libraries in a range of different languages would be a great way to support other developers that want to incorporate bike theft data into their projects, and to get some hands on experience writing code.
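
To give a sense of how small a useful helper library can be, here is a minimal sketch of a client in Python. Everything specific in it is an assumption for illustration: the base URL, the `thefts` path and the `year` and `limit` parameters are hypothetical and do not describe the actual bike theft API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


class BikeTheftClient:
    """Tiny client for a hypothetical bike theft API."""

    def __init__(self, base_url, fetch=None):
        self.base_url = base_url.rstrip("/")
        # `fetch` takes a URL and returns the response body as text.
        # Passing a stand-in makes the client testable without a network.
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url):
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")

    def build_url(self, path, **params):
        """Join base URL, path and any parameters that are not None."""
        query = urlencode({k: v for k, v in sorted(params.items()) if v is not None})
        url = f"{self.base_url}/{path.lstrip('/')}"
        return f"{url}?{query}" if query else url

    def thefts(self, year=None, limit=None):
        """Return the decoded JSON for the (hypothetical) thefts endpoint."""
        return json.loads(self._fetch(self.build_url("thefts", year=year, limit=limit)))
```

A caller would write something like `BikeTheftClient("https://api.example.org/v1").thefts(year=2012)`. Letting the constructor accept a `fetch` function is a design choice that keeps the example easy to test and easy to adapt to another HTTP library.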

There are so many ways to contribute to open source projects and to help support civic hacking efforts – these are just a few.

We need more great events like the one organized by Code for Philly and Girl Develop It Philly to bring together all of the talented people we have in our city to work on these important projects.

The Lesson of PennApps

A couple of weeks ago, I attended the most recent PennApps hackathon – a biannual college hackathon in Philadelphia that has grown from somewhat humble beginnings a few years ago to one of the largest college hackathons in the world.

Attendance at the event has swelled to over 1,000 participants from colleges across the country, as well as several international teams. I’ve been to PennApps 4-5 times in the past few years and it has been remarkable to see it grow. To the last several I have attended, I brought colleagues from city government – some of whom had never before been to a hackathon.

What hasn’t changed over the many installments of PennApps is the presentation of sponsor APIs at the kick off event. After a brief introduction by the organizers, the many technology companies that sponsor the event show their wares to participants. This almost always involves a short description and demo of an API or SDK that participants can use at the event to build something.

These presentations are witty, engaging and fun – they have to be. There are dozens of sponsors for the event and each is angling to encourage developers to use their tool or platform to build a working prototype by the end of the weekend. This is how companies built around APIs raise brand awareness. Increasingly, the process of building software has come to revolve around leveraging third-party platforms and APIs. This has changed the way that software development – particularly web development – happens, as well as the expectations of developers.

Venmo makes their pitch at PennApps

Most of the sponsors at an event like this one will offer some sort of additional incentive for developers that use their services – free credits, t-shirts, swag, etc. At a minimum, though, their services are easy to use and well documented. Those that are brave enough may even try a “live coding” demo – building a working application using their API or platform in just the few short minutes allotted to each sponsor presentation. When done successfully, this can help drive home the point to prospective users that a platform or API is easy to use.

Every time I attend this event with my colleagues from city government I say – “This is what governments need to do. We need to present ourselves to prospective users as if we were an API company. The same standards of quality should apply.”

It is increasingly common – and encouraging – to see governments publish developer portals. A number of different federal agencies do this, as do large cities like New York, Chicago and Philadelphia.

The truth is – if we’re being honest with ourselves – that most of them are not very good, particularly when compared with the offerings of private companies. Many don’t have common elements like an API console, code samples & tutorials, helper libraries or a discussion forum.

If we want to encourage developers to use open government data we need to be realistic with ourselves about what developers have come to expect in terms of an API offering. Events like PennApps have raised the bar for anyone that wants to encourage developers to build useful and interesting applications with their data and APIs – governments included.

We must enhance the quality of government developer portals, and we must work harder (and faster) to develop shared standards for government data and APIs. Most importantly, we have to do more to share tips, tricks and best practices between governments. There are some tools out there to get governments started down the road of building a developer center that is impactful and engaging, but we must do more.

Developer centers are not just a mandate or requirement that we need to check off our “to do” list. Government developer portals should be the hubs around which we engage and communicate with developers, technologists and the broader data community.

In a subsequent post, I’ll share a checklist that I’m putting together with a list of basic elements that every government developer portal should have.

Stay tuned!

An SMS-Enabled Polling Locator

This is a great weekend for civic hacking.

Daylight Saving Time has given us an extra hour, advances in telephony application development have made it dead simple to build text messaging applications, and Google has given us the Civic Information API.

With an election on Tuesday, I wanted to build a quick application that demonstrated the ease with which SMS apps can be built and the power of Google’s API.

The address of a polling place is both valuable and succinct – it’s the ideal kind of information to deliver through multiple communication channels. Text messaging (SMS) is a fairly ubiquitous communication channel, and in some cities – like Philadelphia – it’s an important way to engage with citizens that may face barriers to digital access.

The screencast above demonstrates how to use the script I developed using the Google Civic Information API and the Tropo telephony platform.
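
The script itself is not shown on this page. The sketch below is not that script; it only illustrates, in Python, the two steps any such app needs: building the lookup request and squeezing the answer into a text message. It assumes the current `voterinfo` endpoint of the Civic Information API (v2) and its `pollingLocations` response layout, which may differ from the API version available when this post was written. The telephony side (Tropo or any other SMS platform) is left out entirely.

```python
from urllib.parse import urlencode

# Assumed endpoint: the v2 voterinfo resource of the Civic Information API.
VOTERINFO_URL = "https://www.googleapis.com/civicinfo/v2/voterinfo"


def voterinfo_url(address, api_key):
    """Build the lookup URL for a voter's address."""
    return f"{VOTERINFO_URL}?{urlencode({'address': address, 'key': api_key})}"


def sms_reply(voterinfo, max_length=160):
    """Turn a decoded voterinfo response into a single short text message."""
    locations = voterinfo.get("pollingLocations") or []
    if not locations:
        return "Sorry, no polling place was found for that address."
    addr = locations[0].get("address", {})
    city_line = " ".join(
        part for part in (addr.get("city"), addr.get("state"), addr.get("zip")) if part
    )
    parts = [addr.get("locationName"), addr.get("line1"), city_line]
    text = "Your polling place: " + ", ".join(part for part in parts if part)
    # Keep the reply within a single SMS segment.
    return text[:max_length]
```

The incoming text message supplies the address, the JSON returned from the URL is decoded and passed to `sms_reply`, and the resulting string is what the SMS platform sends back.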

There are many ways to do this, and there are a large number of text messaging platforms and services to choose from, so if you want to use your extra hour this weekend to help people find their polling location, pick the one you like best and get cracking.

It’s never been easier to build useful communication and messaging apps – in fact it’s getting easier every day. And with the richness of information available through APIs like Google’s Civic Information API, it’s never been easier to build an app that will help people get to their polling location.

Election day is just around the corner. Use your extra hour this weekend wisely…

“Phind It For Me” Live in Philly

Really excited to launch a new OpenGov project in Philadelphia – Phind It For Me.

The service is built on PHLAPI and the point data sets it houses. As such, one could understand why I’d be interested in enhancing the data sets currently in PHLAPI.

I’m really excited about this project – source code available on GitHub – and would love to see if there is an interest in launching in other cities with CouchDB-based geospatial data repositories, like Baltimore.

It’s built on the awesome new SMSified platform from Voxeo (disclaimer, I work there) and uses a Node.js module I built for working with the SMSified API.

As always, dear readers, any comments or feedback are welcome.

Do head on over to the project website and check it out!

Experiments in Open Data: Baltimore Edition

A lot of my open gov energy of late has been focused on replicating a technique pioneered by Max Ogden (creator of PDXAPI) to convert geographic information in shapefile format into an easy to use format for developers.

Specifically, Max has pioneered a technique for converting shapefiles into documents in an instance of GeoCouch (the geospatially enabled version of CouchDB).
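
The conversion steps are not spelled out in this post, so the following is only a sketch of the last stage, under two loud assumptions: that the shapefile has already been converted to GeoJSON by some other tool, and that each feature becomes one document with its attributes stored beside a `geometry` field. That document layout is an illustration, not necessarily the one Max's technique produces. The output is shaped as a payload for CouchDB's `_bulk_docs` endpoint.

```python
def features_to_bulk_docs(feature_collection, id_property=None):
    """Turn a GeoJSON FeatureCollection into a CouchDB `_bulk_docs` payload.

    Each feature's properties become top-level document fields and its
    geometry is kept under `geometry` (an assumed layout, see above).
    """
    docs = []
    for feature in feature_collection.get("features", []):
        doc = dict(feature.get("properties") or {})
        doc["geometry"] = feature["geometry"]
        # Optionally reuse an attribute as the document id.
        if id_property and id_property in doc:
            doc["_id"] = str(doc[id_property])
        docs.append(doc)
    return {"docs": docs}


# Placeholder feature; the name and coordinates are made up.
SAMPLE_FEATURES = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example Library", "site_id": 7},
            "geometry": {"type": "Point", "coordinates": [-76.6, 39.3]},
        }
    ],
}
```

Posting `json.dumps(features_to_bulk_docs(SAMPLE_FEATURES, id_property="site_id"))` to a database's `_bulk_docs` URL would load all of the features in one request.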

I was thrilled recently to come across some data for the City of Baltimore and since I know there are some open government developments in the works there, I decided to put together a quick screencast showing how open data – when provided in an easily used format – can form the basis for some pretty useful civic applications.

The screencast below walks through a quick demonstration of an application I wrote in PHP to run on the Tropo platform – it currently supports SMS, IM and Twitter use.

Just send an address in the City of Baltimore to one of the following user accounts along with a hashtag for the type of location you are looking for:

  • SMS: (410) 205-4503
  • Jabber / Gtalk:
  • Twitter: @baltimoreAPI
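
The application itself was written in PHP and is not reproduced here. As a rough Python sketch of the first step it has to perform, the function below splits an incoming message into an address and a location type taken from the hashtag. The `#library` hashtag in the usage note is a hypothetical example, since the supported hashtags are not listed in this post.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@\w+")


def parse_request(message):
    """Split 'address #type' into (address, type), or return None if invalid."""
    match = HASHTAG.search(message)
    if not match:
        return None
    # The first hashtag names the type of location being requested.
    location_type = match.group(1).lower()
    # Remove hashtags and any @mention (present on Twitter), then tidy spaces.
    address = MENTION.sub("", HASHTAG.sub("", message))
    address = " ".join(address.split())
    return (address, location_type) if address else None
```

For example, `parse_request("@baltimoreAPI 100 Example St #Library")` returns `("100 Example St", "library")`, which can then be geocoded and matched against the relevant data set.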

This demo application interacts with a GeoCouch instance I have running in Amazon EC2 – you can take a look at the data I populated it with by going to and accessing the standard CouchDB user interface. I haven’t really locked this instance down all that tight, but there really isn’t anything in it that I can’t replace.

Locate places in Baltimore via SMS

Besides, one of the nice things about this technique is how easy it is to convert data from shapefile format and populate a GeoCouch instance. Hopefully others with GIS datasets will look at this approach as a viable one for providing data to developers. (If anyone has some shapefiles for the City of Baltimore and you want to share them, let me know and I’ll load them into

There are a number of people in Baltimore pushing for an open data program from their city government, and I have heard that there are some really cool things in the pipeline. I can’t wait to see how things develop there, and I want to do anything I can to help.

Hopefully, this simple demo will be useful in illustrating both the ease with which data can be shared with developers and the potential benefit that applications built on top of open data can hold for municipalities.

UPDATE (4/18/2011): I’ve actually replicated all of the Baltimore data from the EC2 instance discussed in this blog post to the new Iris Couch instance. Iris Couch is by far the easiest way to get started using CouchDB, and Couch’s replication feature makes it easy to move data into an Iris Couch instance.