Open Data Beyond the Big City

This is an expanded version of a talk I gave last week at the Code for America Summit.

An uneven future

“The future is already here – it’s just not evenly distributed.”
William Gibson. The Economist, December 4, 2003

The last time I heard Tim O’Reilly speak was at the Accela Engage conference in San Diego earlier this year. In his remarks, Tim used the above quote from William Gibson – it struck me as a pretty accurate way to describe the current state of open data in this country.

Open data is the future – of how we govern, of how public services are delivered, of how governments engage with those that they serve. And right now, it is unevenly distributed. I think there is a strong argument to be made that data standards can provide a number of benefits to small and mid-sized municipal governments and could provide a powerful incentive for these governments to adopt open data.

One way we can use standards to drive the adoption of open data is to partner with companies like Yelp, Zillow, Google and others that can use open data to enhance their services. But how do we get companies with tens and hundreds of millions of users to take an interest in data from smaller municipal governments?

In a word – standards.

Why do we care about cities?

When we talk about open data, it’s important to keep in mind that there is a lot of good work happening at the federal, state and local levels all over the country. Plenty of states and even counties are doing good things on the open data front, but for me it’s important to evaluate where we are on open data with respect to cities.

States typically occupy a different space in the service delivery ecosystem than cities, and the kinds of data that they typically make available can be vastly different from city data. State capitols are often far removed from our daily lives and we may hear about them only when a budget is adopted or when the state legislature takes up a controversial issue.

In cities, the people that represent and serve us can be our neighbors – the guy behind you at the car wash, or the woman whose child is in your son’s preschool class. Cities matter.

As cities go, we need to consider carefully the importance of smaller cities – there are a lot more of them than large cities, and a non-trivial number of people live in them.

If we think about small to mid-sized cities, these governments are central to providing a core set of services that we all rely on. They run police forces and fire services. They collect our garbage. They’re intimately involved in how our children are educated. Some of them operate transit systems and airports. Small cities matter too.

Big cities vs. small cities on open data

So if cities are important – big and small – how are they doing on open data? It turns out that big cities have adopted open data with much more regularity than smaller cities.

If we look at data from the Census Bureau on incorporated places in the U.S., and information from a variety of sources on governments that have adopted open data policies and are making open data available on a public website, we see the following:

Big Cities:

  • 9 of the 10 largest US cities have adopted open data.
  • 19 of the top 25 most populous cities have adopted open data.
  • Of cities with populations > 500k, 71% have adopted open data.

Small Cities:

  • 256 incorporated places in the U.S. have populations between 100k and 500k.
  • Only 39 have an open data policy or make open data available.
  • A mere 15% of smaller cities have adopted open data.

The data behind this analysis is here. As we can see, it shows a markedly different adoption rate for open data between large cities (those with populations of 500,000 or more) and smaller cities (those with populations between 100,000 and 500,000).

Why is this important?

We could chalk up this difference to the fact that big cities simply have more data. They may have more people asking for information, which can drive the release of open data. They have larger pools of technologists, startups and civic hackers to use the data. They may have more resources to publish open data, and to manage communities of users around that data.

I don’t know that there is one definitive answer here – there’s ample room for discussion on this point.

We should care about this because – quite simply – a lot of people call smaller cities home. If we add up the populations of the 256 places noted above with populations between 100,000 and 500,000, it actually exceeds the combined population of the 34 largest cities (with populations of 500,000 or more) – 46,640,592 and 41,155,553 respectively. Right now these people are potentially missing out on the many benefits of open data.

But more than simple math, if one of the virtues of our approach to democracy in this country is that we have lots of governments below the federal level to act as “laboratories of democracy” then we’re missing an opportunity here. If we can get more small cities to embrace open data, we can encourage more experimentation, we can evaluate the kinds of data that these cities release and what people do with it. We can learn more about what works – and what doesn’t.

In addition, we now know that open data is one tool that can be used to help address historically low trust in government institutions. It’s not hard to find smaller governments in this country that could use all the help they can get in repairing relations with those they serve.

How do we fix this?

There’s at least a few things we can do to address this problem.

First, we need more options for smaller governments to release open data. We’re not going to make progress in getting smaller governments to adopt open data if the cost of standing up a data portal has the same budget impact as the salary for a teacher, or a cop, or a firefighter, or a building inspector – I just don’t think that’s sustainable.

Equally important, we need to work on developing useful new data standards. This won’t always be easy, but it’s important work and we need to do it.

For smaller cities without the deep technology, journalism and research communities that can help drive open data adoption, data standards are a way to export civic technology needs to larger cities. I believe they are critical to driving adoption of open data in the many small and midsized cities in this country.

We’ve already seen what open data looks like in big cities, and they are already moving to take the next steps in the evolution of their open data programs – but smaller cities risk getting left behind.

The next frontier in open data is in small and mid-sized cities.

What if We’re Doing it Wrong?

Ever since the botched launch of, procurement reform has become the rallying cry of the civic technology community.

There is now considerable effort being expended to reimagine the ways that governments obtain technology services from private sector vendors, with an emphasis being placed on new methods that make it easier for governments to engage with firms that offer new ideas and better solutions at lower prices. I’ve worked on some of these new approaches myself.

The biggest danger in all of this is that these efforts will ultimately fail to take hold – that after a few promising prototypes and experiments, governments will revert to the time-honored approach of issuing bloated RFPs through protracted, expensive processes that crowd out smaller firms with better ideas and smaller price tags.

I worry that this is eventually what will happen because far too much time, energy and attention is focused on the procurement process while other, more fundamental government processes with a more intimate effect on how government agencies behave are being largely ignored. The procurement process is just one piece of the puzzle that needs to be fixed if technology acquisition is to be improved.

Right now, the focus in the world of civic technology is on fixing the procurement process. But what if we’re doing it wrong?

Things Better Left Unsaid

During the eGovernment wave that hit the public sector in the late 90’s to early 2000’s, tax and revenue collection agencies were among the first state agencies to see the potential benefits of putting services online. I had the good fortune to work for a state revenue agency around this time. My experience there, when the revenue department was aggressively moving its processes online and placing the internet at the center of its interactions with citizens, permanently impacted how I view technology innovation in government.

It’s hard for people to appreciate now, but prior to online tax filing state tax agencies would get reams and reams of paper returns from taxpayers that needed to be entered into tax processing systems, often by hand. Standard practice at the time was to bring on seasonal employees to do nothing but data entry – manually entering information from paper returns into the system used to process returns and issue refunds.

The state I worked for at the time had a visionary director that embraced the internet as a game changer in how people would file and pay taxes. Under his direction, the revenue department rolled out innovative programs to fundamentally change the way that taxpayers filed – online filing was implemented for personal and business taxpayers, and the department worked with tax preparers to implement a new system that would generate a 2D bar code on paper returns (allowing an entire tax return and accompanying schedules to be instantly captured using a cheap scanning device).

When these new filing options were in place, the time to issue refunds plummeted from weeks to days, and most personal income taxpayers saw their refunds issued from the state in just a couple of days. By this time, I had moved to the Governor’s office as a technology advisor and was leading an effort to help state departments move more and more services online. I wanted to use the experience of the revenue department to inspire others in state government – to tout the time and cost savings of moving existing paper processes to the internet, making them faster and cheaper.

When I asked the revenue director for some specifics on cost savings that I could share more broadly, his response could not have been further from what I expected.

He told me rather bluntly that he didn’t want to share cost saving estimates from implementing web-based services with me (or anyone else for that matter). Touting cost savings meant an eventual conversation with the state budget office, or questions in front of a legislative committee, about reducing allocations to support tax filing. The logic would go something like this – if the revenue department was reducing costs by using web-based filing and other programs, then the savings could be shifted to other departments and policy areas where costs were going up – entitlement programs, contributions to cover the cost of employee pensions, etc.

All too often, agencies that implement innovative new practices that create efficiencies and reduce costs see the savings they generate shifted to other, less efficient areas where costs are on the rise. This is just one aspect of the standard government budgeting process that works against finding new, innovative ways for doing the business of government.

Time to Get Our Hands Dirty

A fairly common observation after the launch of is that governments need to think smaller when implementing new technology projects. But at the state and local level, there are actually some fairly practical reasons for technology project advocates to “think big,” and try and get as big a piece of the budget pie as they can.

There is the potential that funding for the next phase of a “small” project might not be there when a prototype is completed and ready for the next step. From a pure self-interest standpoint, there are strong incentives pushing technology project advocates to get as much funding allocated for their project as possible, or run the risk that their request will get crowded out by competing initiatives. Better to get the biggest allocation possible and, ideally, get it encumbered so that there are assurances that the funding is there if things get tight in the next budget cycle.

In addition, there are a number of actors in the budget process at all levels of government (most specifically – legislators) who equate the size of a budget allocation for a project with its importance. This can provide another strong incentive for project advocates to think big – in many cities and states, funding for IT projects is going to compete with things like funding for schools, pension funding, tax relief and a host of other things that will resonate more viscerally with elected officials and the constituencies they serve. This can put a lot of pressure on project advocates to push for as much funding as they can. There’s just too much uncertainty about what will happen in the next budget cycle.

It’s for all of these reasons that I think it’s time for advocates of technology innovation in government to get their hands dirty – to roll up our sleeves and work directly with elected officials and legislators to educate them on the realities of technology implementation and how traditional pressures in the budget process can work to stifle innovation. There are some notable examples of legislators that “get it” – but we’ve got yeoman’s work to do to raise the technology IQ of most elected officials.

Procurement reform is one piece of the puzzle, but we’ll never get all the way there unless we address the built in disincentives for government innovation – those that are enforced by the standard way we budget public money for technology projects (and everything else). We’re having conversations in state houses and city halls across the country about the future costs of underfunding pensions, but I don’t think we’re having conversations about the dangers of underfunding technology with the same degree of passion.

Time for us to wade into the morass and come back with a few converts. We’ve got work to do.

Better Licensing For Open Data

It’s really interesting to see so many governments start to use GitHub as a platform for sharing both code and data. One of the things I find interesting, though, is how infrequently governments use standard licenses with their data and app releases on GitHub.

Why no licenses?

I’m as guilty as anyone of pushing government data and apps to GitHub without proper terms of use, or a standard license. Adding these to a repo can be a pain – more often than not, I used to find myself rooting around in older repos looking for a set of terms that I could include in a repo I wanted to create and copying it. This isn’t a terrible way to ensure that terms of use for government data and apps stay consistent, but I think we can do better.

Before leaving the City of Philadelphia, I began experimenting with a new approach. I created a stand-alone repository for our most commonly used set of terms & conditions. Then, I added the license to a new project as a submodule. With this approach, we can ensure that every time a set of terms & conditions is included with a repo containing city data or apps that the language is up to date and consistent with what is being used in other repos.

Adding the terms of use to a new repo before making it public is easy:

~$ git submodule add git:// license

This adds a new subdirectory in the parent repo named ‘license’ that contains a reference to the repo holding the license language. Any user cloning the repo to use the data or app simply does (for purposes of demonstration, using this repo):

~$ git clone
~$ git submodule init
~$ git submodule update

The user can run git submodule update --remote at any time to pull in the very latest license language, which can change from time to time.

GitHub is an amazing platform for governments to use in sharing open data and fostering collaboration through releasing applications as open source projects.

I think it also provides some powerful facilities for associating licenses and terms & conditions with these releases – something every open source project needs to be sustainable and successful.

Some Tips on API Stewardship

Following up on my last post, and a recent trip to St. Paul, Minnesota for the NAGW Annual Conference to talk about open data APIs, I wanted to provide a few insights on proper API stewardship for any government looking to get started with open data, or those that already have an open data program underway.

Implementing an API for your open data is not a trivial undertaking, and even if this is a function that you outsource to a vendor or partner it’s useful to understand some of the issues and challenges involved.

This is something that the open data team in the City of Philadelphia researched extensively during my time there, and this issue continues to be among the most important for any government embarking on an open data program.

In no particular order, here are some of the things that I think are important for proper API stewardship.

Implement Rate Limiting

APIs are shared resources, and one consumer’s use of an API can potentially impact another consumer. Implementing rate limiting ensures that one consumer doesn’t crowd out others by trying to obtain large amounts of data through your API (that’s what bulk downloads are for).

If you want to start playing around with rate limiting for your API, have a look at Nginx – an open source web proxy that makes it super easy to implement rate limits on your API. I use Nginx as a reverse proxy for pretty much every public facing API I work on. It’s got a ton of great features that make it ideal for front ending your APIs.

Depending on the user base for your API, you may also want to consider using pricing as a mechanism for managing access to your API.
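To make this concrete, here is a minimal sketch of per-client rate limiting in Nginx. The zone name, rate, burst allowance and upstream address are illustrative assumptions, not values from any particular deployment:

```nginx
# Inside the http block: define a shared zone keyed by client IP,
# allowing 10 requests/second per client.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 80;

    location /api/ {
        # Permit short bursts of up to 20 extra requests, then reject.
        limit_req zone=api_limit burst=20 nodelay;
        limit_req_status 429;

        # Hypothetical upstream application serving the API.
        proxy_pass http://localhost:8080;
    }
}
```

With this in place, a consumer scripting thousands of rapid requests gets HTTP 429 responses instead of degrading the API for everyone else.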

Provide Bulk Data

If the kind of data you are serving through your API is also the kind that consumers are going to want to get in bulk, you should make it available as a static – but regularly updated – download (in addition to making it available through your API).

In my experience, APIs are a lousy way to get bulk data – consumers would much rather get it as a compressed file they can download and use without fuss, and making consumers get bulk data through your API simply burdens it with unneeded traffic and ties up resources that can affect other consumers’ experience using your API.

If you’re serving up open data through your API, here are some additional reasons that you should also make this data available in bulk.

Use a Proxy Cache

A proxy cache sits in between your API and those using it, and caches responses that are frequently requested. Depending on the nature of the data you are serving through your API, it might be desirable to cache responses for some period of time – even up to 24 hours.

For example, an API serving property data might only be updated when property values are adjusted – either through a reassessment or an appeal by a homeowner. An API serving tax data might only be updated on a weekly basis. The caching strategy you employ with your open data API should be a good fit for the frequency with which the data behind it is updated.

If the data is only updated on a weekly basis, there is little sense in serving every single request to your API through a fresh call down the stack to the application and database running it. It’s more beneficial for the API owner, and the API consumer, if these requests are served out of cache.

There are lots of good choices for standing up a proxy cache like Varnish or Squid. These tools are open source, easy to use and can make a huge difference in the performance of your API.
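As a sketch of the idea, Nginx can also act as the proxy cache itself. The cache path, zone name and the 24-hour lifetime below are illustrative assumptions that should be matched to how often the underlying data actually changes:

```nginx
# Inside the http block: cache successful API responses on disk.
proxy_cache_path /var/cache/nginx keys_zone=api_cache:10m max_size=1g;

server {
    listen 80;

    location /api/ {
        proxy_cache api_cache;
        # Data updated weekly? Serving from a 24-hour cache is safe,
        # and a cache hit never touches the application or database.
        proxy_cache_valid 200 24h;

        # Expose HIT/MISS so consumers can see caching at work.
        add_header X-Cache-Status $upstream_cache_status;

        proxy_pass http://localhost:8080;
    }
}
```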

Always Send Caching Instructions to API Consumers

If your API supports CORS or JSONP then it will serve data directly to web browsers. An extension of the caching strategy discussed above should address cache headers that are returned to browser-based apps that will consume data from your API.

There are lots of good resources providing details of how to effectively employ cache headers like this and this. Use them.
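As a minimal illustration, a Cache-Control header can be attached to every API response so that browser-based consumers know how long they may reuse it. The one-hour lifetime here is an arbitrary example, not a recommendation:

```nginx
location /api/ {
    # Browsers (and intermediate caches) may reuse this response
    # for up to an hour without revalidating it.
    add_header Cache-Control "public, max-age=3600";

    proxy_pass http://localhost:8080;
}
```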

Evaluate tradeoffs of using ETags

ETags are related to the caching discussion above. In a nutshell, ETags enable your API consumers to make “conditional” requests for data.

When ETags are in use, API responses are returned to consumers with a unique representation of a resource (an ETag). When the resource changes – i.e., is updated – the ETag for that resource will change. A client can make subsequent requests for the same resource and include the original ETag in a special HTTP header. If the resource has changed since the last request, the API will return the updated resource (with an HTTP 200 response, and the new ETag). This ensures that the API consumer always gets the latest version of a resource.

If the resource hasn’t changed since the last request, the API will instead return a response indicating that the resource was not modified (an HTTP 304 response). When the API sends back this response to the consumer, the content of the resource is not included, meaning the transaction is less “expensive” because what is actually sent back as a response from the API is smaller in size. This does not, however, mean that your API doesn’t expend resources when ETags are used.

Generating ETags and checking them against those sent with each API call will consume resources and can be rather expensive depending on how your API implements ETags. Even if what gets sent over the wire is more compact, the client response will be slowed down by the need to match ETags submitted with API calls, and this response will probably always be slower than sending a response from a proxy cache or simply dipping into local cache (in instances where a browser is making the API call).

Also, if you are rate limiting your API, do responses that generate an HTTP 304 count against an individual API consumer’s limit? Some APIs work this way.

Some examples of how ETags work using CouchDB – which has a pretty easy to understand ETags implementation – can be found here.
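To make the mechanics concrete, here is a minimal sketch in Python of how a server might generate ETags and honor conditional requests. The hashing scheme and the (status, headers, payload) response shape are illustrative assumptions, not any particular framework’s API:

```python
import hashlib


def make_etag(body):
    # Derive a strong ETag from the response body itself.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_response(body, if_none_match=None):
    """Return (status, headers, payload), honoring If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        # Resource unchanged: a 304 with no body saves bandwidth,
        # but note we still paid to compute and compare the ETag.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


# First request: full response, with an ETag the client can store.
status, headers, payload = conditional_response(b'{"parcel": 123}')

# Repeat request presenting the stored ETag: 304 and an empty payload.
status2, headers2, payload2 = conditional_response(
    b'{"parcel": 123}', headers["ETag"]
)
```

Note that both branches call make_etag – this is the cost discussed above: even when the response over the wire is tiny, the server still does work for every conditional request.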


Did I miss something? Feel free to add a comment about what you think is important in API stewardship below.

The Promise and Pitfalls of Government APIs

Fresh off a week in San Diego for the annual Accela Engage conference (where Tim O’Reilly gave a keynote presentation) and some stolen hours over the weekend for hacking together an entry in the Boston HubHacks Civic Hackathon, I’ve got government APIs front of mind.

Getting to hear the Godfather of “Government as a Platform” speak in person is always a treat, and Tim was kind enough to share the awesome slide deck he used for his talk. The chance to follow up on an event like Engage with some heads down time to bang out a quick prototype for the City of Boston was a great opportunity to frame some of the ideas discussed at the conference.

For me, this quick succession of events got me thinking about both the promise and the pitfalls of government APIs.

APIs: The Promise

The thing I love the most about the Boston Civic Hackathon is the way the city approached it. Prior to the event, the organizers took time to clearly articulate issues the city was trying to address. Materials given in advance to participants provided exhaustive information about the permitting process and clearly listed the things the city needed help with. Additionally, a few experimental API endpoints were stood up for participants to use during the event.

These APIs weren’t the easiest to use but they were helpful in creating prototypes that would allow city leaders to see the possibilities of collaborating with outside developers. It should be noted that Code for Boston – the local Code for America Brigade – was heavily involved in the event. This was a smart move by the city to include the leadership in the local civic hacking movement in the event right from the start.


So, out of the gate, this event provided immediate tangible benefits for the city – without even one line of code being written. The city benefits immensely from the time and effort that went into describing and documenting the current permitting system and the many shortcomings it has. This is a process that far too few governments undertake, even when they are crafting expensive and elaborate RFPs.

There appeared to be a healthy level of participation, despite the fact that there was another hackathon happening in Boston on the same weekend, indicating that the message from City Hall was being taken seriously by the local technology community. In all, nine apps (including one of my own) were submitted for review – each of these submissions provides powerful insights for city officials into what is possible when governments leverage the talents of outside developers using an API.

But at the same time, I think this event helps to highlight some of the pitfalls that governments (particularly municipal governments) face when deploying APIs and moving towards government as a platform. These challenges can derail efforts to collaborate with local civic hackers to improve the quality of services that governments provide, so it’s important to understand what they are.

APIs: The Pitfalls

To its credit, the City of Boston took steps to create new APIs for its legacy permitting system – to allow civic hackers to create prototypes that can help illustrate what is possible. And, as it turns out, a good number of people want to take them up on this.

Boston is one of the most progressive cities in the country when it comes to engaging civic technologists. But building and managing a custom API can be a challenge for any government and it is an endeavor not to be undertaken lightly. In addition, along with the new role of managing a production-grade API for external development, governments face the relatively new challenge of building and managing developer communities around them. To put it lightly, this ain’t easy – particularly for governments that haven’t done it before.

It looks like Boston’s current vendor doesn’t supply a baked-in API for their permitting system, so the city stood up a few custom endpoints for developers to work with over the weekend. This is a great approach to support a weekend hacking event, but if the city is serious about coaxing developers into investing time and money building new civic apps on top of an API, the demands can increase dramatically.

Production APIs done right require stewardship – this includes ensuring adequate reliability, authentication, versioning and a host of other things that building a demo API does not. If developers perceive that an API is unstable or lacks proper stewardship, they won’t invest the time building services that take advantage of it.

Another potential issue for Boston is that – even if they are able to create and manage a robust API for developers – the API for their permitting system will likely differ from the APIs of other cities. So, they may not be able to leverage talent outside of those interested in building an app specifically for the City of Boston.

The more that cities can share common platforms and APIs, the more they can amplify the benefits of collaborating with outside developers – an app built in one city can more easily be deployed to another, making the benefits to developers that build apps exponentially greater.

It’s great to see the City of Boston actively organizing hackathons to solicit ideas for how government service can be improved. I hope this event, and others to come, can help focus attention on the significant issues governments face in developing and managing open APIs for civic hackers.

Turning Governments into Data Stewards

The civic entrepreneurs behind Open Counter recently launched a new service called Zoning Check that lets prospective businesses quickly and easily check municipal zoning ordinances to determine where they can locate a new business.

This elegantly simple app demonstrates the true power of zoning information, and underscores the need for more work on developing standard data specifications between governments that generate similar kinds of data.

In a recent review of this new app, writer Alex Howard contrasts the simple, intuitive interface of Zoning Check with the web-based zoning maps produced by different municipal governments. Zoning Check is obviously much easier to use, especially for its intended audience of prospective business owners. And while this certainly is but one of many potential uses for zoning information, it’s hard to argue with the quality of the app or how different it is from a standard government zoning map.

But to me, more than anything else, this simple little civic application provides an object lesson in the need for governments to invest less time and resources building new citizen-facing applications themselves and more time and resources mustering the talents of outside developers who can build citizen-facing apps better, faster and cheaper.

To do this, governments need to reimagine their place in the civic technology production chain. In short, governments need to stop being app builders and start becoming data stewards.

There are a number of reasons why the role of data steward is a better one for governments – most importantly, governments don’t typically make good bets on technology. They’re not set up to do it properly, and as a result it’s not uncommon to see governments invest in technology that quickly becomes out of date and difficult to manage. This problem is particularly acute in relation to web-based services and applications – which outside civic technologists are very good at building – because the landscape for developing these kinds of applications changes far too rapidly for governments to realistically stay current.

Governments that focus on becoming data stewards are better able to break out of the cycle of investing in technology that quickly becomes out of date. It is these governments that are moving to release open data and deploy APIs to enable outside developers to build applications that can help deliver services and information to citizens. But in addition to procurement and recruitment hurdles that make it difficult for governments to get the technology of citizen-facing apps right, governments may also lack the proper perspective to develop targeted applications that expertly solve the problem of a specific class of users.

The truth of it is this – even if the processes by which businesses find out where they can locate, and what permitting and licensing requirements they need to comply with, are terrible, there typically isn’t much those businesses can do about it. Governments lack proper incentives to get apps like this right because no one is competing with them to provide the service. If governments change their role to that of a data steward, they can foster the creation of multiple apps that can deliver information to users in a much more effective way. Assuming the role of data steward would set up a competitive dynamic that would foster better interfaces to government information.

Look at what happened in Philadelphia when the city released crime data in highly usable formats – the city went from having one mediocre view of crime data that was developed with the sanction of the city to having a host of new applications developed by outside partners, each providing a new and unique view of the data that the city’s app simply did not provide.

The city even incorporated one of the apps built by outside developers into the official site for the Philadelphia Police Department.

Zoning Check is a great app to help center this conversation, and highlight the benefits that governments can reap if they work to transition away from being app builders and towards becoming true data stewards.

The Hacker Ethos and Better Cities

The thing I’ve always loved about hackathons is how they make it possible for anyone to build something that can help fix a problem facing a neighborhood, community or city.


Going to a hackathon isn’t like going to a government-sponsored meeting, or legislative hearing – those are places where people offer testimony to others, who may or may not take the advice given and implement some policy or legislative action. Hackathons are where people go to build actual solutions that help fix real problems.

The hacker ethos attracts people who don’t like layers of bureaucracy between the problems they see around them and the solutions they want to implement. We live in a time when it has never been easier for people without title, station or office to effect real change in the lives of people in their neighborhoods – to build solutions to fix problems they care about. This is an attractive draw for people that want to make a difference, and it’s why the number of hackathons has grown in recent years, and continues to grow.

I see these same sentiments in an exciting project developed by Kristy Tillman and Tiffani Bell. They built the Detroit Water Project to help Detroit residents in danger of having their water service cut off get paired up with people that can make a payment (or partial payment) on their behalf.

This is the kind of project I would expect to see at a hackathon – it has very few rough edges but looks like it was put together rapidly. It effectively leverages powerful, cheap online tools like Google forms and social media to engage with people that want to get involved.

And it is absolutely brilliant in its simplicity and effectiveness.

Here’s the elevator pitch – there are folks in the City of Detroit (a city facing significant challenges) in danger of having their water service cut off because they are unable to pay their bill. This is an issue affecting thousands of people in real need and galvanizing a movement to help prevent it. The Detroit Water Project enables people anywhere in the country to help with just a few mouse clicks and at the cost of a night out on the town. Boom.

This is how people with the hacker ethos want to invest their time, talents and energy. They are surfacing the question that more people need to step up and help answer – are we going to sit by and let cities like Detroit crumble, or are we going to get off our asses and pitch in?

This project wasn’t built at a hackathon, but it’s everything a hackathon project can be (and should be). Kristy and Tiffani are hackers – and I mean that as the highest compliment I can pay someone.

This is how our cities are going to get better.

[Note - water icon courtesy of the Noun Project.]

Making FOIA More Like Open Data

The Freedom of Information Act, passed in 1966 to increase trust in government by encouraging transparency, has always been a pain in the ass. You write to an uncaring bureaucracy, you wait for months or years only to be denied or redacted into oblivion, and even if you do get lucky and extract some useful information, the world has already moved on to other topics. But for more and more people in the past few years, FOIA is becoming worth the trouble.

The Secret to Getting Top Secret Secrets, by Jason Fagone.

I’ve always thought that the FOIA process was an important part of a healthy open data program. That may seem like an obvious thing to say, but there are a lot of people involved in the open data movement who either have limited exposure to FOIA or just enough exposure to truly loathe it.

In addition, the people inside government who are responsible for responding to FOIA requests may have very different feelings about releasing data than those that are part of an open data program.


There are lots of reasons why, for advocates of open data, the FOIA process is suboptimal. A number of them are discussed in a recent blog post by Chris Whong, an open data advocate in New York City and a co-captain of the NYC Code for America Brigade, who FOIA’d the NYC Taxi & Limousine Commission for bulk taxi trip data.

Chris’ post details many of the things that open data advocates dislike about the FOIA process. It’s an interesting read, especially if you don’t know how the FOIA process works.

However, another more serious shortcoming of the FOIA process became obvious almost immediately after the taxi trip data was posted for wider use. It turns out that the Taxi & Limousine Commission had not done a sufficient job depersonalizing the data, and the hashing method used to obscure the license numbers of taxi drivers and their medallion numbers could be circumvented with only moderate effort.
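The weakness reported in the taxi data release is an instance of a general problem: an unsalted hash of a low-entropy identifier is not anonymization, because every possible identifier can simply be hashed and matched against the released values. The sketch below illustrates the idea with an invented four-character medallion format and MD5 – the format, sample values and function names here are hypothetical, not the TLC’s actual scheme:

```python
import hashlib
import string
from itertools import product

def build_lookup(fmt_chars):
    """Hash every possible identifier and index the digests by hash."""
    table = {}
    for combo in product(*fmt_chars):
        plate = "".join(combo)
        table[hashlib.md5(plate.encode()).hexdigest()] = plate
    return table

# Hypothetical identifier format: digit, uppercase letter, digit, digit.
fmt = [string.digits, string.ascii_uppercase, string.digits, string.digits]
lookup = build_lookup(fmt)  # only 10 * 26 * 10 * 10 = 26,000 candidates

# An "anonymized" value from a released record is recovered instantly.
anonymized = hashlib.md5(b"7B42").hexdigest()
print(lookup[anonymized])  # recovers the original identifier: 7B42
```

Because the candidate space is so small (26,000 values here), building the complete lookup table takes a fraction of a second – which is why salted or keyed hashing, or random surrogate identifiers, are the usual recommendations for this kind of release.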

It’s obvious that the Taxi Commission tried to obscure this personal data in the files it released and to also make sure the data was as usable as possible by the person who requested it. Striking this balance can be tricky, and it’s actually not uncommon for data released through FOIA requests to have information that may be viewed as sensitive in hindsight.

I think one of the reasons this happens with data released through FOIA is that the process is not usually coupled tightly enough with the open data review process. I think we can make FOIA better (and, by extension, make the open data process better) by running more FOIA requests through the vetting and review process used to release open data.

Outcome vs. Process

In my experience, there is often very little connection between the process for responding to FOIA requests and the open data release process. Beyond reviewing FOIA requests in the aggregate to see if there are opportunities for bulk data releases, the FOIA process and the open data release process often happen independently of one another. This is certainly the case in the City of Philadelphia.

In Philly, open data releases are coordinated by the Chief Data Officer in the Office of Innovation and Technology. FOIA requests – or Right to Know Requests as they are known in the Commonwealth of Pennsylvania – are handled by staff in the Law Department, or personnel that have been identified as Right to Know Officers for their specific department.

These requests almost always get treated as one-off tasks, never to be repeated again. Even though requests for the same data may be made at a later date, I’ve never seen the people working on FOIA requests in Philly take the approach of making their work to respond to these requests repeatable.

The problem with a bifurcated approach to data releases like this is that it forces people to think of the work to respond to FOIA requests as disposable. Something that happens once – an outcome, instead of a process. Open data done correctly is about establishing a process – one that includes opportunities for review and feedback.

Toward Better FOIA Releases

Because FOIA is viewed as a one and done task, there is no opportunity to iteratively release data – if the release of NYC taxi trip data had been viewed as a process (particularly a collaborative one), the Taxi & Limousine Commission could have opted to be conservative in their initial release and then enhanced future releases based on actual feedback from real consumers of the data.

In Philadelphia, we employed a group called the Open Data Working Group to help review and vet proposed data releases. This is an interdisciplinary group from across different city departments which helped provide feedback and input on a number of important data releases that required depersonalization or redaction of sensitive data – crime incidents, complaints filed against active duty police officers, etc.

Additionally, part of our release process involved reaching out to select outside data consumers to get feedback and help identify issues prior to broader release. Because we used GitHub for many of our data releases, we could set up private repos for our planned data releases and ask selected experts to help us vet and review by adding them as collaborators prior to making these data repos public.

Getting to Alignment

I think for a lot of amateurs, their alignment is always out.

Karrie Webb, professional golfer

When it comes to data releases, there is no substitute for experience – that’s why integrating FOIA releases into an existing open data release process can be so beneficial. Leveraging the process for reviewing open data releases can improve the quality of FOIA releases and bring these two critical elements of the open data process into closer alignment.

I’m hopeful that cities, particularly Philadelphia, will begin to see the merit of better aligning FOIA responses and open data releases.

Built to Fail

The great truism that underlies the civic technology movement of the last several years is that governments face difficulty implementing technology, and they generally manage IT assets and projects very poorly.

It can be tempting to view this lack of technology acumen as a symptom of a larger dysfunction. Governments are thought to be large, plodding bureaucracies that do lots of things poorly – technology management and implementation are but one of the many things that governments do not do well.

But the challenges facing governments as it relates to technology are quite specific – there are a handful of processes, all easily identifiable, that work against the efficient adoption of technology by governments. Understanding why governments struggle to implement new technology requires us to understand what these factors are and why they negatively impact technology adoption so specifically.

The challenges that governments face in adopting new technology are not symptomatic of a larger dysfunction – when it comes to the efficient adoption of technology, governments are built to fail.

The Government Procurement Process

Over the past year, primarily as a result of the botched rollout of the federal website, there has been significant attention given to the aspects of the government procurement process that hinder efficient technology adoption.

The process clearly does not work well for governments – and many would argue that it does not work well for technology vendors either. Typical government procurement rules shut out many small firms that may be well positioned to execute on a government IT project but that lack the organizational endurance to make it through the lengthy selection process.

The process is complex, costly and – most importantly – slow. To illustrate the magnitude of the issue, consider that in the City of Philadelphia the period between a contract award (when a vendor gets selected to work on a project) and the final execution of that contract (when the work actually begins) can take an average of four months for some projects. Some technology solutions have shorter release cycles than that.

This makes the government procurement process particularly ill suited (in its current form) as a means for governments to acquire technology. The pace at which new technologies mature is much more rapid, and the glacial pace at which the government procurement process moves can lock governments into outdated solutions.

The most under-appreciated characteristic of the procurement process is that its current design is largely intentional. Governments imbue the process with requirements and other stipulations that they hope will lead to outcomes that are deemed desirable. Each of these requirements adds to the complexity of the process and the burden on firms that choose to respond to government RFPs.

For example, almost every government has purchasing requirements for minority- and women-owned businesses, and many have requirements that local companies receive preference over firms from outside the jurisdiction. The objective is to drive more government procurement dollars to minority- and women-owned businesses and to local businesses that create local jobs and pay local taxes.

There are also larger, overarching values embedded in the procurement process. For example, fairness and transparency are values that inform requirements like the public posting of bids and related materials, ample public notice of vendor meetings, and the clear specification of when and how bids must be submitted.

Risk aversion is another value that impacts the complexity and cost of the public procurement process. It is this value that informs requirements like performance bonds, vendor insurance, scrutiny of company financial statements, and requirements for financial reserves—all things that seek to reduce the risk assumed by governments from engaging with a company to provide a good or service. Each of these requirements can make the procurement process more complex and burdensome for bidders, particularly smaller companies.

These features of the procurement process were designed with a specific intent, and few people would argue with the laudable goals they seek to encourage. Yet, one of the side effects of these requirements is that they make the process slower, more complex, and harder for smaller and more nimble firms to participate in.

In other words, the procurement process works largely as it was built to work, but its design makes it a lousy tool for acquiring technology.

The Public Budgeting Process

Perhaps more fundamental than the procurement process, the process used by governments to deliberate and adopt spending plans is deeply flawed as it relates to encouraging innovation generally and adopting new technologies specifically. There are built-in disincentives in the public budgeting process for agencies to demonstrate large efficiency gains that result in the need to outlay fewer dollars.

Public agencies that have unused allocations at the end of a fiscal year – money that is appropriated but not spent – inevitably see those dollars redirected to another agency or policy priority in the next budget cycle. Mayors, governors, and legislators see this as an effective way to allocate finite resources between competing interests. If an agency doesn’t spend down its entire budget allocation, it must not need it, and a reduction is justified. In other words, the budget process works as government administrators have designed it to work – to reallocate resources away from agencies that don’t need them to agencies that do.

However, most agencies see this outcome as unfavorable – a diminution of their mission and a loss for the clientele that they serve and are advocates for. It can be a powerful disincentive for agencies to use new technologies to create efficiencies that result in cost savings.

Employee Recruitment & Retention

Most government employees – at the federal, state and local levels – are hired into an agency through a merit-based civil service system. The development of civil service systems, which are supposed to grant job opportunities on the basis of merit, was a response to the widespread political patronage of earlier eras.

Most civil service systems have fairly rigid job classifications and salary structures, meant to provide transparency into what specific positions are paid and to place limits on the ability to reward specific employees because of political preference or for other reasons. There are typically requirements for public job postings, and advertisement of openings for a specified period. Applicants are typically required to demonstrate fitness for a position and may often be required to take a civil service exam to demonstrate minimum competency on a particular subject.

Not unlike the procurement system, we imbue certain values into the civil service system in the hopes of fostering outcomes deemed favorable. For example, in the City of Philadelphia the children of police and fire personnel are given preference in the civil service process. This is not meant to suggest that granting such applicants a preference is inherently a bad thing, but (as with the procurement process) using the civil service system as a vehicle for fostering outcomes can come at the cost of making the process more lengthy and complex for all applicants.

Public sector salaries and benefits generally lag behind the private sector, and altering pay structures and adding new job classifications in response to changes in the world of technology can be difficult. This is particularly troubling for those interested in recruiting top IT talent into government because, unlike with many other kinds of government job types, governments are in direct competition with the private sector for IT workers. A “system administrator” or a “web developer” or a “project manager” requires largely the same kinds of skills and experience inside of government or outside.

Like the procurement process and the budget process, the civil service system works as it was designed to work. Unfortunately, the way that it works makes it ill suited for attracting and retaining a highly skilled IT workforce.

The Cost of Building to Fail

What’s particularly interesting is that even if the policies that underpin each of these processes don’t change at all in the next several years, the problems that governments face in adopting new technology will get steadily worse. That’s because the pace of change in the world of technology continues to accelerate – and as it does, the gap between the rate at which new technologies mature and the rate at which governments can make budget or purchasing decisions, or reclassify jobs and salaries in response, will only grow.

The old Hippocratic maxim of “first, do no harm” will be insufficient in addressing this problem. We absolutely must do better, and we must do it soon.

How Do We Fix It?

It’s important for advocates of successful technology adoption in government to understand why governments face so many challenges in acquiring and implementing technology. The processes that support paying for and acquiring technology, and the process for hiring IT workers are designed in a way that make them a bad match for the dynamics of the technology industry.

Knowing why governments face challenges in implementing new technology is important if we’re going to find ways to address the problem. When we examine the range of different approaches being advocated to help governments more successfully adopt technology, we should evaluate their merits based on the degree to which they address one or more of the processes detailed above.

For example, efforts to bring more highly skilled technology employees into government through groups like 18F in the US, or the Government Digital Service in the UK, are meant to address the challenges faced in recruiting IT staff through traditional civil service channels. Projects like Screendoor are meant to address challenges that typically arise from traditional government procurement processes.

There are lots of other examples of efforts underway to help governments get better at implementing and using technology – too many to discuss fully here. But the ultimate success of each will be tied – I believe – to the degree to which they help address the challenges engineered into one of the three processes discussed above – procurement, budgeting and employee recruitment / retention.

Governments are not bad at adopting new technologies by accident. The processes that support the adoption of new technology were built to fail. Understanding this is the first step to fixing them.

In Defense of Transit Apps

The civic technology community has a love-hate relationship with transit apps.

We love to, and often do, use the example of open transit data and the cottage industry of civic app development it has helped spawn as justification for governments releasing open data. Some of the earliest, most enduring and most successful civic applications have been built on transit data, and there are literally hundreds of different apps available.

The General Transit Feed Specification (GTFS), which has helped to encourage the release of transit data from dozens and dozens of transportation authorities across the country, is used as the model for the development of other open data standards. I once described work being done to develop a data standard for locations dispensing vaccinations as “GTFS for flu shots.”
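Part of what makes GTFS such an effective model is its simplicity: a feed is just a zip archive of CSV files with standardized names and columns. As a rough sketch (the sample data below is invented, but the column names – stop_id, stop_name, stop_lat, stop_lon – are standard GTFS stops.txt fields), loading stops can be this simple:

```python
import csv
import io

# Invented two-row stops.txt snippet using standard GTFS column names.
SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
1001,Market St & 5th,39.9510,-75.1500
1002,Market St & 8th,39.9515,-75.1545
"""

def load_stops(text):
    """Parse a stops.txt file into {stop_id: (name, lat, lon)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["stop_id"]: (row["stop_name"],
                         float(row["stop_lat"]),
                         float(row["stop_lon"]))
        for row in reader
    }

stops = load_stops(SAMPLE_STOPS)
print(stops["1001"][0])  # Market St & 5th
```

Because every agency publishes the same files with the same columns, the same few lines of parsing code work against a feed from any of the dozens of participating transit authorities – which is exactly why the standard travels so well.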


But some in the civic technology community chafe at the overuse of transit apps as the example cited for the release of open data and engagement with outside civic hackers. Surely there are other examples we can point to that get at deeper, more fundamental problems with civic engagement and the operation of government. Is the best articulation of the benefits of open data and civic hacking a simple bus stop application?

Last week at Transparency Camp in DC, during a session I ran on open data, I was asked what data governments should focus on releasing as open data. I stated my belief that – at a minimum – governments should concentrate on The 3 B’s: Buses (transit data), Bullets (crime data) and Bucks (budget & expenditure data).

To be clear – transit data and the apps it helps generate are critical to the open data and civic technology movements. I think it is vital to explore the role that transit apps have played in the development of the civic technology ecosystem and their impact on open data.

Storytelling with transit data

Transit data supports more than just “next bus” apps. In fact, characterizing all transit apps this way does a disservice to the talented and creative people working to build things with transit data. Transit data supports a wide range of different visualizations that can tell an intimate, granular story about how a transit system works and how its operation impacts a city.

One inspiring example of this kind of app was developed recently by Mike Barry and Brian Card, and looked at the operation of the MBTA in Boston. Their motive was simple:

We attempt to present this information to help people in Boston better understand the trains, how people use the trains, and how the people and trains interact with each other.

We’re able to tell nuanced stories about transit systems because the data being released continues to expand in scope and improve in quality. This happens because developers building apps in cities across the country have provided feedback to transit officials on what they want to see and on the quality of what is provided.

Developers building the powerful visualizations we see today are standing on the shoulders of the people that built the “next bus” apps a few years ago. Without these humble apps, we don’t get to tell these powerful stories today.

Holding government accountable

Transit apps are about more than just getting to the train on time.

Support for transit system operations can run into the billions of dollars and affect the lives of millions of people in an urban area. With this much investment, it’s important that transit riders and taxpayers are able to hold officials accountable for the efficient operation of transit systems. To help us do this, we now have a new generation of transit apps that can compare things like the scheduled arrival and departure times of trains with their actual arrival and departure times.
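The schedule-adherence idea behind this new generation of apps is mechanically simple: pair each scheduled arrival with the observed one and summarize the delay per route. As a sketch – the records, field names and route numbers below are invented for illustration – it takes only a few lines:

```python
from datetime import datetime
from statistics import mean

# Hypothetical observations pairing scheduled and actual arrival times.
observations = [
    {"route": "34", "scheduled": "08:10", "actual": "08:14"},
    {"route": "34", "scheduled": "08:30", "actual": "08:31"},
    {"route": "56", "scheduled": "08:05", "actual": "08:05"},
]

def delay_minutes(sched, actual, fmt="%H:%M"):
    """Difference between actual and scheduled arrival, in minutes."""
    diff = datetime.strptime(actual, fmt) - datetime.strptime(sched, fmt)
    return diff.total_seconds() / 60

# Group observed delays by route.
by_route = {}
for obs in observations:
    by_route.setdefault(obs["route"], []).append(
        delay_minutes(obs["scheduled"], obs["actual"])
    )

for route, delays in sorted(by_route.items()):
    print(route, round(mean(delays), 1))  # average delay per route, minutes
```

A real app would pull the scheduled side from a static GTFS feed and the observed side from a real-time feed, but the accountability story – which routes consistently run late – falls out of exactly this kind of aggregation.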

Not only does this give citizens transparency into how well their transit system is being run, it offers a pathway for engagement – by knowing which routes are not performing close to scheduled times, transit riders and others can offer suggestions for changes and improvements.

A gateway to more open data

One of the most important things that transit apps can do is provide a pathway for more open data.

In Philadelphia, the city’s formal open data policy and the creation of an open data portal all followed after the efforts of a small group of developers working to obtain transit schedule data from the Southeastern Pennsylvania Transportation Authority (SEPTA). This group eventually built the region’s first transit app.

This small group pushed SEPTA to make their data open, and the Authority eventually embraced open data. This, in turn, raised the profile of open data with other city leaders and directly contributed to the adoption of an open data policy by the City of Philadelphia several years later. Without this simple transit app and the push for more open transit data, I don’t think this would have happened. Certainly not as soon as it did.

And it isn’t just big cities like Philadelphia. In Syracuse, NY – a small city with no tradition of civic hacking and no formal open data program – a group at a local hackathon decided that they wanted to build a platform for government open data.

The first data source they selected to focus on? Transit data. The first app they built? A transit app.

In the next year or so, as the open data movement in Syracuse picks up steam, people will be able to look back at this development as the catalyst for it all. Transit apps and transit data are a gateway to adoption of a more comprehensive open data policy by governments.

In many ways, those of us working in the civic technology and open data fields are standing on the shoulders of the people who built all those “next bus” apps a few years back. Their efforts, and the value of transit apps, deserve our recognition.