Amplifying Administrative Burden

How Poor Technology Choices Can Magnify the Challenges Faced by Those Seeking Government Services

For every 10 people who said they successfully filed for unemployment benefits during the previous four weeks, three to four additional people tried to apply but could not get through the system to make a claim. Two additional people did not try to apply because it was too difficult to do so. When we extrapolate our survey findings to the full five weeks of UI claims since March 15, we estimate that an additional 8.9–13.9 million people could have filed for benefits had the process been easier. [Emphasis added]

Unemployment filing failures: New survey confirms that millions of jobless were unable to file an unemployment insurance claim. Economic Policy Institute

The impact of the ongoing COVID-19 pandemic on jobs and our economy, and our government’s attempts to provide relief to those affected through the CARES Act, have brought into sharp focus the issue of administrative burden. Administrative burden can be succinctly defined as “an individual’s experience of a policy’s implementation as onerous.”

Process Eats Culture for Breakfast

Famed management consultant Peter Drucker is often credited with the phrase “culture eats process (or strategy) for breakfast.”

You can’t change organizations by implementing new processes alone, so the thinking goes; you have to foster a new culture to drive real change. To understand the degree to which this idea is accepted as management gospel, we need only count the number of times it is repeated at conferences, in meetings, or on social media by various thought leaders.

But when we think about changing the way public sector organizations work, particularly in how they acquire and manage new technology, this idea gets flipped. In the world of government technology, process eats culture for breakfast.


Towards Ethical Algorithms

Old tools & new challenges for governments

There is a common misconception that data-driven decision making and the use of complex algorithms are a relatively recent phenomenon in the public sector. In fact, making use of (relatively) large data sets and complex algorithms has been fairly common in government for at least the past few decades.

As we begin constructing ethical frameworks for how data and algorithms are used, it is important that we understand how governments have traditionally employed these tools. By doing so, we can more fully understand the challenges governments face when using larger data sets and more sophisticated algorithms, and design ethical and governance frameworks accordingly.

Building the Government Data Toolkit


Flickr image courtesy of Flickr user bitterbuick

We live in a time when people outside of government have better tools to build things with and extract insights from government data than governments themselves.

These tools are more plentiful, more powerful, more flexible, and less expensive than pretty much everything government employees currently have at their disposal. Governments may have existing relationships with huge tech companies like Microsoft, IBM, Esri and others that offer an array of different data tools; it doesn’t really matter.

In the race for better data tools, the general public isn’t just beating out the public sector; it’s already won the race and is taking a Jenner-esque victory lap.

This isn’t a new trend.

GovTech is Not Broken

When we talk about the challenges that face governments in acquiring and implementing new technology, the conversation eventually winds around to the procurement process.

That’s when things usually get ugly. “It’s broken,” they say. “It just doesn’t work.”

What most people who care about this issue fail to recognize, however, is that while the procurement process for technology may not work well for governments or prospective vendors (particularly smaller, younger companies), it is not broken.

It works exactly as it was designed to work.

Command Line Data Science

When it comes to deriving useful results about the operation of government from open data sets, we have an enormous array of tools at our disposal. Often, we do not need sophisticated or expensive tools to produce useful results.

In this post, I want to use command line tools that are available on most laptops, and others that can be downloaded for free, to derive meaningful insights from a real government open data set. The following examples will leverage *nix-based tools like tail, grep, sort, uniq and sed as well as open source tools that can be invoked from the command line like csvkit and MySQL.

The data used in this post is from the NY State Open Data Portal and covers traffic tickets issued in New York State from 2008 to 2012.
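Many of these command line analyses come down to a pipeline like tail -n +2 tickets.csv | cut -d, -f3 | sort | uniq -c | sort -rn | head, which skips the header row and counts how often each value in one column appears. As a point of comparison, here is a minimal Python sketch of that same count. The function name top_values, the file name and the column name in the usage comment are hypothetical; check the actual column headers in the download before relying on them.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def top_values(lines: Iterable[str], column: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common values in one CSV column, with their counts.

    csv.DictReader treats the first row as the header (the job tail -n +2
    does in the shell pipeline) and maps each later row to a dict.
    """
    reader = csv.DictReader(lines)
    counts = Counter(row[column] for row in reader)  # like sort | uniq -c
    return counts.most_common(n)  # like sort -rn | head


# Usage (hypothetical file and column names):
# with open("traffic_tickets.csv", newline="") as f:
#     for value, count in top_values(f, "Violation Description"):
#         print(count, value)
```

One difference worth noting: cut splits on every comma, while Python's csv module (like csvkit's csvcut) respects quoted fields that contain commas.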

Better Licensing For Open Data

It’s really interesting to see so many governments start to use GitHub as a platform for sharing both code and data. What I find curious, though, is how infrequently governments use standard licenses with their data and app releases on GitHub.

Why no licenses?

I’m as guilty as anyone of pushing government data and apps to GitHub without proper terms of use or a standard license. Adding these to a repo can be a pain – more often than not, I used to find myself rooting around in older repos looking for a set of terms that I could copy into a repo I wanted to create. This isn’t a terrible way to ensure that terms of use for government data and apps stay consistent, but I think we can do better.

Before leaving the City of Philadelphia, I began experimenting with a new approach. I created a stand-alone repository for our most commonly used set of terms & conditions. Then, I added the license to a new project as a submodule. With this approach, we can ensure that every time a set of terms & conditions is included with a repo containing city data or apps, the language is up to date and consistent with what is being used in other repos.

Adding the terms of use to a new repo before making it public is easy:

~$ git submodule add git:// license

This adds a new subdirectory named ‘license’ to the parent repo, containing a reference to the repo that holds the license language. Any user cloning the repo to use the data or app simply does the following (for purposes of demonstration, using this repo):

~$ git clone
~$ git submodule init
~$ git submodule update

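The user can run git submodule update --remote at any time to pull in the very latest license language, which can change from time to time. (A plain git submodule update only checks out the commit recorded in the parent repo, so it will not pick up newer license language until the parent repo is updated to point at it.)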
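If you maintain many repos, it can also help to verify automatically that each one carries the shared terms. git submodule add records every submodule in a .gitmodules file at the root of the parent repo, with a section per submodule listing its path and url. Below is a minimal Python sketch of such a check, assuming the submodule sits at the path license as in the command above. The function names and the example URL are hypothetical, and the parser handles only the simple layout git writes, not every git config feature.

```python
from typing import Dict, Optional


def parse_gitmodules(text: str) -> Dict[str, Dict[str, str]]:
    """Parse the simple [submodule "name"] / key = value layout of .gitmodules.

    Returns {submodule name: {"path": ..., "url": ...}}. Deliberately small:
    this is not a full git-config parser.
    """
    modules: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[submodule") and line.endswith("]"):
            name = line[len("[submodule"):-1].strip().strip('"')
            current = modules.setdefault(name, {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return modules


def has_shared_license(text: str, expected_url: str, path: str = "license") -> bool:
    """True if some submodule is checked out at `path` from `expected_url`."""
    return any(
        entry.get("path") == path and entry.get("url") == expected_url
        for entry in parse_gitmodules(text).values()
    )
```

For example, a CI job could call has_shared_license on the contents of .gitmodules with the URL of the shared terms repo and fail the build when it returns False.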

GitHub is an amazing platform for governments to use in sharing open data and fostering collaboration through releasing applications as open source projects.

I think it also provides some powerful facilities for associating licenses and terms & conditions with these releases – something every open source project needs to be sustainable and successful.

Some Tips on API Stewardship

Following up on my last post, and a recent trip to St. Paul, Minnesota, for the NAGW Annual Conference to talk about open data APIs, I wanted to offer a few insights on proper API stewardship for any government looking to get started with open data, or for those that already have an open data program underway.

Implementing an API for your open data is not a trivial undertaking, and even if this is a function that you outsource to a vendor or partner, it’s useful to understand some of the issues and challenges involved.

This is something that the open data team in the City of Philadelphia researched extensively during my time there, and this issue continues to be among the most important for any government embarking on an open data program.

In no particular order, here are some of the things that I think are important for proper API stewardship.

Implement Rate Limiting

APIs are shared resources, and one consumer’s use of an API can potentially affect another consumer. Implementing rate limiting ensures that one consumer doesn’t crowd out others by trying to obtain large amounts of data through your API (that’s what bulk downloads are for).
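To make the idea concrete, here is a minimal Python sketch of one common technique, a per-consumer token bucket: each consumer can make a short burst of requests, after which requests are allowed only as fast as tokens refill. Keying the buckets by API key is an assumption, and the class name is hypothetical. Proxies like Nginx implement rate limiting for you (Nginx’s limit_req module uses the closely related leaky bucket method), so treat this as an illustration rather than something to deploy as-is.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Each API key may burst up to `capacity` requests; tokens refill at `rate` per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        # api_key -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        # A consumer we have not seen yet starts with a full bucket.
        tokens, last = self._buckets.get(api_key, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[api_key] = (tokens - 1, now)
            return True
        self._buckets[api_key] = (tokens, now)
        return False  # caller would typically answer with HTTP 429
```

Because the state lives in one process’s memory, an API served from several machines would need shared storage (or a proxy in front of all of them) to enforce a single limit per consumer.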

If you want to start playing around with rate limiting for your API, have a look at Nginx – an open source web proxy that makes it super easy to implement rate limits on your API. I use Nginx as a reverse proxy for pretty much every public facing API I work on. It’s got a ton of great features that make it ideal for front ending your APIs.

Depending on the user base for your API, you may also want to consider using pricing as a mechanism for managing access to your API.

Provide Bulk Data

If the kind of data you are serving through your API is also the kind that consumers are going to want to get in bulk, you should make it available as a static – but regularly updated – download (in addition to making it available through your API).

In my experience, APIs are a lousy way to get bulk data – consumers would much rather get it as a compressed file they can download and use without fuss, and making consumers get bulk data through your API simply burdens it with unneeded traffic and ties up resources that can affect other consumers’ experience using your API.
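Publishing bulk data can be as simple as a scheduled job that regenerates a compressed file. Here is a minimal Python sketch, assuming a nightly cron job (or similar) feeds it the rows to publish; the function name and the file name in the test are hypothetical. It writes to a temporary file first and then swaps it into place with os.replace, so consumers never fetch a half-written file during a refresh.

```python
import csv
import gzip
import os
from typing import Iterable, Mapping, Sequence


def write_bulk_download(rows: Iterable[Mapping[str, object]],
                        fieldnames: Sequence[str], dest: str) -> None:
    """Write rows to a gzip-compressed CSV at `dest`, replacing any old copy."""
    tmp_path = dest + ".tmp"  # same directory, so the rename stays on one filesystem
    try:
        with gzip.open(tmp_path, "wt", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp_path, dest)  # atomic rename on POSIX systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # don't leave a partial file behind
        raise
```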

If you’re serving up open data through your API, here are some additional reasons that you should also make this data available in bulk.

Use a Proxy Cache

A proxy cache sits in between your API and those using it, and caches responses that are frequently requested. Depending on the nature of the data you are serving through your API, it might be desirable to cache responses for some period of time – even up to 24 hours.

For example, an API serving property data might only be updated when property values are adjusted – either through a reassessment or an appeal by a homeowner. An API serving tax data might only be updated on a weekly basis. The caching strategy you employ with your open data API should be a good fit for the frequency with which the data behind it is updated.

If the data is only updated on a weekly basis, there is little sense in serving every single request to your API through a fresh call down the stack to the application and database running it. It’s more beneficial for the API owner, and the API consumer, if these requests are served out of cache.
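To show the core decision a proxy cache makes, here is a minimal Python sketch of an in-memory cache with a time-to-live (TTL) matched to the data’s refresh cycle; the class name, the URL path and the one-day TTL in the test are hypothetical. A real proxy cache runs as a separate layer in front of the API and also honors HTTP cache headers, which this sketch ignores.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve repeated requests from memory until an entry is `ttl` seconds old."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl  # e.g. 604800 seconds for data refreshed weekly
        self.clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (stored_at, body)

    def get(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh: no call down to the application or database
        body = fetch()  # missing or stale: go down the stack once, then reuse
        self._entries[key] = (now, body)
        return body
```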

There are lots of good choices for standing up a proxy cache, such as Varnish or Squid. These tools are open source, easy to use and can make a huge difference in the performance of your API.

Always Send Caching Instructions to API Consumers

If your API supports CORS or JSONP, then it will serve data directly to web browsers. An extension of the caching strategy discussed above should address the cache headers returned to browser-based apps that consume data from your API.

There are lots of good resources providing details of how to effectively employ cache headers like this and this. Use them.
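One way to tie cache headers to the data’s refresh schedule is to set max-age so that it counts down to the next scheduled update. Here is a minimal Python sketch, assuming the data is refreshed on a fixed interval; the function name is hypothetical. The public directive lets shared caches (such as a proxy cache) store the response too, which suits open data that is the same for every consumer.

```python
from typing import Dict


def cache_headers(last_refresh: float, refresh_interval: float, now: float) -> Dict[str, str]:
    """Build a Cache-Control header for data refreshed on a fixed schedule.

    All times are in seconds (e.g. Unix timestamps). max-age counts down to
    the next scheduled refresh, so browsers stop reusing a response once new
    data is due; it bottoms out at 0 if a refresh is overdue.
    """
    next_refresh = last_refresh + refresh_interval
    max_age = max(0, int(next_refresh - now))
    return {"Cache-Control": f"public, max-age={max_age}"}
```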

Evaluate Tradeoffs of Using ETags

ETags are related to the caching discussion above. In a nutshell, ETags enable your API consumers to make “conditional” requests for data.

When ETags are in use, API responses are returned to consumers with an identifier for the current version of a resource (an ETag). When the resource changes – i.e., is updated – the ETag for that resource will change. A client can make subsequent requests for the same resource and include the ETag it received earlier in an If-None-Match HTTP header. If the resource has changed since the last request, the API will return the updated resource (with an HTTP 200 response and the new ETag). This ensures that the API consumer always gets the latest version of a resource.

If the resource hasn’t changed since the last request, the API will instead return a response indicating that the resource was not modified (an HTTP 304 response). When the API sends back this response, the content of the resource is not included, so the transaction is less “expensive”: what actually goes back over the wire is smaller. This does not, however, mean that your API expends no resources when ETags are used.

Generating ETags and checking them against those sent with each API call will consume resources and can be rather expensive depending on how your API implements ETags. Even if what gets sent over the wire is more compact, the client response will be slowed down by the need to match ETags submitted with API calls, and this response will probably always be slower than sending a response from a proxy cache or simply dipping into local cache (in instances where a browser is making the API call).
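Here is a minimal Python sketch of the server side of a conditional GET, using only the standard library. It assumes the ETag is a hash of the serialized JSON body, which is one common approach but not the only one (a version number or last-updated timestamp stored with the record avoids hashing altogether); the function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response body (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def etag_matches(etag: str, if_none_match: str) -> bool:
    """If-None-Match uses weak comparison, so any W/ prefix is ignored."""
    if if_none_match.strip() == "*":
        return True
    # A plain comma split is enough here because these ETags are hex digests.
    tags = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
    return etag in tags


def conditional_get(resource: dict, if_none_match: Optional[str]
                    ) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    # The full body is still serialized and hashed on every request, even
    # when the answer turns out to be 304.
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    if if_none_match is not None and etag_matches(etag, if_none_match):
        return 304, {"ETag": etag}, b""  # not modified: headers only
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body
```

Note that the 304 path still does all the work of building and hashing the resource; only the response on the wire gets smaller, which is why a proxy cache or the browser’s local cache will usually answer faster.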

Also, if you are rate limiting your API, do responses that generate an HTTP 304 count against an individual API consumer’s limit? Some APIs work this way.

Some examples of how ETags work using CouchDB – which has a pretty easy to understand ETags implementation – can be found here.


Did I miss something? Feel free to add a comment about what you think is important in API stewardship below.

Open Data: Beyond the Portal

One of the most visible statements a government embarking on a new open data program can make is the selection of an “open data portal.”

An open data portal provides a central location for listing or storing data released by a government for use by outside consumers, making such data more easily discoverable. A portal also has value as a more concrete manifestation of a government’s intentions for open government.

Governments that have data portals as the centerpiece of their open government agendas make a public statement about the importance of data to being transparent and collaborative.

But open data portals are much more than just data directories or repositories – when implemented and managed successfully, they are also the centerpiece for the community that generates value from publicly released government data.

The community around an open data portal is a direct contributor to the success of an open data program – and this community includes both people inside government (data producers) and outside (data consumers – developers, journalists, researchers, civic activists, etc.).

This fact helps underscore some important considerations government officials should keep in mind when evaluating different options for an open data portal, and also highlights work that must be done beyond the selection of an open data portal to ensure the success of government transparency efforts.

The Community Inside

The internal community around an open data portal is made up of data stewards and producers inside government.

This community uses an open data portal in a very specific way. A subset of this community may be involved in the maintenance or management of the underlying software platform that supports the open data portal, but most will contribute data (or information about data) to the portal in some way.

But before this specific touch point, where internal community members contribute data to a portal, there is a series of decisions and actions that must be taken to decide which data gets put into a portal, and what format that data will take.

All governments operate under an explicit set of rules about the kinds of data that can and should be released for public consumption. But beyond this binary evaluation of public vs. non-public, there is a set of (often complex) factors that need to be considered:

  • Which data sets have a higher “value” relative to others? What should be focused on first?
  • What is the current state of the data – is it accurate and up to date?
  • Does it require meta information, to assist users in understanding what it is and how it may be most effectively used?
  • Where is the data currently housed? Are there any technical barriers that might make it difficult to stage it for public release?
  • What specific steps are needed to take data from a backend system or data store and stage it for public release?
  • Who is responsible for each step? One person? Many?
  • What is the appropriate refresh cycle for such data? Does it change often enough to warrant frequent updates?
  • What is the appropriate format to release a data set in? Should more than one format be used?

(Another good source of information for data producers to take into consideration is the 8 Principles of Open Data.)

The process by which governments work through these issues (and others) is the foundation on which a successful open data program operates. The process that is used to identify, review, release, update and maintain information in an open data portal – regardless of what kind of portal it is – is what turns the wheels of open government.

The work to develop this process (or set of processes) must be done regardless of which open data portal a government elects to use.

This is not meant to suggest that picking the right data portal doesn’t have value, just that much work remains to be done to build a successful open data portal beyond simply picking which one to use.

The Community Outside

Governments must also work to build the external community of users around an open data portal – this external community will use an open data portal very differently than their internal counterparts. These users will be direct consumers of the data provided by governments, and may also provide ideas for new data to release and feedback on the quality of existing open data.

To properly serve the external community of users of open data, governments must ensure that the portal they select (or build) has the features required to interact with this community.

Providing a forum for discussion, feedback mechanisms, the ability to rate the quality of data and suggest new kinds of data are all important functions. There are a number of both commercial and open source data portal options that do each of these things quite well.

Selecting an open source alternative for an open data portal might be perceived as a daunting task for some governments. There are several well developed and (increasingly) widely used open source options, including (but not limited to):

One of the primary considerations for a government considering an open source option for their data portal is the technology stack used to build it. Often, a mismatch between the technology used in one of these open source options and the government’s own technology infrastructure may raise concerns.

There are, however, some great examples of open source data portals that have been implemented with the assistance and direct involvement of members of the external data community, many of whom are software developers. The data portal is a good example of this, as are its sister sites in San Diego and Chattanooga, TN.

Leveraging a local community of technologists and developers to help stand up, manage and improve a government’s data portal by using open source software may be an effective way of engaging and building the external community of data consumers.

In this way, an open source data portal may have an advantage over a commercial offering: the external community of users is directly invested in the data portal itself, and has a way to contribute to it and make it better.

Building the internal and external communities around an open data portal is important work that must be done to ensure the success of a government’s open data and transparency program.

Selecting a specific open data portal to use doesn’t relieve governments of this important, foundational work.

And sometimes, selecting the right data portal can make building an external community around open data easier.