Virtuous Roadblocks and Digital Service Delivery

If Men were angels, no government would be necessary. If angels were to govern men, neither external nor internal controls on government would be necessary. In framing a government which is to be administered by men over men, the great difficulty lies in this: you must first enable the government to control the governed; and the next place, oblige it to control itself.

— James Madison

Hey there!

Are you thinking of joining the federal government to build better digital services for those that need them? Are you new to government service?

Awesome! This post is for you.

Read More

Amplifying Administrative Burden

How Poor Technology Choices Can Magnify the Challenges Faced by Those Seeing Government Services

For every 10 people who said they successfully filed for unemployment benefits during the previous four weeks three to four additional people tried to apply but could not get through the system to make a claim. Two additional people did not try to apply because it was too difficult to do so. When we extrapolate our survey findings to the full five weeks of UI claims since March 15, we estimate that an additional 8.9–13.9 million people could have filed for benefits had the process been easier. [Emphasis added]

Unemployment filing failures: New survey confirms that millions of jobless were unable to file an unemployment insurance claim. Economic Policy Institute

The impact on jobs and our economy from the ongoing COVID-19 pandemic, and the attempts by our government to provide relief for those impacted through the CARES Act has brought into sharp focus the issue of administrative burden. Administrative burden can be succinctly defined as “an individual’s experience of a policy’s implementation as onerous.”

Read More

Process Eats Culture for Breakfast

Famed management consultant Peter Drucker is often credited with the phrase “culture eats process (or strategy) for breakfast.”

You can’t change organizations by implementing new processes alone, so the thinking goes, you have to foster a new culture in order to drive real change. To understand the degree to which this idea is accepted as management philosophy gospel, we have but to count the number of times it is repeated at conferences, in meetings, or on social media by various thought leaders.

But when we think about changing the way public sector organizations work, particularly in how they acquire and manage new technology, this idea gets flipped. In the world of government technology, process eats culture for breakfast.


Read More

Towards Ethical Algorithms

Old tools & new challenges for governments

There is a common misconception that data-driven decision making and the use of complex algorithms are a relatively recent phenomenon in the public sector. In fact, making use of (relatively) large data sets and complex algorithms has been fairly common in government for at least the past few decades.

As we begin constructing ethical frameworks for how data and algorithms are used, it is important that we understand how governments have traditionally employed these tools. By doing so, we can more fully understand the challenges governments face when using larger data sets and more sophisticated algorithms and design ethical and governance frameworks accordingly.

Read More

Building the Government Data Toolkit


Flickr image courtesy of Flickr user bitterbuick

We live in a time when people outside of government have better tools to build things with and extract insights from government data than governments themselves.

These tools are more plentiful, more powerful, more flexible, and less expensive than pretty much everything government employees currently have at their disposal. Governments may have exiting relationships with huge tech companies like Microsoft, IBM, Esri and others that have an array of different data tools — it doesn’t really matter.

In the race for better data tools, the general public isn‘t just beating out the public sector, its already won the race and is taking a Jenner-esque victory lap.

This isn’t a new trend.

Read More

GovTech is Not Broken

When we talk about the challenges that face governments in acquiring and implementing new technology, the conversation eventually winds around to the procurement process.

That’s when things usually get ugly. “It’s broken,” they say. “It just doesn’t work.”

What most people who care about this issue fail to recognize, however, is that while the procurement process for technology may not work well for governments or prospective vendors (particularly smaller, younger companies), it is not broken.

It works exactly as it was designed to work.

Read More

Command Line Data Science

When it comes to deriving useful results about the operation of government from open data sets, we have an enormous array of tools at our disposal that we can make use of. Often, we do not need sophisticated or expensive tools to produce useful results.

In this post, I want to use command line tools that are available on most laptops, and others that can be downloaded for free, to derive meaningful insights from a real government open data set. The following examples will leverage *nix-based tools like tail, grep, sort, uniq and sed as well as open source tools that can be invoked from the command line like csvkit and MySQL.

The data used in this post is from the NY State Open Data Portal for traffic tickets issued in New York State from 2008 – 2012.

Read More

Better Licensing For Open Data

It’s really interesting to see so many governments start to use GitHub as a platform for sharing both code and data. One of the things I find interesting, though, is how infrequently governments use standard licenses with their data and app releases on GitHub.

Why no licenses?

I’m as guilty as anyone of pushing government data and apps to GitHub without proper terms of use, or a standard license. Adding these to a repo can be a pain – more often than not, I used to find my self rooting around in older repos looking for a set of terms that I could include in a repo I wanted to create and copying it. This isn’t a terrible way ensure that terms of use for government data and apps stay consistent, but I think we can do better.

Before leaving the City of Philadelphia, I began experimenting with a new approach. I created a stand-alone repository for our most commonly used set of terms & conditions. Then, I added the license to a new project as a submodule. With this approach, we can ensure that every time a set of terms & conditions is included with a repo containing city data or apps that the language is up to date and consistent with what is being used in other repos.

Adding the terms of use to a new repo before making it public is easy:

~$ git submodule add git:// license

This adds a new subdirectory in the parent repo named ‘license’ that contains a reference to the repo holding the license language. Any user cloning the repo to use the data or app, simply does (for purposes of demonstration, using this rep):

~$ git clone
~$ git submodule init
~$ git submodule update

The user can run git submodule update any time to get the very latest license language, which can change from time to time.

Github is an amazing platform for governments to use in sharing open data and fostering collaboration through releasing applications as open source projects.

I think it also provides some powerful facilities for associating licenses and terms & conditions with these releases – something every open source project needs to be sustainable and successful.

Some Tips on API Stewardship

Following up on my last post, and a recent trip to St. Paul Minnesota for the NAGW Annual Conference to talk about open data APIs, I wanted to provide a few insights for proper API stewardship for any government looking to get started with open data, or those that already have an open data program underway.

Implementing an API for your open data is not a trivial undertaking, and even if this is a function that you outsource to a vendor or partner it’s useful to understand some of the issues and challenges involved.

This is something that the open data team in the City of Philadelphia researched extensively during my time there, and this issue continues to be among the most important for any government embarking on an open data program.

In no particular order, here are some of the things that I think are important for proper API stewardship.

Implement Rate Limiting

APIs are shared resources, and one consumer’s use of an API can potentially impact anther consumer. Implementing rate limiting ensures that one consumer doesn’t crowd out others by trying to obtain large amounts of data through your API (that’s what bulk downloads are for).

If you want to start playing around with rate limiting for your API, have a look at Nginx – an open source web proxy that makes it super easy to implement rate limits on your API. I use Nginx as a reverse proxy for pretty much every public facing API I work on. It’s got a ton of great features that make it ideal for front ending your APIs.

Depending on the user base for your API, you may also want to consider using pricing as a mechanism for managing access to your API.

Provide Bulk Data

If the kind of data you are serving through your API is also the kind that consumers are going to want to get in bulk, you should make it available as a static – but regularly updated – download (in addition to making it available through your API).

In my experience, APIs are a lousy way to get bulk data – consumers would much rather get it as a compressed file they can download and use without fuss, and making consumers get bulk data through your API simply burdens it with unneeded traffic and ties up resources that can affect other consumers’ experience using your API.

If your serving up open data through your API, here are some additional reasons that you should also make this data available in bulk.

Use a Proxy Cache

A proxy cache sits in between your API and those using it, and caches responses that are frequently requested. Depending on the nature of the data you are serving through your API, it might be desirable to cache responses for some period of time – even up to 24 hours.

For example, an API serving property data might only be updated when property values are adjusted – either through a reassessment or an appeal by a homeowner. An API serving tax data might only be updated on a weekly basis. The caching strategy you employ with your open data API should be a good fit for the frequency with which the data behind it is updated.

If the data is only updated on a weekly basis, there is little sense in serving every single request to your API through a fresh call down the stack to the application and database running it. It’s more beneficial for the API owner, and the API consumer, if these requests are served out of cache.

There are lots of good choices for standing up a proxy cache like Varnish or Squid. These tools are open source, easy to use and can make a huge difference in the performance of your API.

Always Send Caching Instructions to API Consumers

If your API supports CORS or JSONP then it will serve data directly to web browsers. An extension of the cacheing strategy discussed above should address cache headers that are returned to browser-based apps that will consume data from your API.

There are lots of good resources providing details of how to effectively employ cache headers like this and this. Use them.

Evaluate tradeoffs of using ETags

ETags are related to the cacheing discussion detailed above. In a nutshell, ETags enable your API consumers to make “conditional” requests for data.

When ETags are in use, API responses are returned to consumers with a unique representation of a resource (an ETag). When the resource changes – i.e., is updated – the ETag for that resource will change. A client can make subsequent requests for the same resource and include the original ETag in a special HTTP header. If the resource has changed since the last request, the API will return the updated resource (with an HTTP 200 response, and the new ETag). This ensures that the API consumer always gets the latest version of a resource.

If the resource hasn’t changed since the last request, the API will instead return a response indicating that the resource was not modified (an HTTP 304 response). When the API sends back this response to the consumer, the content of the resource is not included, meaning the transaction is less “expensive” because what is actually sent back as a response from the API is smaller in size. This does not, however, meant that your API doesn’t expend resources when ETgas are used.

Generating ETags and checking them against those sent with each API call will consume resources and can be rather expensive depending on how your API implements ETags. Even if what gets sent over the wire is more compact, the client response will be slowed down by the need to match ETags submitted with API calls, and this response will probably always be slower than sending a response from a proxy cache or simply dipping into local cache (in instances where a browser is making the API call).

Also, if you are rate limiting your API does responses that generate an HTTP 304 count against an individual API consumer’s limit? Some APIs work this way.

Some examples of how ETags work using CouchDB – which has a pretty easy to understand ETags implementation – can be found here.


Did I miss something? Feel free to add a comment about what you think is important in API stewardship below.