It strikes me that this is a very relevant issue for those in the open data movement, as the data generated by urban sensor networks is likely to be mashed up with publicly available data from cities on crime, land use, service requests and a host of other things to drive better decision making. There’s a natural connection between the kinds of data we find in open data portals and the kind of data that is generated by emerging sensor networks.
It also strikes me that most municipal open data portals are not well suited to provide access to realtime data – the kinds of data that sensor networks are really good at generating.
Current State of Open Data Portals
Pretty much every modern open data portal provides a way to programmatically access data that is housed in it – data is accessed via an API by making an HTTP request (with the required information – e.g., authentication – in the request) and getting a response back (typically in either JSON or XML format). This data access paradigm fits well with the way that most of the data in municipal open data portals is updated – usually not more frequently than daily.
If data updates happen frequently – or if a data consumer wants to check and see if data has changed since the last time it was accessed – a consumer application can poll the API for changes at set intervals. And though this approach works acceptably well for data that doesn’t change all that often, it is far from acceptable for data that does (or could) change more frequently. In fact, the closer updates to data get to realtime changes, the less optimal this approach becomes, because it places a heavier burden on consumers (who must poll the API for data changes more frequently) and on the data portal itself (which must handle and respond to more frequent requests from API consumers).
Other – more efficient – approaches to accessing data can be used when data updates occur more frequently. These approaches – like server-sent events and Websockets (which are both part of the HTML5 specification), or registering a callback URL (or Webhook) – benefit both the data consumer and the data producer.
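To make the contrast with polling concrete, here is a rough sketch of the server-sent events approach: rather than the client asking over and over, the server writes events into a long-lived HTTP response using the simple text format defined in the HTML5 specification. The helper below is a hypothetical illustration (no actual portal exposes exactly this), showing only how an update would be framed on the wire:

```python
import json

def format_sse(data, event=None, event_id=None):
    """Format a payload as a server-sent event message.

    Per the SSE wire format, each field goes on its own line
    and a blank line terminates the event.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

# A consumer holding the connection open receives this the moment the
# underlying record changes – no polling interval required.
message = format_sse({"flight": "AA123", "status": "DELAYED"},
                     event="flight-update", event_id=42)
print(message)
```

The point of the sketch is the inversion of responsibility: the portal pushes one small message per change, instead of answering thousands of “anything new?” requests that mostly return nothing.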
Getting to Realtime
The closest thing I can identify to a realtime open data API is one that we built in the City of Philadelphia for flight information from the Philadelphia International Airport. This API uses data from the airport flight information system and is updated every three minutes (about the same frequency as data is updated on the Airport’s website and on flight information displays in the airport terminals). It provides a simple REST API for making standard HTTP calls for data on specific flights, and was also designed with a Websocket endpoint to allow realtime connections.
Another interesting realtime data project from Chicago is ClearStreets (a project of Open City, which has built a number of powerful civic apps for the City of Chicago) that shows the realtime position of plows as they clear the streets after heavy snow.
Even more exciting is the OpenSensors project, a platform that supports data aggregation from remote sensor networks – the project hosts open data projects at no cost and allows anyone to subscribe to data feeds from these open sensor network projects.
I think these examples show how municipal open data portals can move in the direction of supporting realtime data, and – perhaps more importantly – how governments can begin to understand the coming importance of providing ways for data consumers to use realtime methods for accessing data.
Practical First Steps
It can be tempting to think of the need for realtime data as being closely coupled with the use of sensors. But even in places where sensor networks are not yet built out (or even planned), there are lots of opportunities for open data to become closer to realtime.
Crime incidents, parking citations, 311 service requests, road closures, permit and license issuance – these are all activities that occur every hour of every day as a part of municipal operations. And yet the data that is generated by these activities is still largely consumed through open data portals in a fashion that best fits data which is updated only periodically.
Wouldn’t it be useful if data consumers could subscribe to a specific topic or channel (like Service Requests or Building Permits) for a specific neighborhood, register a callback URL and then receive a push of JSON representing the specific event when it occurred? No more wasteful polling for changes that consume resources on both the client and data portal side – just send me information on an event I care about when it occurs.
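A minimal sketch of what such a subscription model could look like follows. Everything here is hypothetical – the topic names, the callback URL, and the payload shape are assumptions for illustration, not any portal’s actual API. Consumers register a callback URL for a topic and neighborhood, and the portal builds a JSON payload for each matching subscriber when an event occurs:

```python
import json
from collections import defaultdict

# subscriptions: (topic, neighborhood) -> set of registered callback URLs
subscriptions = defaultdict(set)

def subscribe(topic, neighborhood, callback_url):
    """Register a webhook for events on a topic in a neighborhood."""
    subscriptions[(topic, neighborhood)].add(callback_url)

def publish(topic, neighborhood, event):
    """Build the JSON payload and return (url, payload) pairs that
    would be POSTed to each subscriber. A real implementation would
    issue the HTTP requests here, with retries and request signing."""
    payload = json.dumps({
        "topic": topic,
        "neighborhood": neighborhood,
        "event": event,
    })
    return [(url, payload) for url in subscriptions[(topic, neighborhood)]]

subscribe("building-permits", "Fishtown", "https://example.com/hooks/permits")
deliveries = publish("building-permits", "Fishtown",
                     {"permit_id": "BP-1001", "status": "issued"})
print(deliveries)
```

The design choice worth noticing: the portal does work only when an event actually happens, and only for consumers who declared an interest – exactly the opposite of the polling model.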
In some instances, the barriers to making more realtime data available from governments are related to technology – some legacy systems may not make it practical to expose data in this way. But as cities start producing more and more data – particularly as remote sensor networks become more common – the demand for ways to consume data in more appropriate ways will increase.
Will municipal open data portals be able to keep up with this demand? We’ll see.
“Civic Hacking” is the awareness of a condition that is suboptimal in a neighborhood, community or place and the perception of one’s own ability to effect change on that condition. The apps are incidental.
In 2008, civic hacking was the furthest thing from my mind.
At the time, I was working for a small company in Southwestern Virginia that built payment and telephony systems for local governments. I had left state government behind 5 years prior – after working for almost a dozen years in two different states in both the legislative and executive branches – to become a full time technologist. My job enabled me to expand my knowledge of software development, VoIP and telephony systems design and made me feel “connected” to government.
I felt like I was still working to help governments use technology more efficiently – which was the focus of the last several years of my public service – because my company was building tools that were used by governments. In reality, I was probably more unsatisfied with my current “connection” with government than I had realized.
In the Fall of that year I came across an announcement about a contest that was taking place in the District of Columbia that seemed quite extraordinary to me. The DC government had published dozens of data sets to a public website in highly usable formats and was inviting outside software developers to do interesting things with this data. Winners would be chosen and given cash rewards, along with a chance to be singled out by the Mayor at a public ceremony.
I instantly knew that I wanted to participate – even though I neither lived nor worked in DC. I entered the contest, submitted my application, won the silver medal in the “independent developer” category and got a $1,000 check for my efforts.
For this, and many other reasons, I have been bullish on civic hacking ever since.
From App Contests to Civic Hacking
“Ultimately, apps contests are having a positive long-term economic impact, regardless of whether they deliver useful technology. They have catalyzed a community of technologists inside and outside of government who are committed to improving the lives of residents and visitors.”
After the initial wave of government app contests spurred by the Apps for Democracy contest in DC, the world of civic hacking went grassroots, with community sponsored events popping up all over the country. The last several years have seen the creation and spectacular growth of the Code for America Brigade, which has helped to create civic hacking groups in dozens of cities in the U.S. and other countries.
Today, a great deal of civic hacking occurs outside of app contests, or even hackathons themselves. It is a regular activity that occurs each week or month in Code for America Brigades and other groups. A great example that I like to point to is the Detroit Water Project, which came together when the co-creators connected via Twitter. The project didn’t require a hackathon or similar event, or even a physical meeting between the creators to get started.
App contests and the early wave of organized civic hacking events have helped spur the development of a large (and growing) community that can now come together and interact more fluidly. The solutions being developed by these groups are increasingly potent and I think are appropriately viewed as part of the answer to the problems governments face in using technology to do their jobs.
The Three-Legged Stool
Of the many different policy options put forward in the last few years aimed at improving the way governments implement technology, I think there are three primary themes we can identify:
Reform the procurement process so that smaller, more innovative firms can compete for government technology work.
Build the internal capacity of governments to successfully manage IT projects.
Deploy APIs and release open data to create a platform on which third parties may develop new applications and services (“government as a platform”).
To solve the overarching problem, I think each of these three approaches is needed to some degree – a balance between them must be struck so that they act like the metaphorical “three-legged stool.” Of these three remedies, the one that requires the most radical perceived departure from the way that governments currently operate is the third – turning government into a platform.
Certainly the scope to which the current procurement process must be changed is vast, and governments have a long way to go to replicate the capacity for successful IT project management that we see in private sector organizations. But progress on these fronts involves changes – as dramatic as they may need to be – to existing processes, not the invention of brand new ones.
The idea of creating government as a platform, and enabling agencies to work collaboratively with outside parties (outside of traditional contract vehicles) to develop applications for their constituencies can seem like the most radical change. It requires governments to abdicate some control to new partners, to develop mechanisms for engaging and collaborating with these partners and to reimagine their role in the IT service delivery chain – to no longer be the unilateral creator of solutions used by and for government, and to become an enabler that incentivizes others to build them.
Despite the perceived novelty of this approach in the world of technology, there is actually a long history of reliance on outside volunteers to deliver important government services – one that continues today. In fact, there is a rich spectrum of examples that we can observe in the contemporary operation of government that involves government reliance on outside volunteers to deliver essential public services.
I believe that these examples hold the key to informing how governments should collaborate with outside civic hackers to develop new solutions that can improve the performance of government and the quality of the services they deliver.
Dissatisfaction as a Foundation For Action
The video above was captured in September 2013 in the UK, and shows a group of people heading home after a night out on the town. The surprising thing about this footage is that it didn’t capture people behaving badly – in fact, it shows the group working collaboratively to fix a damaged bike rack.
Distilled to its essence – this is what civic hacking is. It is the awareness of a condition that is suboptimal in a neighborhood, community or place and the perception of one’s own ability to effect change on that condition. There is no prerequisite that civic hacking involve technology or software, it only needs to involve people willing to help fix problems – apps are incidental to the larger goal of fixing a community problem.
In a way, civic hacking is a manifestation of dissatisfaction with government services. And while there has probably always been some level of dissatisfaction with the performance of government, the spread of open data and powerful, cheap tools for using this data to build new apps allows citizens to design their own interfaces for interacting with their government.
There is an abundance of examples we can point to where outside parties develop solutions on top of government provided or maintained data – to fill a role or address an issue that would ordinarily fall under the official responsibilities of a government agency. In Philadelphia, there are a number of efforts underway to encourage the repurposing of vacant properties, even though official responsibility for this falls under the duties assigned to specific government agencies. These outside efforts are enabled by the deliberate release of property information by the City of Philadelphia.
Before civic hacking, it was not possible for people to custom tailor an interaction with their government to their liking, or to change the way that government information and services were presented. Now it can be quite easy to do this. This presents an enormous challenge to the bureaucracy, and – in many ways – an enormous opportunity.
An Abundance of Skeptics
Despite the popularity of hackathons, and the strong growth of civic hacking across the country, it’s not difficult to find people that criticize civic hacking, or question its long-term impact.
In my experience most people that are skeptical of the potential impact of civic hacking have either been to very few (if any) actual civic hacking events, or conflate government sponsored app contests – that were quite common several years ago – with the larger civic hacking movement. Some even question the motives of those that promote civic hacking and suggest that it may be nothing more than a sham meant to take advantage of skilled but inexperienced workers in an unfavorable job market.
“As an enactment of civic intent, hackathons parochialize the ambition of democratic participation to topics that attract the data and technical means for impact in the course of a day or a weekend.”
Even organizations focused on fostering innovation in cities can be critical of civic hacking. A 2012 “field scan” of civic technology for the group Living Cities said:
Energetic, enthusiastic volunteering in ‘hackathons’ and other partnerships are not enough to create sustainable change in cities. Although hackathons are popular, their approach to problem solving is not always driven by community needs, and hackathons often do not produce useful material for governments or citizens in need.
I think both of these criticisms fail to see civic hacking as a larger movement that exists outside specific events that happen on a weekend here and there, and both miss the very important point that the apps created at any specific event are often not the primary focus of the hackathon.
In his excellent summary for running a civic hackathon, Joshua Tauberer says:
Think of the hackathon as a pit-stop on a long journey to solve problems or as a training session to prepare participants for solving problems.
The civic technology community has become increasingly aware of the need to ensure that solutions are developed in collaboration with those that are meant to benefit from them. Civic hacking groups are developing new ways to include the users of civic applications in the development process, to better ensure that their preferences and viewpoints are considered. In many ways, I think it’s fair to say that the amount of time and energy currently focused on ensuring user input in the development of civic apps probably outpaces the amount invested in the development of official government apps.
But most of all, what strikes me as relevant in these criticisms of civic hacking is the derision – whether explicit or implied – of its volunteer nature. I disagree that the volunteer nature of civic hacking means that it does not have value, or that it cannot have a long-term impact. In fact, there are a number of examples that we can point to where governments partner with volunteers to provide important public services.
A Long History of Volunteerism
Government collaboration with volunteers may be new to the world of technology, but in other areas of public service delivery it is quite common.
The vast majority of firefighters serving the United States are volunteers – 69% according to the National Fire Protection Association. So cemented in our national psyche as a symbol of selfless public service is the volunteer firefighter that it was used as the template for the Code for America Brigades that are now growing in cities across the country. But there are many other examples where the government collaborates with volunteers to provide important services.
The AmeriCorps service program – created under the Clinton Administration – is a federal program to recruit young adults to service in their community. This program was specifically designed not only to attract volunteers to help deliver important services, but also to facilitate professional growth in volunteers themselves and provide work experience. We can see many of these same objectives playing out in the world of civic hacking, where some communities are introducing a new focus on skill development and technology literacy.
Neighborhood watch groups, adopt-a-highway programs and community cleanup groups are additional examples where volunteers are helping to provide a service that would ordinarily fall to government alone to provide. In the City of Philadelphia, there is a formal program to designate citizens who have the support of their neighbors as “Block Captains” – these individuals act as the liaison for a community and interact directly with designated employees within city government. It is worth noting that these Block Captains are given official standing with the city (each is issued an ID card) and provided with support materials and training.
These are just a few examples of the long history that governments have in collaborating with outside partners that volunteer their time, skills and expertise. But what is interesting to me is that there is very little discussion about the role of civic hacking in this larger picture of volunteerism.
Why does government collaboration with other kinds of volunteer groups seem to differ so much from its treatment of civic hacking?
Working Towards More Effective Civic Hacking
We should say to critics in the media or elsewhere that failure is an essential part of government, just as it is in private enterprise. And the cost of failure should be tiny, dwarfed by its rewards…It’s much better to fail fast, fail cheap, and then put things right at a fraction of the cost.
One of the practices governments struggle most to adopt when building and implementing technology is agile development – methodologies that employ iteration and failure as tools to develop better products. For a variety of reasons, this approach can be hard for government to embrace.
Civic hacking groups, however, present an enormously valuable potential partner for governments – they can help develop and test a variety of different solutions outside of the traditional government contracting and procurement processes that can be used to see what works, and (perhaps more importantly) what doesn’t.
It’s agile development by proxy – or could be, if governments were able to see the value of stronger relationships with civic hacking groups.
One of the common themes that emerges when we look at the different kinds of volunteer activities that governments rely on to help provide important government services is that they have the official sanction of the government. This point is probably easiest to see with volunteer firefighters, but it is common with other volunteer groups as well. Adopt-a-highway programs use signage on roadways to designate the groups responsible for cleaning them, and Philadelphia Block Captains are given ID cards and an appointed liaison from the city to assist in their efforts.
This official sanction from government seems to be missing when it comes to civic hacking groups.
To be sure, it is not uncommon – particularly in cities like Philadelphia and Chicago – to see city representatives regularly attending civic hacking events. But the presence of these people represents an ambiguous commitment from government primarily because most civic hacking events take place after hours or on weekends. Are these individuals attending in their official capacity, or as enthusiastic volunteers themselves? It’s not always clear.
The one investment that governments have always made in the volunteer activities that support their efforts is resources – typically financial ones. And it is a lack of resources that has most directly limited the success of these volunteer efforts.
I think that civic hacking needs to be viewed in this broader tradition of volunteerism in America – a tradition that is important to the effective and efficient delivery of public services. Governments need to officially recognize and partner with outside hackers and technologists – not unlike what was tried in New York City under the Bloomberg Administration.
In addition, governments must invest in the resources to support civic hacking – most importantly, governments need to provide high quality open data and other “raw materials” for creating new solutions.
We can’t underestimate the importance of the official sanction and support that other kinds of civic volunteerism receive from government. It’s what defines these efforts and sustains them.
It’s time for us to see civic hacking as an essential component of the Collaborative State and recognize its place in the proud tradition of volunteerism that has helped to strengthen this country.
“The future is already here – it’s just not evenly distributed.”
William Gibson. The Economist, December 4, 2003
The last time I heard Tim O’Reilly speak was at the Accela Engage conference in San Diego earlier this year. In his remarks, Tim used the above quote from William Gibson – it struck me as a pretty accurate way to describe the current state of open data in this country.
Open data is the future – of how we govern, of how public services are delivered, of how governments engage with those that they serve. And right now, it is unevenly distributed. I think there is a strong argument to be made that data standards can provide a number of benefits to small and mid-sized municipal governments and could provide a powerful incentive for these governments to adopt open data.
One way we can use standards to drive the adoption of open data is to partner with companies like Yelp, Zillow, Google and others that can use open data to enhance their services. But how do we get companies with tens and hundreds of millions of users to take an interest in data from smaller municipal governments?
When we talk about open data, it’s important to keep in mind that there is a lot of good work happening at the federal, state and local levels all over the country. Plenty of states and even counties are doing good things on the open data front, but for me it’s important to evaluate where we are on open data with respect to cities.
States typically occupy a different space in the service delivery ecosystem than cities, and the kinds of data that they typically make available can be vastly different from city data. State capitols are often far removed from our daily lives and we may hear about them only when a budget is adopted or when the state legislature takes up a controversial issue.
In cities, the people that represent and serve us can be our neighbors – the guy behind you at the car wash, or the woman whose child is in your son’s preschool class. Cities matter.
As cities go, we need to carefully consider the importance of smaller cities – there are a lot more of them than large cities, and a non-trivial number of people live in them.
If we think about small to mid-sized cities, these governments are central to providing a core set of services that we all rely on. They run police forces and fire services. They collect our garbage. They’re intimately involved in how our children are educated. Some of them operate transit systems and airports. Small cities matter too.
Big cities vs. small cities on open data
So if cities are important – big and small – how are they doing on open data? It turns out that big cities have adopted open data with much more regularity than smaller cities.
If we look at data from the Census Bureau on incorporated places in the U.S., and information from a variety of sources on governments that have adopted open data policies and made open data available on a public website, we see the following:
9 of the 10 largest US cities have adopted open data.
19 of the top 25 most populous cities have adopted open data.
Of cities with populations > 500k, 71% have adopted open data.
There are 256 incorporated places in the U.S. with populations between 100k and 500k.
Only 39 of them have an open data policy or make open data available.
A mere 15% of smaller cities have adopted open data.
The data behind this analysis is here. As we can see, it shows a markedly different adoption rate for open data between large cities (those with populations of 500,000 or more) and smaller cities (those with populations between 100,000 and 500,000).
Why is this important?
We could chalk up this difference to the fact that big cities simply have more data. They may have more people asking for information, which can drive the release of open data. They have larger pools of technologists, startups and civic hackers to use the data. They may have more resources to publish open data, and to manage communities of users around that data.
I don’t know that there is one definitive answer here – there’s ample room for discussion on this point.
We should care about this because – quite simply – a lot of people call smaller cities home. If we add up the populations of the 256 places noted above with populations between 100,000 and 500,000, it actually exceeds the combined population of the 34 largest cities (with populations of 500,000 or more) – 46,640,592 and 41,155,553 respectively. Right now these people are potentially missing out on the many benefits of open data.
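The arithmetic behind these claims is easy to verify from the counts and population totals given above:

```python
# Counts and combined populations from the Census-based analysis above.
small_adopted, small_total = 39, 256  # places with 100k-500k residents
small_pop = 46_640_592                # combined population of those 256 places
large_pop = 41_155_553                # combined population of the 34 cities of 500k+

# Adoption rate among smaller cities.
small_rate = small_adopted / small_total
print(f"small-city adoption rate: {small_rate:.0%}")  # 15%

# More people live in the smaller places than in the big cities combined.
print(small_pop - large_pop)  # the smaller places' edge, in people
```

Nothing hypothetical here – just the 39-of-256 figure and the two population sums restated as a quick sanity check.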
But more than simple math, if one of the virtues of our approach to democracy in this country is that we have lots of governments below the federal level to act as “laboratories of democracy” then we’re missing an opportunity here. If we can get more small cities to embrace open data, we can encourage more experimentation, we can evaluate the kinds of data that these cities release and what people do with it. We can learn more about what works – and what doesn’t.
There are at least a few things we can do to address this problem.
First, we need more options for smaller governments to release open data. We’re not going to make progress in getting smaller governments to adopt open data if the cost of standing up a data portal has the same budget impact as the salary for a teacher, or a cop, or a firefighter, or a building inspector – I just don’t think that’s sustainable.
Equally important, we need to work on developing useful new data standards. This won’t always be easy, but it’s important work and we need to do it.
For smaller cities without the deep technology, journalism and research communities that can help drive open data adoption, data standards are a way to export civic technology needs to larger cities. I believe they are critical to driving adoption of open data in the many small and midsized cities in this country.
We’ve already seen what open data looks like in big cities, and they are already moving to take the next steps in the evolution of their open data programs – but smaller cities risk getting left behind.
The next frontier in open data is in small and mid-sized cities.
Ever since the botched launch of Healthcare.gov, procurement reform has become the rallying cry of the civic technology community.
There is now considerable effort being expended to reimagine the ways that governments obtain technology services from private sector vendors, with an emphasis being placed on new methods that make it easier for governments to engage with firms that offer new ideas and better solutions at lower prices. I’ve worked on some of these new approaches myself.
The biggest danger in all of this is that these efforts will ultimately fail to take hold – that after a few promising prototypes and experiments governments will revert to the time honored approach of issuing bloated RFPs through protracted, expensive processes that crowd out smaller firms with better ideas and smaller price tags.
I worry that this is eventually what will happen because far too much time, energy and attention is focused on the procurement process while other, more fundamental government processes with a more direct effect on how government agencies behave are being largely ignored. The procurement process is just one piece of the puzzle that needs to be fixed if technology acquisition is to be improved.
Right now, the focus in the world of civic technology is on fixing the procurement process. But what if we’re doing it wrong?
Things Better Left Unsaid
During the eGovernment wave that hit the public sector in the late 90’s to early 2000’s, tax and revenue collection agencies were among the first state agencies to see the potential benefits of putting services online. I had the good fortune to work for a state revenue agency around this time. My experience there, when the revenue department was aggressively moving its processes online and placing the internet at the center of its interactions with citizens, permanently impacted how I view technology innovation in government.
It’s hard for people to appreciate now, but prior to online tax filing state tax agencies would get reams and reams of paper returns from taxpayers that needed to be entered into tax processing systems, often by hand. Standard practice at the time was to bring on seasonal employees to do nothing but data entry – manually entering information from paper returns into the system used to process returns and issue refunds.
The state I worked for at the time had a visionary director who embraced the internet as a game changer in how people would file and pay taxes. Under his direction, the revenue department rolled out innovative programs to fundamentally change the way that taxpayers filed – online filing was implemented for personal and business taxpayers, and the department worked with tax preparers to implement a new system that would generate a 2D bar code on paper returns (allowing an entire tax return and accompanying schedules to be instantly captured using a cheap scanning device).
When these new filing options were in place, the time to issue refunds plummeted from weeks to days, and most personal income taxpayers saw their refunds issued from the state in just a couple of days. By this time, I had moved to the Governor’s office as a technology advisor and was leading an effort to help state departments move more and more services online. I wanted to use the experience of the revenue department to inspire others in state government – to tout the time and cost savings of moving existing paper processes to the internet, making them faster and cheaper.
When I asked the revenue director for some specifics on cost savings that I could share more broadly, his response could not have been further from what I expected.
He told me rather bluntly that he didn’t want to share cost saving estimates from implementing web-based services with me (or anyone else for that matter). Touting cost savings meant an eventual conversation with the state budget office, or questions in front of a legislative committee, about reducing allocations to support tax filing. The logic would go something like this – if the revenue department was reducing costs by using web-based filing and other programs, then the savings could be shifted to other departments and policy areas where costs were going up – entitlement programs, contributions to cover the cost of employee pensions, etc.
All too often, agencies that implement innovative new practices that create efficiencies and reduce costs see the savings they generate shifted to other, less efficient areas where costs are on the rise. This is just one aspect of the standard government budgeting process that works against finding new, innovative ways for doing the business of government.
Time to Get Our Hands Dirty
A fairly common observation after the launch of Healthcare.gov is that governments need to think smaller when implementing new technology projects. But at the state and local level, there are actually some fairly practical reasons for technology project advocates to “think big,” and try to get as big a piece of the budget pie as they can.
There is the potential that funding for the next phase of a “small” project might not be there when a prototype is completed and ready for the next step. From a pure self-interest standpoint, there are strong incentives pushing technology project advocates to get as much funding allocated for their project as possible, or run the risk that their request will get crowded out by competing initiatives. Better to get the biggest allocation possible and, ideally, get it encumbered so that there are assurances that the funding is there if things get tight in the next budget cycle.
In addition, there are a number of actors in the budget process at all levels of government (most specifically – legislators) who equate the size of a budget allocation for a project with its importance. This can provide another strong incentive for project advocates to think big – in many cities and states, funding for IT projects is going to compete with things like funding for schools, pension funding, tax relief and a host of other things that will resonate more viscerally with elected officials and the constituencies they serve. This can put a lot of pressure on project advocates to push for as much funding as they can. There’s just too much uncertainty about what will happen in the next budget cycle.
It’s for all of these reasons that I think it’s time for advocates of technology innovation in government to get their hands dirty – to roll up our sleeves and work directly with elected officials and legislators to educate them on the realities of technology implementation and how traditional pressures in the budget process can work to stifle innovation. There are some notable examples of legislators that “get it” – but we’ve got yeoman’s work to do to raise the technology IQ of most elected officials.
Procurement reform is one piece of the puzzle, but we’ll never get all the way there unless we address the built-in disincentives for government innovation – those that are enforced by the standard way we budget public money for technology projects (and everything else). We’re having conversations in state houses and city halls across the country about the future costs of underfunding pensions, but I don’t think we’re having conversations about the dangers of underfunding technology with the same degree of passion.
Time for us to wade into the morass and come back with a few converts. We’ve got work to do.
It’s really interesting to see so many governments start to use GitHub as a platform for sharing both code and data. One of the things I find interesting, though, is how infrequently governments use standard licenses with their data and app releases on GitHub.
Before leaving the City of Philadelphia, I began experimenting with a new approach. I created a stand-alone repository for our most commonly used set of terms & conditions. Then, I added the license to a new project as a submodule (using `git submodule add`). With this approach, we can ensure that every time a set of terms & conditions is included with a repo containing city data or apps, the language is up to date and consistent with what is being used in other repos.
This adds a new subdirectory in the parent repo named ‘license’ that contains a reference to the repo holding the license language. Any user cloning the repo to use the data or app simply clones with `git clone --recursive`, or runs `git submodule update --init` after a normal clone, to pull down the license language along with everything else.
Following up on my last post, and a recent trip to St. Paul, Minnesota for the NAGW Annual Conference to talk about open data APIs, I wanted to provide a few insights for proper API stewardship for any government looking to get started with open data, or those that already have an open data program underway.
Implementing an API for your open data is not a trivial undertaking, and even if this is a function that you outsource to a vendor or partner it’s useful to understand some of the issues and challenges involved.
This is something that the open data team in the City of Philadelphia researched extensively during my time there, and this issue continues to be among the most important for any government embarking on an open data program.
In no particular order, here are some of the things that I think are important for proper API stewardship.
Implement Rate Limiting
APIs are shared resources, and one consumer’s use of an API can potentially impact another consumer. Implementing rate limiting ensures that one consumer doesn’t crowd out others by trying to obtain large amounts of data through your API (that’s what bulk downloads are for).
If you want to start playing around with rate limiting for your API, have a look at Nginx – an open source web server and reverse proxy that makes it super easy to implement rate limits on your API. I use Nginx as a reverse proxy for pretty much every public facing API I work on. It’s got a ton of great features that make it ideal for fronting your APIs.
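As a sketch of what this can look like in practice, here’s a hypothetical Nginx configuration fragment (the zone name, rate and backend address are all placeholders, not recommendations):

```nginx
# Track clients by IP address; allow an average of 10 requests/second each.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

server {
    listen 80;

    location /api/ {
        # Allow short bursts of up to 20 queued requests before rejecting.
        limit_req zone=api_limit burst=20;
        proxy_pass http://localhost:8000;
    }
}
```

Requests that exceed the limit are rejected by Nginx before they ever reach the application behind it – exactly the protection you want for a shared resource.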
Make Bulk Data Available
If the kind of data you are serving through your API is also the kind that consumers are going to want to get in bulk, you should make it available as a static – but regularly updated – download (in addition to making it available through your API).
In my experience, APIs are a lousy way to get bulk data – consumers would much rather get it as a compressed file they can download and use without fuss, and making consumers get bulk data through your API simply burdens it with unneeded traffic and ties up resources that can affect other consumers’ experience using your API.
If you’re serving up open data through your API, here are some additional reasons that you should also make this data available in bulk.
Use a Proxy Cache
A proxy cache sits in between your API and those using it, and caches responses that are frequently requested. Depending on the nature of the data you are serving through your API, it might be desirable to cache responses for some period of time – even up to 24 hours.
For example, an API serving property data might only be updated when property values are adjusted – either through a reassessment or an appeal by a homeowner. An API serving tax data might only be updated on a weekly basis. The caching strategy you employ with your open data API should be a good fit for the frequency with which the data behind it is updated.
If the data is only updated on a weekly basis, there is little sense in serving every single request to your API through a fresh call down the stack to the application and database running it. It’s more beneficial for the API owner, and the API consumer, if these requests are served out of cache.
There are lots of good choices for standing up a proxy cache like Varnish or Squid. These tools are open source, easy to use and can make a huge difference in the performance of your API.
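To make the idea concrete, here’s a toy sketch in Python of the core logic a proxy cache applies (all names here are made up – in practice you’d deploy Varnish or Squid rather than hand-roll this): serve a stored response while it’s still fresh, and only fall through to the backend when it isn’t.

```python
import time

class ProxyCache:
    """Toy response cache keyed by request path, with a fixed TTL."""

    def __init__(self, fetch_from_backend, ttl_seconds=86400):
        self.fetch = fetch_from_backend   # callable doing the "expensive" work
        self.ttl = ttl_seconds
        self.store = {}                   # path -> (stored_at, response)
        self.backend_hits = 0

    def get(self, path):
        now = time.time()
        cached = self.store.get(path)
        if cached is not None:
            stored_at, response = cached
            if now - stored_at < self.ttl:
                return response           # fresh: no call down the stack
        # Stale or missing: hit the backend and cache the result.
        self.backend_hits += 1
        response = self.fetch(path)
        self.store[path] = (now, response)
        return response

# Example: two requests for the same path only hit the backend once.
cache = ProxyCache(lambda path: {"path": path, "rows": [1, 2, 3]})
first = cache.get("/properties")
second = cache.get("/properties")
print(cache.backend_hits)  # 1
```

For weekly-updated data, a TTL of a day means all but a tiny fraction of requests never touch the application or database at all.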
Always Send Caching Instructions to API Consumers
If your API supports CORS or JSONP then it will serve data directly to web browsers. An extension of the caching strategy discussed above should address cache headers that are returned to browser-based apps that will consume data from your API.
There are lots of good resources providing details of how to effectively employ cache headers. Use them.
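As an illustration (a hand-rolled sketch, not a library API), the headers involved are straightforward to generate – the point is simply to tell browsers how long a response can safely be reused:

```python
import time
from email.utils import formatdate

def caching_headers(max_age_seconds):
    """Build response headers telling browsers to reuse this response
    for max_age_seconds before re-requesting it."""
    return {
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
        # Expires is the legacy equivalent, expressed as an HTTP-date.
        "Expires": formatdate(time.time() + max_age_seconds, usegmt=True),
    }

# Cache responses for a weekly-updated dataset for one day.
headers = caching_headers(86400)
print(headers["Cache-Control"])  # public, max-age=86400
```

A browser that honors these headers won’t even make a network request for cached data until the max-age window expires.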
Evaluate the Tradeoffs of Using ETags
ETags are related to the caching discussion detailed above. In a nutshell, ETags enable your API consumers to make “conditional” requests for data.
When ETags are in use, API responses are returned to consumers with a unique representation of a resource (an ETag). When the resource changes – i.e., is updated – the ETag for that resource will change. A client can make subsequent requests for the same resource and include the original ETag in a special HTTP header. If the resource has changed since the last request, the API will return the updated resource (with an HTTP 200 response, and the new ETag). This ensures that the API consumer always gets the latest version of a resource.
If the resource hasn’t changed since the last request, the API will instead return a response indicating that the resource was not modified (an HTTP 304 response). When the API sends back this response to the consumer, the content of the resource is not included, meaning the transaction is less “expensive” because what is actually sent back as a response from the API is smaller in size. This does not, however, mean that your API doesn’t expend resources when ETags are used.
Generating ETags and checking them against those sent with each API call will consume resources and can be rather expensive depending on how your API implements ETags. Even if what gets sent over the wire is more compact, the client response will be slowed down by the need to match ETags submitted with API calls, and this response will probably always be slower than sending a response from a proxy cache or simply dipping into local cache (in instances where a browser is making the API call).
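The mechanics are easy to sketch in a few lines of Python (the helper names are hypothetical – real web frameworks generate and compare ETags for you). Note that the server still computes and compares the ETag on every request, even when it ultimately returns an empty 304:

```python
import hashlib

def make_etag(resource_body):
    # An ETag is just an opaque fingerprint of the resource's current state.
    return hashlib.sha1(resource_body.encode("utf-8")).hexdigest()

def handle_request(resource_body, if_none_match=None):
    """Return (status, body, etag) for a conditional GET."""
    etag = make_etag(resource_body)   # work the server does on *every* request
    if if_none_match == etag:
        return (304, "", etag)        # not modified: empty, cheap-to-send body
    return (200, resource_body, etag) # first request, or resource changed

data = '{"parcel": "123", "assessment": 250000}'
status, body, etag = handle_request(data)                      # 200 + full body
status2, body2, _ = handle_request(data, if_none_match=etag)   # 304, no body
print(status, status2)  # 200 304
```

The 304 saves bandwidth, but the server still did the work of producing and matching the fingerprint – which is why a response served straight from a proxy cache will usually be cheaper.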
Also, if you are rate limiting your API, do responses that generate an HTTP 304 count against an individual API consumer’s limit? Some APIs work this way.
Fresh off a week in San Diego for the annual Accela Engage conference (where Tim O’Reilly gave a keynote presentation) and some stolen hours over the weekend for hacking together an entry in the Boston HubHacks Civic Hackathon, I’ve got government APIs front of mind.
Getting to hear the Godfather of “Government as a Platform” speak in person is always a treat, and Tim was kind enough to share the awesome slide deck he used for his talk. The chance to follow up on an event like Engage with some heads down time to bang out a quick prototype for the City of Boston was a great opportunity to frame some of the ideas discussed at the conference.
For me, this quick succession of events got me thinking about both the promise and the pitfalls of government APIs.
APIs: The Promise
The thing I love the most about the Boston Civic Hackathon is the way the city approached it. Prior to the event, the organizers took time to clearly articulate issues the city was trying to address. Materials given in advance to participants provided exhaustive information about the permitting process and clearly listed the things the city needed help with. Additionally, a few experimental API endpoints were stood up for participants to use during the event.
These APIs weren’t the easiest to use but they were helpful in creating prototypes that would allow city leaders to see the possibilities of collaborating with outside developers. It should be noted that Code for Boston – the local Code for America Brigade – was heavily involved in the event. This was a smart move by the city to include the leadership in the local civic hacking movement in the event right from the start.
So, out of the gate, this event provided immediate tangible benefits for the city – without even one line of code being written. The city benefits immensely from the time and effort that went into describing and documenting the current permitting system and the many shortcomings it has. This is a process that far too few governments undertake, even when they are crafting expensive and elaborate RFPs.
There appeared to be a healthy level of participation, despite the fact that there was another hackathon happening in Boston on the same weekend, indicating that the message from City Hall was being taken seriously by the local technology community. In all, nine apps (including one of my own) were submitted for review – each of these submissions provides powerful insights for city officials into what is possible when governments leverage the talents of outside developers using an API.
But at the same time, I think this event helps to highlight some of the pitfalls that governments (particularly municipal governments) face when deploying APIs and moving towards government as a platform. These challenges can derail efforts to collaborate with local civic hackers to improve the quality of services that governments provide, so it’s important to understand what they are.
APIs: The Pitfalls
To its credit, the City of Boston took steps to create new APIs for its legacy permitting system – to allow civic hackers to create prototypes that can help illustrate what is possible. And, as it turns out, a good number of people want to take them up on this.
Boston is one of the most progressive cities in the country when it comes to engaging civic technologists. But building and managing a custom API can be a challenge for any government and it is an endeavor not to be undertaken lightly. In addition, along with the new role of managing a production-grade API for external development, governments face the relatively new challenge of building and managing developer communities around them. To put it lightly, this ain’t easy – particularly for governments that haven’t done it before.
It looks like Boston’s current vendor doesn’t supply a baked-in API for their permitting system, so the city stood up a few custom endpoints for developers to work with over the weekend. This is a great approach to support a weekend hacking event, but if the city is serious about coaxing developers into investing time and money building new civic apps on top of an API, the demands can increase dramatically.
Production APIs done right require stewardship – this includes ensuring adequate reliability, authentication, versioning and a host of other things that building a demo API does not. If developers perceive that an API is unstable or lacks proper stewardship, they won’t invest the time building services that take advantage of it.
Another potential issue for Boston is that – even if they are able to create and manage a robust API for developers – the API for their permitting system will likely differ from the APIs of other cities. So, they may not be able to leverage talent outside of those interested in building an app specifically for the City of Boston.
The more that cities can share common platforms and APIs, the more they can amplify the benefits of collaborating with outside developers – an app built in one city can more easily be deployed to another, making the benefits to developers that build apps exponentially greater.
It’s great to see the City of Boston actively organizing hackathons to solicit ideas for how government service can be improved. I hope this event, and others to come, can help focus attention on the significant issues governments face in developing and managing open APIs for civic hackers.
The civic entrepreneurs behind Open Counter recently launched a new service called Zoning Check that lets prospective businesses quickly and easily check municipal zoning ordinances to determine where they can locate a new business.
This elegantly simple app demonstrates the true power of zoning information, and underscores the need for more work on developing standard data specifications between governments that generate similar kinds of data.
In a recent review of this new app, writer Alex Howard contrasts the simple, intuitive interface of Zoning Check with the web-based zoning maps produced by different municipal governments. Zoning Check is obviously much easier to use, especially for its intended audience of prospective business owners. And while this certainly is but one of many potential uses for zoning information, it’s hard to argue with the quality of the app or how different it is from a standard government zoning map.
But to me, more than anything else, this simple little civic application provides an object lesson in the need for governments to invest less time and resources building new citizen-facing applications themselves and more time and resources mustering the talents of outside developers that can build more effective citizen-facing apps better, faster and cheaper.
To do this, governments need to reimagine their place in the civic technology production chain. In short, governments need to stop being app builders and start becoming data stewards.
There are a number of reasons why the role of data steward is a better one for governments – most importantly, governments don’t typically make good bets on technology. They’re not set up to do it properly, and as a result it’s not uncommon to see governments invest in technology that quickly becomes out of date and difficult to manage. This problem is particularly acute in relation to web-based services and applications – which outside civic technologists are very good at building – because the landscape for developing these kinds of applications changes far too rapidly for governments to realistically stay current.
Governments that focus on becoming data stewards are better able to break out of the cycle of investing in technology that quickly becomes out of date. It is these governments that are moving to release open data and deploy APIs to enable outside developers to build applications that can help deliver services and information to citizens. But in addition to procurement and recruitment hurdles that make it difficult for governments to get the technology of citizen-facing apps right, governments may also lack the proper perspective to develop targeted applications that expertly solve the problem of a specific class of users.
The truth of it is this – even if the processes by which businesses find out where they can locate, and what permitting and licensing requirements they need to comply with, are terrible, there typically isn’t much they can do about it. Governments lack proper incentives to get apps like this right because no one is competing with them to provide the service. If governments change their role to that of a data steward, they can foster the creation of multiple apps that can deliver information to users in a much more effective way. Assuming the role of data steward would set up a competitive dynamic that would foster better interfaces to government information.
Look at what happened in Philadelphia when the city released crime data in highly usable formats – the city went from having one mediocre view of crime data that was developed with the sanction of the city to having a host of new applications developed by outside partners, each providing a new and unique view of the data that the city’s app simply did not provide.
Zoning Check is a great app to help center this conversation, and highlight the benefits that governments can reap if they work to transition away from being app builders and towards becoming true data stewards.
The thing I’ve always loved about hackathons is how they make it possible for anyone to build something that can help fix a problem facing a neighborhood, community or city.
Going to a hackathon isn’t like going to a government-sponsored meeting, or legislative hearing – those are places where people offer testimony to others, who may or may not take the advice given and implement some policy or legislative action. Hackathons are where people go to build actual solutions that help fix real problems.
The hacker ethos attracts people who don’t like layers of bureaucracy between the problems they see around them and the solutions they want to implement. We live in a time when it has never been easier for people without title, station or office to affect real change in the lives of people in their neighborhoods – to build solutions to fix problems they care about. This is an attractive draw for people that want to make a difference and it’s why the number of hackathons has grown in recent years, and continues to grow.
I see these same sentiments in an exciting project developed by Kristy Tillman and Tiffani Bell. They built the Detroit Water Project to help Detroit residents in danger of having their water service cut off get paired up with people that can make a payment (or partial payment) on their behalf.
This is the kind of project I would expect to see at a hackathon – it has a few rough edges and looks like it was put together rapidly. It effectively leverages powerful, cheap online tools like Google forms and social media to engage with people that want to get involved.
And it is absolutely brilliant in its simplicity and effectiveness.
Here’s the elevator pitch – there are folks in the City of Detroit (a city facing significant challenges) in danger of having their water service cut off because they are unable to pay their bill. This is an issue affecting thousands of people in real need and galvanizing a movement to help prevent it. The Detroit Water Project enables people anywhere in the country to help with just a few mouse clicks and at the cost of a night out on the town. Boom.
This is how people with the hacker ethos want to invest their time, talents and energy. They are surfacing the question that more people need to step up and help answer – are we going to sit by and let cities like Detroit crumble, or are we going to get off our asses and pitch in?
This project wasn’t built at a hackathon, but it’s everything a hackathon project can be (and should be). Kristy and Tiffani are hackers – and I mean that as the highest compliment I can pay someone.
The Freedom of Information Act, passed in 1966 to increase trust in government by encouraging transparency, has always been a pain in the ass. You write to an uncaring bureaucracy, you wait for months or years only to be denied or redacted into oblivion, and even if you do get lucky and extract some useful information, the world has already moved on to other topics. But for more and more people in the past few years, FOIA is becoming worth the trouble.
I’ve always thought that the FOIA process was an important part of a healthy open data program. That may seem like an obvious thing to say, but there are a lot of people involved in the open data movement who either have limited exposure to FOIA or just enough exposure to truly loathe it.
In addition, the people inside government who are responsible for responding to FOIA requests may have very different feelings about releasing data than those that are part of an open data program.
There are lots of reasons why, for advocates of open data, the FOIA process is suboptimal. A number of them are discussed in a recent blog post by Chris Whong, an open data advocate in New York City and a co-captain of the NYC Code for America Brigade, who FOIA’d the NYC Taxi & Limousine Commission for bulk taxi trip data.
Chris’ post details many of the things that open data advocates dislike about the FOIA process. It’s an interesting read, especially if you don’t know how the FOIA process works.
However, another more serious shortcoming of the FOIA process became obvious almost immediately after the taxi trip data was posted for wider use. It turns out that the Taxi & Limousine Commission had not done a sufficient job depersonalizing the data, and the encryption method used to obscure the license number of taxi drivers and their medallion number was easy to circumvent with moderate effort.
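The underlying weakness is easy to demonstrate. The obfuscation in the taxi data was reportedly an unsalted hash of medallion and license numbers – values drawn from a small, structured keyspace. A sketch in Python (using a made-up medallion format, not the real one) shows why that fails: an attacker can simply hash every possible value and build a reverse lookup table.

```python
import hashlib

def obscure(medallion):
    # Hashing without a salt: the same input always yields the same output.
    return hashlib.md5(medallion.encode("utf-8")).hexdigest()

# Medallion numbers are short and structured (here, a made-up format: one
# digit, one letter, two digits), so the full keyspace is small enough to
# enumerate completely on a laptop.
keyspace = ["%d%c%02d" % (d, ch, n)
            for d in range(1, 10)
            for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            for n in range(100)]

# Build a reverse lookup table by hashing every candidate value.
rainbow = {obscure(m): m for m in keyspace}

# Any "anonymized" value in the released data can now be reversed instantly.
leaked_hash = obscure("5X41")
print(rainbow[leaked_hash])  # 5X41
```

A review process with technical input would likely have caught this – for example, by recommending a keyed hash or replacing identifiers with arbitrary tokens.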
It’s obvious that the Taxi Commission tried to obscure this personal data in the files it released and to also make sure the data was as usable as possible by the person who requested it. Striking this balance can be tricky, and it’s actually not uncommon for data released through FOIA requests to have information that may be viewed as sensitive in hindsight.
I think one of the reasons this happens with data released through FOIA is that the process is not usually coupled tightly enough with the open data review process. I think we can make FOIA better (and, by extension, make the open data process better) by running more FOIA requests through the vetting and review process used to release open data.
Outcome vs. Process
In my experience, there is often very little connection between the process for responding to FOIA requests and the open data release process. Beyond reviewing FOIA requests in the aggregate to see if there are opportunities for bulk data releases, the FOIA process and the open data release process often happen independently of one another. This is certainly the case in the City of Philadelphia.
In Philly, open data releases are coordinated by the Chief Data Officer in the Office of Innovation and Technology. FOIA requests – or Right to Know Requests as they are known in the Commonwealth of Pennsylvania – are handled by staff in the Law Department, or personnel that have been identified as Right to Know Officers for their specific department.
These requests almost always get treated as one-off tasks, never to be repeated again. Even though requests for the same data may be made at a later date, I’ve never seen the people working on FOIA requests in Philly take the approach of making their work to respond to these requests repeatable.
The problem with a bifurcated approach to data releases like this is that it forces people to think of the work to respond to FOIA requests as disposable. Something that happens once – an outcome, instead of a process. Open data done correctly is about establishing a process – one that includes opportunities for review and feedback.
Toward Better FOIA Releases
Because FOIA is viewed as a one and done task, there is no opportunity to iteratively release data – if the release of NYC taxi trip data had been viewed as a process (particularly a collaborative one), the Taxi & Limousine Commission could have opted to be conservative in their initial release and then enhanced future releases based on actual feedback from real consumers of the data.
In Philadelphia, we employed a group called the Open Data Working Group to help review and vet proposed data releases. This is an interdisciplinary group from across different city departments which helped provide feedback and input on a number of important data releases that required depersonalization or redaction of sensitive data – crime incidents, complaints filed against active duty police officers, etc.
Additionally, part of our release process involved reaching out to select outside data consumers to get feedback and help identify issues prior to broader release. Because we used GitHub for many of our data releases, we could set up private repos for our planned data releases and ask selected experts to help us vet and review by adding them as collaborators prior to making these data repos public.
Getting to Alignment
I think for a lot of amateurs, their alignment is always out.
When it comes to data releases, there is no substitute for experience – that’s why integrating FOIA releases into an existing open data release process can be so beneficial. Leveraging the process for reviewing open data releases can improve the quality of FOIA releases and bring these two critical elements of the open data process into closer alignment.
I’m hopeful that cities, particularly Philadelphia, will begin to see the merit of better aligning FOIA responses and open data releases.