Participation and the Cult of Catalogs

“Anonymous access to the data must be allowed for public data, including access through anonymous proxies. Data should not be hidden behind ‘walled gardens.’”
8 Principles of Open Government Data

In the world of open data, there are few things that carry more weight than the original 8 principles of open data.

Drafted by a group of influential open data leaders who came together in Sebastopol, CA in 2007, this set of guidelines is the de facto standard for evaluating the quality of data released by governments, and activists regularly use it to prod public organizations to become more open.

With this in mind, it was intriguing to hear a well-known champion of open data at the Sunlight Foundation’s recent Transparency Camp in Washington DC raise some interesting questions about one of these principles, typically considered sacrosanct in the open data community.

Andrew Nicklin (formerly at the helm of open data efforts for both the City and State of New York, and now Open Data Director for the Center for Government Excellence at Johns Hopkins University) asked Transparency Camp attendees to consider some of the implications of the 6th principle on open data – which calls for non-discriminatory access to data. This principle is generally taken to mean that users of open data should be able to access it anonymously and that governments should not require users to identify who they are or what they plan to do with the data as a condition of accessing it.

While there is obvious merit to this principle, Andrew observed that when governments know who is using their data and how, there are enormous opportunities to enhance the data and make it more useful for data consumers. If governments don’t understand what users want, providing useful data that meets their needs is difficult – strictly enforcing anonymous access may end up being an impediment to better understanding what data users actually need.

Without being directly critical of the principle or the original intentions behind it, Andrew made a thoughtful suggestion for open data advocates at Transparency Camp to consider. To me, these comments highlight an important issue facing the civic technology community and governments themselves – one that almost no one is talking about.

When it comes to building the infrastructure of open data – putting in place the pieces of technology that users will leverage to find and use government open data – very little thought seems to be given to what users – data consumers – want or need.

The idea of “build with, not for” has become a central tenet of how civic technology solutions are designed and implemented. Yet this idea seldom applies to the platforms that governments use to make open data available, which form the foundation of many civic technology solutions.

Costs and Benefits

“Funding is the most cited barrier to implementing or expanding open data initiatives.”
Empowering the Public Through Open Data

A recent collaborative effort between the University of Southern California’s Annenberg Center on Communication Leadership & Policy and the USC Price School of Public Policy produced a hugely valuable report on the current state of open data in the 88 incorporated cities comprising Los Angeles County.

Based on surveys and interviews with city officials on their open data efforts, this report provides unique insights into the ways that government leaders view open data. Among the findings – government officials surveyed for the report consider funding to be the most significant barrier to expanding work on open data. This isn’t a surprise, and this sentiment is likely not unique to the Los Angeles County area.

But when taken together with other findings, it can seem counterintuitive. Along with citing funding as a constraint, government officials expressed a preference for commercial open data catalogs over open source (or free) alternatives. These commercial solutions – some of which impose non-trivial costs on local governments – appear to meet a perceived need on the part of government officials in that they are viewed as making it “easier to publish [data] and put it in the hands of the citizens.”

Commercial software generally tends to fare better in the government procurement process than open source software, so this outcome isn’t all that shocking. But it’s worth noting the contradiction in the USC report’s findings between the cost constraints limiting further progress on open data and the reported preference for (sometimes pricey) commercial open data catalogs.

Cost aside, there are a few reasons why upfront investment in a commercial open data catalog may not be the best way to start a new open data effort.

Architecting Participation 

“The web … took the idea of participation to a new level, because it opened participation not just to software developers but to all users of the system.”
– Tim O’Reilly, The Architecture of Participation

First, and somewhat ironically, public information on the cost of commercial open data portals can be hard to come by. Another report on municipal open data efforts in southern California found a wide disparity in what different governments – some just a few miles apart, and almost identical in population – pay for commercial open data catalogs. This can make it difficult for governments to know if they are getting good value for the price being paid.

In addition, commercial open data catalogs often come with visualization, mapping and charting tools out of the box. This can make it easier for governments to augment their open data offerings by showing what can be done with the data. Though these tools may come at an additional price, some may view them as a way to advocate for open data to internal skeptics – a picture (or a graph, or a chart) is worth a thousand words, as the saying goes.

From a user needs perspective, this approach feels very unidirectional – it is government telling the data community what it believes is important, not the other way around. There are a host of examples of sophisticated visualizations and applications built with government data by outside data users. And while this approach requires outreach and engagement, there is an ever-increasing abundance of tools available for members of the data community to create maps, visualizations and new applications.

These two approaches – out of the box vs. community built – are not mutually exclusive. We can see a number of examples of governments using commercial open data catalogs to engage with external data users that produce useful, valuable visualizations and apps – New York City, the City of Los Angeles, Chicago and San Francisco are all great examples of this dual approach.

However, open data efforts in all of those cities have benefited from robust technology and startup communities and often visionary leadership. Almost all of these cities have a long tradition of civic hacking. For cities that don’t have these assets (or have them in smaller quantities), outreach and engagement to nurture and build a data community will be a crucial factor in the long-term success of an open data program. These cities – many of them smaller and with more limited resources – may also feel the cost constraints of implementing an open data effort more acutely than larger cities.

It’s fair to say that the next wave of cities that adopt open data programs may face a very different set of challenges than the cities that have come before them.

Putting Users First

“The procurement model of government digital services generally leads to services that satisfy policy needs, not user needs.”
Government Technology Procurement Playbook, Code for America

The time feels right to rethink how cities put in place the basic infrastructure of open data.

At last year’s Code for America Summit, I gave a talk on how open data was being adopted in small to midsized cities in the U.S. In researching my talk, I found that while larger cities have almost all implemented some form of open data program, fewer than 20% of the 256 incorporated places in the U.S. with populations between 100,000 and 500,000 have one.

Open data in this country is still – almost exclusively – a big city phenomenon.

Efforts to address this imbalance are underway – the What Works Cities initiative (of which the Center for Government Excellence at Johns Hopkins is a key part) is now working to bring open data and data-driven decision making to 100 midsized cities. More and more small and midsized cities are starting to look at open data as a key driver of government innovation.

We are now at a juncture where we can not only help a new cohort of cities adopt open data, but also help ensure that these efforts embrace the principle of “build with, not for” from the ground up. If we’re going to be successful, it’s important that we question long-held beliefs – like the original 8 principles of open data – to ensure our efforts are most efficiently aligned with the outcomes we desire.

It’s worth considering whether commercial open data catalogs provide the best option for the next wave of cities that are embracing open data to succeed and build a healthy data culture, both inside and outside of government.

But whatever foundation we choose to lay for the next phase of open data, we’ll need to make sure we’re putting users’ needs first.

(Note – the term “cult of catalogs” is not my own. I first heard it used by Friedrich Lindenberg, though others may have used it as well.)

One comment

  1. Augusto Herrmann · January 12, 2016

    We should consider the possibility of another exception to the 6th principle of open data, and it has to do with the cost of infrastructure and usage rates.

    Internet companies that have public APIs (e.g. Google Maps) already use throttling to limit the number of requests per minute that clients can make. Scaling infrastructure up to meet usage demands costs money, and when we’re talking about open government data this is taxpayers’ money. So if, for example, just one corporate consumer of open government data is consuming all of the capacity, either the government scales up the service capacity (raising costs) or it doesn’t (degrading the experience for everyone else). I don’t think either option is fair to citizens. So using rate-limited throttling for the general case, and providing a special access token for big consumers that lifts or raises those limits (possibly at a cost to them, to cover the additional operational costs), does seem a fair thing to do.

    Of course, anonymous (if rate limited) access to open data must always be provided, in order to foster experimentation and lower the barriers to using open data. To be fair, the 6th principle never said you shouldn’t use throttling anyway.

    As for the original reason you mentioned for having non-anonymous access to open data, please see Leigh Dodds’ discussion of the same issue: http://blog.ldodds.com/2015/11/25/how-can-open-data-publishers-monitor-usage/
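The tiered access scheme Augusto describes – a baseline budget for anonymous requests, and a higher limit for registered consumers presenting an API key – can be sketched in a few lines. This is a minimal, hypothetical illustration; the class, the limits, and the fixed-window approach are all my own assumptions, not a description of any real open data platform:

```python
import time

# Hypothetical tiered rate limiter: anonymous callers get a small
# per-window request budget, while consumers presenting an API key
# get a larger one. All names and limits here are illustrative.

ANON_LIMIT = 5        # requests per window for anonymous users
KEYED_LIMIT = 100     # requests per window for registered consumers
WINDOW_SECONDS = 60   # length of each fixed counting window

class RateLimiter:
    def __init__(self):
        # caller identity -> (window start time, requests seen so far)
        self._buckets = {}

    def allow(self, caller_id=None, api_key=None):
        """Return True if this request fits within the caller's budget."""
        limit = KEYED_LIMIT if api_key else ANON_LIMIT
        key = api_key or caller_id or "anonymous"
        now = time.monotonic()
        start, count = self._buckets.get(key, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, count = now, 0  # window expired; reset the budget
        if count >= limit:
            return False  # over budget for this window
        self._buckets[key] = (start, count + 1)
        return True
```

A real deployment would answer an over-budget request with an HTTP 429 response, and could charge heavy consumers for raised limits – exactly the cost-recovery arrangement the comment proposes – while anonymous access stays open for experimentation.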
