Data is Law

“…[U]nless we understand how cyberspace can embed, or displace, values from our constitutional tradition, we will lose control over those values. The law in cyberspace – code – will displace them.”
— Lawrence Lessig (Code is Law)

In his famous essay on the importance of the technological underpinnings of the Internet, Lawrence Lessig described the potential threat if the architecture of cyberspace were built on values that diverged from those we believe are important to the proper functioning of our democracy. The central point of this seminal work seems to grow in importance each day as technology and the Internet become more deeply embedded in our daily lives.

But increasingly, another kind of architecture is becoming central to the way we live and interact with each other – and to the way in which we are governed and how we interact with those that govern us. This architecture is used by governments at the federal, state and local level to share data with the public.

This data – everything from weather and economic data to education, crime and environmental data – is becoming increasingly important for how we view the world around us and our perception of how we are governed. It is quite easy for us to catalog the wide range of personal decisions – some rote, everyday decisions like what to wear based on the weather forecast, and some much more substantial like where to live or where to send our children to school – that are influenced by data collected, maintained or curated by government.

It seems to me that Lessig’s observations from a decade and a half ago about the way in which the underlying architecture of the Internet may affect our democracy can now be applied to data. Ours is the age of data – it pervades every aspect of our lives and influences how we raise our children, how we spend our time and money and who we elect to public office.

But even more fundamental to our democracy, our ability to judge how well our government leaders are performing the job we empower them to do depends on data. How effective is policing in reducing the number of violent crimes? How effective are environmental regulations in reducing dangerous emissions? How well are programs performing to lift people out of poverty and place them in gainful employment? How well are schools educating our children?

These are all questions that we answer – in whole or in part – by looking at data. Data that governments themselves are largely responsible for compiling and publishing.

Viewed in this way it is easy to see the importance of governments freely publishing the data that they collect. The richer and more numerous the data sets that are available from government, the more informed and precise decisions we can make in our daily lives. Governments that do not provide data are directly impacting the lives of those they serve by diminishing their ability to make better, more informed decisions.

But more importantly, governments that do not provide open data that empowers those they serve to evaluate the performance of government are diminishing the ability of people to effectively participate in our modern democracy.

Arming the People With Data

“Little more can reasonably be aimed at, with respect to the people at large, than to have them properly armed and equipped…”
— Alexander Hamilton (Federalist Papers)

Hamilton’s often-cited quote about an armed populace seems outdated today, but if we view it through the lens of modern technology it makes enormous sense. We live in an age when vast amounts of data can be processed and displayed quickly and cheaply – in our age, “arming” the people at large means providing access to data and information. Specifically, the spirit of Hamilton’s statement about a properly equipped public obligates governments to release the raw data that is collected as part of government operations or used to inform policy decisions.

In the recent past (during what has come to be known as the age of “e-government”) governments were encouraged to publish new data and information to public websites – in some cases resulting in information being published in electronic format for public consumption for the very first time. We still see the residual effect of this effort in the tension between public data and open data.

Public data is available for viewing, typically as an HTML page (or series of pages), a PDF document or in some similar format suitable for publishing on the web. It is meant to be consumed by eyeballs, not by computer programs or applications. Open data is specifically formatted for use by machines – software applications that can consume it, process it, mash it up with other data and (optionally) display it.
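To make the distinction concrete, here is a minimal sketch in Python. The inspection records, field names and function names are invented for illustration and are not taken from any real government data set. The point is only that once data is published in an open format such as CSV, a few lines of code can load it, total it, rank it and re-publish it as JSON, none of which is practical when the same records exist only as a PDF.

```python
import csv
import io
import json

# Hypothetical inspection records, invented for illustration only.
OPEN_CSV = """establishment,inspection_date,violations
Cafe A,2014-11-03,2
Diner B,2014-11-10,0
Grill C,2014-11-12,5
"""

def load_inspections(csv_text):
    """Parse CSV text into a list of dicts, converting counts to integers."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["violations"] = int(row["violations"])
        rows.append(row)
    return rows

def to_json(rows):
    """Re-publish the same records in a second open format."""
    return json.dumps(rows, indent=2)

inspections = load_inspections(OPEN_CSV)

# Because the data is machine-readable, simple analysis is a one-liner.
total = sum(r["violations"] for r in inspections)
worst = max(inspections, key=lambda r: r["violations"])
print(total, worst["establishment"])
```

The same records could of course also be rendered as a web page for human readers; the open format is what lets other programs reuse them.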

Making valuable data “public” but not “open” is a departure from the idea embodied in Hamilton’s notion of a properly armed and equipped public – and it is typically at odds with the requirements of most open data policies. Some more cynical observers have suggested that it is a way for government officials to lay claim to the ideals of open data while ensuring that information remains difficult to use for serious analysis. This may be – for example – why the vast majority of campaign finance and lobbying information published by governments is made available in PDF format only.

In an age where data is law, an absence of open data diminishes the ability of the people to effectively participate in our modern democracy.

Collective Responsibility for Open Data

“Thus the choice is not whether people will decide how cyberspace regulates. People – coders – will. The only choice is whether we collectively will have a role in their choice – and thus in determining how these values regulate – or whether collectively we will allow the coders to select our values for us.”
— Lawrence Lessig (Code is Law)

Whether or not our governments publish open data is in large part a function of how strongly those they serve advocate for it. We have the option to collectively have a role in deciding how governments make data and information available to us, or whether they do not.

But just as important, we need to have insights into the quality of the data that governments make available as open data, and to understand which data sets governments possess but have not yet shared with the public. In a world where data is law, understanding how data is collected, what it means and the constraints that apply to it is fundamental to understanding not only how well government is performing but also the world we live in. We must understand the incentives and biases that drive how data is reported and the effect they can have on what it may mean.

Here are a few examples to illustrate this point:

  • We look at information from standardized testing to determine how well teachers and school administrators are doing their jobs. In doing so, we need to understand how such data is collected, and the underlying biases that may undermine the integrity of such data. It can be tempting (comforting even) to accept at face value reports that schools are performing well – but there are examples where the truth diverges widely from what the data says.
  • Crime data is among the most widely requested data from state and local governments, and it can impact decisions on home purchases, school enrollment and a range of other important personal decisions. It is also widely used in social policy research and other analytic efforts. But at the most basic reporting level, there can be enormous pressures to underreport certain kinds of crimes.
  • Information on how clean and well run eating establishments are is now readily available from mobile devices and other public services. One of the most compelling examples of the immediacy and efficiency of the data age we live in is the ability to check how well a restaurant or eatery performed on a government food safety inspection before we place our order. To make the experience even more immediate, we distill often complex information into a numeric or letter rating. However, the immediacy of the experience can often obscure incentives to inflate these ratings – making the data much less useful.

In each of these cases, and in many others where the integrity of data may be suspect, our ability to evaluate the quality of data is enabled by our ability to access it in bulk. Bulk data in open formats enables consumers to spot anomalies – either in the data set itself (e.g., a school that dramatically outperforms all others) or when compared to similar data from other jurisdictions (restaurant inspections in one city vs. another).
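As a rough sketch of what spotting an anomaly in bulk data can look like, the Python below flags values that sit far from the rest of a data set using a modified z-score based on the median. The school names and pass rates are invented, the function name is hypothetical, and the 3.5 cutoff is a common rule of thumb rather than a standard. A flagged value is a reason to look more closely, not proof that anything was misreported.

```python
from statistics import median

# Hypothetical pass rates (percent) for six schools, invented for illustration.
PASS_RATES = {
    "School A": 61,
    "School B": 58,
    "School C": 64,
    "School D": 60,
    "School E": 97,
    "School F": 62,
}

def flag_outliers(scores, threshold=3.5):
    """Return names whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single extreme value does not hide itself
    by inflating the spread, as it can with a mean and standard deviation.
    """
    values = list(scores.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values equal the median; nothing to scale by.
        return []
    return sorted(
        name
        for name, value in scores.items()
        if abs(0.6745 * (value - center) / mad) > threshold
    )

print(flag_outliers(PASS_RATES))
```

The same function could be run on similar data from a second jurisdiction to compare the two, which is only possible when both publish in bulk and in compatible formats.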

As we approach 2015, it is time for us to insist that our governments fully enter the data age and provide access to more data in bulk, in open formats and in concert with other governments.

Having access to open data is no longer an option for participating effectively in our modern democracy; it’s a requirement. Data – to borrow Lessig’s argument – has become law.

4 comments

  1. Albert · December 29, 2014

    “Public data is available for viewing, typically as an HTML page (or series of pages)”
    “Open data is specifically formatted for use by machines ”

    microformats have been making html machine-readable for a long time now; html is the first or second (depending on your source) most utilized semwebtech alive today.

    whenever possible, i publish data in html as well as other typical formats like csv or json. the markup gives the user something to immediately consume.
    it increases the findability of the data by giving the bots more to harvest for the search engines.
    properly implemented html can also extend the data’s reach in terms of usability and accessibility

    i see the benefits of good markup being endless, and when applicable, should be a standard format when publishing open data.

    the crucial intersection with open gov only furthers this: properly formatted markup doesn’t have cross-browser/platform issues, doesn’t require proprietary software to run, and is consumable by assistive technologies.

    it seems like html is an afterthought (if anything….i’m looking at you markdown) in the world of civic hacking….so i gotta ask why? or have i missed the elephant in the room that has everyone steering clear of it?

  2. mheadd · December 29, 2014

    I’d love to see more governments use microformats. Not aware of any good examples of that – might be an interesting research project.

    When it comes to HTML, most governments that are using this as a format for their data are only thinking about one use case – human eyeballs. There is often little thought to how the data behind an HTML web page might be used outside of the web page it is presented on.

    For example, take a look at this site for displaying restaurant inspection reports from the City of Philadelphia – http://www.phila.gov/health/foodprotection/FoodSafetyReports.html

    How much better would this service be if the raw data behind it was available in JSON or CSV?

  3. Albert · December 29, 2014

    i’m not aware of any govs using microformats either.
    i agree that the philly service should use json/csv here, but that wasn’t my point. i think they should offer both.
    offering the data in bulk would make the service better for data junkies, businesses, research teams, etc.
    offering the data in html would make the service better for users and bots.
    the way it’s set up now, loading the data onto the same page, etc., doesn’t qualify in my opinion as publishing in html properly. each report or business should be in a separate document with its own url.
    i’m not arguing html vs json/csv. i think they all have their place and are equally important.

  4. mheadd · December 30, 2014

    “i think they should offer both. offering the data in bulk would make the service better for data junkies, businesses, research teams, etc. offering the data in html would make the service better for users and bots.”

    Completely agree. When I worked at the City of Philadelphia, this is exactly the approach we took in designing new digital services.

    For example, when looking up a specific property using the city’s property look up app: http://property.phila.gov/#account/883309000

    And when looking for API access to the same information: http://api.phila.gov/opa/v1.1/property/5356001234?format=json
