Operation Data Liberation


Image courtesy of Flickr user antonymayfield.

I’ve had the opportunity recently to talk to people in several different city governments that are facing a common challenge — how to liberate operational data from a legacy system.

This is a challenge that lots of city governments face, and it strikes me that there are some common lessons that can be derived from cities that have gone down this road already for those that are still trying to figure out the right approach.

The following suggestions are crafted from my own experience as a municipal government official charged with making data more widely available, and those of people in similar positions that I’ve had a chance to speak with.


One More Week, Three More Things

My time as the City of Philadelphia’s Chief Data Officer is coming to an end.

It’s been an incredible experience – I’ve had the pleasure of working with a great team, and to have helped change the way that government officials think about open data and civic hacking. Before I move on to new things, however, I have a few more items I want to move into the “Done” column.

I’ve got one more week and three more things I want to get done.

Integrating with Data.gov

One of the things we’ve had on our radar for a while is getting the City of Philadelphia’s open data sets from OpenDataPhilly.org listed in the Cities community on the federal Data.gov site. OpenDataPhilly.org currently runs an outdated version of the Open Data Catalog software – newer versions have Data.gov integration built in – but a simple Node.js script and the OpenDataPhilly API make it pretty simple to generate the metadata file required to list our data sets there.
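For the curious, the transformation involved is straightforward. Here is a rough sketch in Python (the actual script is Node.js, and the field names and sample record below are hypothetical stand-ins for the real OpenDataPhilly API response and the full Data.gov metadata schema, which has more required fields):

```python
# Sketch: map records from a data catalog API onto a minimal subset of the
# metadata file that Data.gov harvests. The catalog field names and the
# sample record below are hypothetical -- adapt them to the actual
# OpenDataPhilly API response and the full metadata schema.
import json

def build_data_json(records):
    """Build a Data.gov-style metadata listing from catalog records."""
    datasets = []
    for rec in records:
        datasets.append({
            "title": rec.get("name", ""),
            "description": rec.get("description", ""),
            "keyword": rec.get("tags", []),
            "accessURL": rec.get("url", ""),
            "publisher": {"name": "City of Philadelphia"},
        })
    return json.dumps(datasets, indent=2)

# Hypothetical catalog record for illustration
sample = [{
    "name": "Crime Incidents",
    "description": "Part I and II crime incidents.",
    "tags": ["public-safety"],
    "url": "https://www.opendataphilly.org/dataset/crime-incidents",
}]
print(build_data_json(sample))
```

The real work, of course, is in validating the output against the schema Data.gov expects – the generation itself is a simple mapping like this.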

The metadata for our data now validates, and I’m hopeful that listings from OpenDataPhilly.org will appear in Data.gov this week – the first time entries from a community-managed site will appear in the national site for open data.

Data on Professional Services Contracts

For the past several months, we’ve been working with the Chief Integrity Officer to open up data on professional services contracts from city departments. Currently, information on professional services contracts is difficult to find, and is provided only in PDF format.

An enormous amount of work has gone into generating comprehensive documentation for professional services contract data and providing the data in a more usable format, with historical data going back several quarters. This work is close to being finished, and I’m hopeful it will launch in a few days.

Data on Lobbyist Registrations

In addition, we’ve been working with the Board of Ethics as they implement a new lobbyist management system – this new system will allow individuals who act as lobbyists to register with the city and submit their activity reports.

As part of this process, the Board of Ethics was gracious enough to allow us to consult on how open data could be integrated into their new system. They’ve been great to work with, and they recently rolled out the first phase of this new system, which allows lobbyists to register – it also includes functionality that allows all registration information to be downloaded in CSV format.

We’re working now to push out some comprehensive documentation for the lobbyist registration information CSV data – we hope to get this out before the end of the week.
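Once the CSV is out, consumers can start answering questions with just a few lines of code. A minimal sketch in Python – the column names and records here are hypothetical placeholders, not the real file’s schema, which the documentation will define:

```python
# Sketch: what a consumer of the lobbyist registration CSV might do once
# it is published. The column names and rows here are hypothetical
# placeholders -- the real file's columns are defined in the documentation.
import csv
import io
from collections import Counter

sample_csv = """lobbyist_name,principal,registration_date
Jane Doe,Acme Energy,2013-06-01
John Roe,Acme Energy,2013-06-03
Jan Poe,Civic Partners,2013-06-05
"""

# Parse the CSV and count registrations per principal
rows = list(csv.DictReader(io.StringIO(sample_csv)))
by_principal = Counter(row["principal"] for row in rows)
print(by_principal.most_common())
```

This is exactly the kind of quick analysis – who is lobbying on whose behalf, and how often – that a machine-readable format enables and a PDF does not.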

Time is short, but there’s still lots to do.

Communities Matter

Philadelphia is unique among big cities in how it publishes open data for civic hackers, journalists, entrepreneurs, researchers and other users.

The City of Philadelphia has designated the community-built OpenDataPhilly website as its official directory for open data – we’re the only big city in the country (maybe the only city, period) that does not unilaterally control the data portal where city departments publish their data.

This website is shut down

Pursuing a strategy like this is not without its challenges, but I believe that it is ultimately a better way to engage the community of users around open data releases. The members of our open data community are all stakeholders in the operation and management of the city’s open data portal. They can submit new data sources for inclusion in the portal, and they can suggest changes in how the platform works.

The federal government shutdown this week – which resulted in the federal open data site, Data.gov, going offline – offers an object lesson in the benefits of community-managed open data portals like those in Philadelphia, Louisville, Colorado and other places. A government shutdown can’t impact sites that the government does not unilaterally control.

And this raises some interesting questions – can an open data initiative be truly open if the government that starts it can shut it down? What happens to an open data portal when municipal leaders that start open data initiatives leave office, are voted out or are replaced by those that are less enlightened?

Some things to consider as we continue to build our community here in Philly.

The Lesson of PennApps

A couple of weeks ago, I attended the most recent PennApps hackathon – a biannual college hackathon in Philadelphia that has grown from somewhat humble beginnings a few years ago to one of the largest college hackathons in the world.

Penn Apps logo

Attendance at the event has swelled to over 1,000 participants from colleges across the country, as well as several international teams. I’ve been to PennApps four or five times in the past few years, and it has been remarkable to watch it grow. To the last several I attended, I brought colleagues from city government – some of whom had never before been to a hackathon.

What hasn’t changed over the many installments of PennApps is the presentation of sponsor APIs at the kick off event. After a brief introduction by the organizers, the many technology companies that sponsor the event show their wares to participants. This almost always involves a short description and demo of an API or SDK that participants can use at the event to build something.

These presentations are witty, engaging and fun – they have to be. There are dozens of sponsors for the event and each is angling to encourage developers to use their tool or platform to build a working prototype by the end of the weekend. This is how companies built around APIs raise brand awareness. Increasingly, the process of building software has come to revolve around leveraging third-party platforms and APIs. This has changed the way that software development – particularly web development – happens, as well as the expectations of developers.

Venmo makes their pitch at PennApps

Most of the sponsors at an event like this one will offer some sort of additional incentive for developers that use their services – free credits, t-shirts, swag, etc. At a minimum, though, their services are easy to use and well documented. Those that are brave enough may even try a “live coding” demo – building a working application using their API or platform in just the few short minutes allotted to each sponsor presentation. When done successfully, this can help drive home the point to prospective users that a platform or API is easy to use.

Every time I attend this event with my colleagues from city government I say – “This is what governments need to do. We need to present ourselves to prospective users as if we were an API company. The same standards of quality should apply.”

It is increasingly common – and encouraging – to see governments publish developer portals. A number of different federal agencies do this, as do large cities like New York, Chicago and Philadelphia.

The truth is – if we’re being honest with ourselves – that most of them are not very good, particularly when compared with the offerings of private companies. Many don’t have common elements like an API console, code samples & tutorials, helper libraries or a discussion forum.

If we want to encourage developers to use open government data we need to be realistic with ourselves about what developers have come to expect in terms of an API offering. Events like PennApps have raised the bar for anyone that wants to encourage developers to build useful and interesting applications with their data and APIs – governments included.

We must enhance the quality of government developer portals, and we must work harder (and faster) to develop shared standards for government data and APIs. Most importantly, we have to do more to share tips, tricks and best practices between governments. There are some tools out there to get governments started down the road of building a developer center that is impactful and engaging, but we must do more.

Developer centers are not just a mandate or requirement that we need to check off our “to do” list. Government developer portals should be the hubs around which we engage and communicate with developers, technologists and the broader data community.

In a subsequent post, I’ll share a checklist that I’m putting together with a list of basic elements that every government developer portal should have.

Stay tuned!

It’s Not About Cheaper, It’s About Better

The Wall Street Journal recently featured an awesome story about civic hacking, focusing on the amazing work being done in the city of Chicago.

It’s great to see the efforts of civic hackers and open data advocates covered in the mainstream press, and the team in Chicago – those both inside and outside of city government – deserve every bit of praise they get for their tremendous efforts. But I did take issue with one point made in the article – interestingly, one made in the very first sentence:

Cash-strapped cities are turning to an unusual source to improve their online services on the cheap: helpful hackers, who use city data to create tools tracking everything from real-time subway delays to where to get a free flu shot near your home and information about a contentious school-closing plan.

I don’t think anyone would argue that the fiscal pressure faced by governments – particularly cities – hasn’t helped encourage officials to experiment with new ways to provide services and information, and helped highlight the need for technology innovation. However, I don’t think it’s accurate to say that what is happening is an effort by governments to get IT work done by outsiders “on the cheap.”

It’s not about building things cheaper, it’s about building them better.

Implicit in the idea of “government as a platform” and the factors that help to drive municipal open data programs is that the role of governments in the delivery of public services is changing – toward the role of data steward or API manager, and away from the more traditional role of “app builder.”

There are a number of reasons why the role of data steward is a better one for governments – most importantly, governments don’t typically make good bets on technology. They’re not set up to do it properly, and as a result it’s not uncommon to see governments invest in technology that quickly becomes out of date and difficult to manage. This problem is particularly acute for web-based services and applications – which outside civic technologists are very good at building – because the landscape for developing these kinds of applications changes far too rapidly for governments to realistically stay current.

Governments that focus on becoming data stewards are better able to break out of the cycle of investing in technology that quickly becomes out of date. It is these governments that are moving to release open data and deploy APIs to enable outside developers to build applications that can help deliver services and information to citizens.

However, this shift to the role of data steward doesn’t deobligate governments from investing in technology or skilled staff. It simply means that this investment can be focused within a role that governments are better structured to perform well. Developing the infrastructure and policies to support an open data program and API platform are not necessarily “cheaper” but they are a much better technology investment for governments to make.

In another sense, the move to the co-production of technology solutions with outside developers is also about building better applications as opposed to building cheaper ones.

It’s common for those behind civic technology projects to have personal investment in the issues being addressed by their solutions. These developers bring a different perspective to the problems that, in the past, may only have been addressed by governments – they are stakeholders in the success of our cities.

Engaging outside developers and marshaling the efforts of civic hackers to build new tools and new services to improve our communities is about enhancing the quality of solutions, and not lowering their cost.

It’s great to see the awareness of civic hacking and the open data efforts that fuel it getting coverage in the mainstream press. But we still have a ways to go on communicating the more fundamental changes to government this movement entails, and the real benefits we all stand to gain.

This Is How It’s Supposed To Work

Openness in government strengthens our democracy, promotes the delivery of efficient and effective services to the public, and contributes to economic growth.

Federal Executive Order on Open Data, Section 1.

People in the open government community talk a lot about the potential and promise of open data. The things that it might enable. The problems it might help fix. The possibilities.

Each new instance where we see open data get used to address a problem facing a city or a community is a testament to its true power, and a validation of the work governments do to open it up and make it usable. When we see open data get put to use in the way that we envision it, it can be a very gratifying thing.

A new website in Philadelphia focused on the challenge of unused vacant land demonstrates how open data is supposed to work. It’s built using a variety of data sources made available by the City of Philadelphia, and it allows people to discover vacant land in their neighborhoods.

This isn’t the first web site to aggregate data on vacant land in Philadelphia, which underscores how pressing an issue it is for our city. One of the things I like most about the site is how it frames information about vacant land with an eye toward reuse. The site tells you the planning and zoning district a property is in, as well as the City Council district.

It tells you if there is a structure on the property by checking it against the city’s Stormwater Billing system, and if a user thinks it might be suitable to convert into a community garden there are resources available to assist.

Want to watch a specific property, or organize neighbors around it? Want to improve the data by uploading a photo, or indicating whether something is reported incorrectly? Want to purchase a property through a Sheriff Sale or through an arrangement with a private owner? The site has information and resources to assist with all of these.

The site even hints at where the City of Philadelphia should go next with its open data efforts. One of the data sources used by the site is an independently built API for property information. The data powering this API is actually scraped from the City’s website because it is not currently available as a data download or through a city-owned API. This is something we are currently working to change, but the fact that this site makes use of scraped data underscores the need for the City of Philadelphia to release this data in a more open format.

This site is everything that advocates of open data hope for when they work to make data more readily available and to provide documentation on how to use it. What’s most interesting about it is that not once did the sponsors or developers interact with City IT staff while building it. Data that is truly open means that users don’t need to ask for permission before they use it, or for instructions on how to use it.

Open data works best when it is readily available for those that need it, to build useful services and apps that help address the challenges facing our communities.

This is the way that open data is supposed to work.

Why Publish Open Data?

I get this question a lot, particularly from government officials who may still be skeptical about the real benefits.

And though I feel like I’ve made the open data pitch a thousand times before, working in city government for the past year has focused me on the practical aspects of this question. What are the real, practical benefits that accrue when governments release open data?

Here are three that I think are important.

First, releasing data in open formats can dramatically reduce the amount of time and effort it takes to respond to open records / FOIA requests. For some government agencies, responding to these requests takes a non-trivial amount of time – particularly if they are not handled in a coordinated fashion. I’ve witnessed firsthand as agencies manually worked through open records requests for the exact same data over and over and over. This makes no sense, especially if the data has already been deemed public and suitable for release. Publishing frequently requested data in an open format allows people to self-serve, and preserves internal staff time for more pressing needs.

In addition, if your city, county or state government only maintains data publicly as part of a web document or web site there is a good chance it is being scraped. My experience is that this happens much more frequently than most government employees think. Scraping can cause undue burden on your IT infrastructure and undue stress on your IT staff that may be tasked with trying to troubleshoot issues caused by scrapers gone wild.

Second, when governments release data apps happen. We’ve seen this happen with our data releases in Philadelphia, and examples of useful and valuable apps built on open data abound. The potential for app development is greatly increased when there are standards that different governments can adopt – some good examples are GTFS and Open311, and there are developing standards around traffic data, restaurant inspections and facilities that dispense inoculations against infectious diseases.

Governments that release open data can leverage both their local developer communities and the efforts of developers elsewhere to bring useful apps to their citizens.

Finally, governments that share open data with outside consumers lay the foundation for a different, equally important, kind of sharing – sharing data across government agencies. In Philadelphia we are seeing a number of potentially valuable opportunities surface for different city departments to improve their operations by sharing data originally meant to be shared with outside developers.

Cities – big ones especially – are notoriously complex and stovepiped. In Philadelphia, the department that grants property tax exemptions is different than the one that collects property tax payments. What if we could condition the granting of exemptions on whether a property owner was current on their tax payments? Sounds simple, yet because of bureaucratic complexity it often is not. Open data can help correct this, particularly if it is structured for easy use by outside developers as an API.
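To make the idea concrete, here is a minimal sketch in Python of the kind of cross-department check that shared data makes possible. The field names and records are entirely hypothetical – neither department’s actual data looks like this:

```python
# Sketch: condition one department's decision (granting an exemption) on
# another department's data (delinquent tax payments). All field names
# and records here are hypothetical illustrations.

tax_exemption_applications = [
    {"parcel_id": "1001", "owner": "A"},
    {"parcel_id": "1002", "owner": "B"},
]

# Parcels with overdue property taxes, per the revenue department
delinquent_parcels = {"1002"}

def eligible_for_exemption(applications, delinquent):
    """Keep only applications for parcels current on their taxes."""
    return [app for app in applications if app["parcel_id"] not in delinquent]

print(eligible_for_exemption(tax_exemption_applications, delinquent_parcels))
```

The check itself is trivial – the hard part is getting two departments’ data into a shared, machine-readable form in the first place, which is exactly what an open data program builds.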

Because Philly has been at this (both formally and informally) for a few years now, we’re starting to identify opportunities to share data across different government entities that serve the city. We’re in discussions with our local gas utility to provide them with property data from our Office of Property Assessment so that they can verify their account information. In return, we hope to get data on utility accounts that see lots of turnover (suggesting renters moving in and out) and match it against our database of rental licenses – this might be a nice revenue enhancement opportunity. The possibilities are worth spending time thinking about.

We’re in the early stages of seeing internal operational efficiencies grow out of our open data efforts, but we’re here now because we got started with releasing open data to outside users and civic hackers.

Any government that wants to start down this road will quickly start to see the benefits. They just have to get started.

Keeping the Faith on Open Data

A few weeks ago at Personal Democracy Forum, I had the pleasure of speaking on a panel discussing “Do’s and Don’ts” for civic hackers.


The makeup of the panel was fantastic – it included smart people like Tom Steinberg from mySociety, Catherine Bracy from Code for America, Erie Meyer from the Consumer Financial Protection Bureau and former Presidential Innovation Fellow Phil Ashlock. We had a great panel and a very good discussion about civic hacking and open data – but something that one of my fellow panelists said has stuck with me.

Tom Steinberg remarked during our discussion that the open data movement, which started out focused primarily on increasing government transparency and opening the kinds of data sets that governments might be reluctant to release to the public, has come to focus increasingly on the release of operational data from government – things like bus schedules, parking meter locations, library hours and 311 service request details.

This same issue came up not long ago during a discussion of local government open data efforts at Transparency Camp in Washington, DC, and is often brought up in the context of the Open Data vs. Open Government debate.

I’ve said before, and I still believe, that in order to support open government we need to have the infrastructure and policies in place to support open data programs. Once we have the technology and policy mechanisms in place to support open data, someone needs to make sure that they are being used (at least in part) to provide greater transparency into how well our public officials are doing their jobs.

As someone tasked with the responsibility of helping to put those technology and policy mechanisms in place, I can say firsthand that there are some qualities of operational data that make these kinds of data releases more popular with both data producers (governments) and data consumers.

Civic Hacking in Baltimore

Operational data from governments is what is most often turned into apps. This is usually the data that makes hackathons work, and it is fairly easy to demonstrate to a government data producer the value of releasing this data. When we talk about making the “business case” for open data, we are most likely referring to operational data releases.

We can more easily tie this kind of data back to a specific objective or goal of a government agency. We can relate it to economic development efforts. While far from perfect, we can begin to quantify the impact of this kind of data release.

With data that is more purely about transparency, these things can be much harder to do. And for many of the actors in the open data ecosystem – particularly governments – this makes these kinds of data releases much less appealing. This means that transparency data releases can take longer to realize and require much more effort to achieve.

There are some notable examples of transparency data releases in the City of Philadelphia that underscore this point. First, the release of data providing details on complaints against Philadelphia police officers. This data set provides details never before released from the city about the specific nature of complaints against officers and details (including the location) of those making complaints. While far from complete, this data set provides some new insight into an issue that most police departments are reluctant to share publicly.

Another good example is a data set showing the geographic market areas used to conduct a citywide reassessment of taxable properties in Philadelphia. This data set provides insight into the methodology used by the city to conduct the property reassessment and in effect allows those outside city government to inspect the quality of the work done by the Office of Property Assessment.

In terms of sheer numbers, these two data releases are small when compared to the many other data sets the City of Philadelphia has released over the past year. Yet both required considerably more effort to get done – the relatively small number of transparency data releases belies their value.

There is a temptation in the open data world to evaluate the relative success of a government’s open data program based on the volume of data releases. I don’t think we have good metrics yet to capture how well (or not well) open data programs are doing on achieving more fundamental government transparency. I think we need them.

There is a strong case to be made that much of the operational data of governments is valuable and is appropriately released through open data programs. In some cases, this data allows outsiders to evaluate the job that government is doing – how close to the published schedule are the trains running? How long are 311 service requests open, and in which neighborhoods, before they are resolved?

But I also believe that government officials that are tasked with putting in place the infrastructure necessary for governments to efficiently share data need to be mindful of their duty to enhance government transparency. We need to keep the faith on open data, and stay true to the same principles that helped initiate the movement.

Chief Data Officers and similar public officials carry the dual responsibility of having to release data that helps make government work better, and helps make democracy work better.

On Data Standards for Cities

Creating open data standards for cities is really, really hard. It’s also really, really important.

Data standardization across cities is a critical milestone that must be reached to advance the open data movement and to fully realize the potential benefits of openly publishing government data. More and more people are starting to recognize its importance, and more and more energy will be devoted to creating new standards for city data in the months and years ahead.


The best example of what is possible when governments publish open data that conforms to a specific standard is the General Transit Feed Specification (GTFS). Developed by Google in partnership with the Tri-County Metropolitan Transportation District of Oregon (TriMet), GTFS is a data specification that is used by dozens of transit and transportation authorities across the country, and it has all of the qualities that open data advocates hope to replicate in other data standards for cities.

Transit authorities that publish GTFS data see an immediate tangible benefit because their transit information is available in Google Transit. Making this information more widely available benefits both transit agencies and transit riders, but the immediacy with which transit agencies can see this benefit makes GTFS particularly valuable. Data standardization is an easier sell to government officials when tangible benefits are quickly realized.

The GTFS standard is relatively easy to use – it’s a collection of zipped, comma-delimited text files. This is a pretty low bar for transit agencies being asked to produce GTFS data, and it’s an eminently usable format for consumers of GTFS data. In fact, the ease of use of GTFS has spawned a cottage industry of transit applications in cities across the country and continues to be used as the bedrock set of information for transit app developers.
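Because a GTFS feed is just zipped CSV files, consuming one takes only a few lines of standard-library code. A minimal sketch in Python – the feed filename is hypothetical; any agency’s published GTFS zip would work the same way:

```python
# Sketch: read one table (e.g. stops.txt) out of a GTFS feed, which is
# simply a zip archive of comma-delimited text files.
import csv
import io
import zipfile

def read_gtfs_table(feed, table):
    """Return the rows of one GTFS file (e.g. 'stops.txt') as dicts."""
    with zipfile.ZipFile(feed) as archive:
        with archive.open(table) as f:
            # utf-8-sig strips the BOM that some feeds include
            text = io.TextIOWrapper(f, encoding="utf-8-sig")
            return list(csv.DictReader(text))

# Usage, with a hypothetical local feed file:
# stops = read_gtfs_table("gtfs.zip", "stops.txt")
# print(stops[0]["stop_name"], stops[0]["stop_lat"], stops[0]["stop_lon"])
```

That low barrier to entry – no special tooling on either the producing or consuming side – is a big part of why GTFS spread so quickly.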

And perhaps most importantly, GTFS has given open data advocates a benchmark to use to advance other data standardization efforts. In many ways, GTFS made standards like Open311 possible.

So if data standardization is the future, and we’ve got at least one really good example to demonstrate the benefits to stakeholders and advance the concept, then what’s next? What’s the next data standard that will be adopted by multiple governments?

For the past year or so, there has been widespread interest in developing a shared data standard for food safety inspection data. On its face, this seems like a good candidate data source to standardize across cities. Most cities (certainly all large cities) conduct regular inspections of establishments that serve food to the public. The resulting information can be (but is not always) fairly succinct – usually a letter grade or numerical score – and can easily be delivered to an end user on a number of different platforms and channels. For many reasons, focusing on food safety inspection data as the next best data set to standardize across cities makes a lot of sense.

Just recently, the joint efforts of several different groups culminated in an announcement by the City of San Francisco and Yelp to deliver standardized food safety inspection data through the Yelp platform.

I was involved in the discussions about a data standard for food safety inspections, though the City I work for will not be adopting the newly developed standard (at least not yet). The process of developing the new food safety inspections data standard was illuminating. There are some important lessons we can take away from this work – lessons we can put to use as we work to identify additional municipal data sets for standardization.

For me, the biggest lesson learned from the work that went into standardizing food safety inspection data is understanding when applying a data standard might obscure important differences in how data is collected, or in what data means. By way of example, a data standard like GTFS does not obscure differences in the underlying data across different jurisdictions. A transit schedule broken down to its essence is about location and time – when will my bus be at a specific stop on a specific route. There is nothing inherently different about this information from jurisdiction to jurisdiction. Time and place mean the same thing everywhere.

But this is not always the case with food safety inspection data – particularly when this data is distilled into digestible (pun intended) scores or rankings. The methods for conducting food safety inspections can vary widely from city to city, and those differences mean that similar-looking scores can reflect very different conditions depending on where the data comes from.

Daniel E. Ho, a professor at Stanford University, conducted an in-depth study of the restaurant inspection systems in New York City and San Diego and found that the way inspection regimes are implemented can produce data that looks very different when compared across cities.

“While San Diego, for example, has a single violation for vermin, New York records separate violations for evidence of rats or live rats; evidence of mice or live mice; live roaches; and flies — each scored at 5, 6, 7, 8 or 28 points, depending on the evidence. Thirty ‘fresh mice droppings in one area’ result in 6 points, but 31 droppings result in 7 points.”

There also appears to be some debate in the medical community about the effectiveness of simplified grading for food establishments – i.e., using a letter grade or a numerical score. As noted in Professor Ho’s report – “…a single indicator has not been developed that summarizes all the relevant factors into one measure of [food] safety.”

All that said, if we’re going to advance the work of creating data standards across cities we need to identify the right data sets to standardize. These candidate data sets should have the same qualities as GTFS – demonstrating immediate benefits to data producers and data users, ease of use – but not have some of the less desirable qualities of food safety inspection data – obscuring differences in data collection and data quality across jurisdictions.

Lately, I’ve been trying to advance the idea that data about the locations where flu shots are administered (or any other form of inoculation) could be standardized across cities. I’ve gotten some great input from data advocates and from other cities, like the cities of Chicago and Baltimore.

I’m hoping to continue pushing this idea in the months ahead, leading up to the next flu season. If this most recent flu season has shown us anything, it’s that data matters – I think there could be enormous benefit in having cities use a standard data format for this information before the onset of the next really bad flu season.

But whether it’s flu shot locations or some other data set, the future of open data lies in building standards that multiple cities and governments can adhere to. This is the next great milestone in the open data movement.

Advancing the movement toward this goal will be the most important work of the open data community in the months and years ahead.

[Note – photo courtesy of the San Diego International Airport]

Open Data and the Digital Divide

I had the pleasure recently of taking part in a series on WHYY’s Radio Times focusing on Philadelphia Innovators.

I got a chance to talk about what the City of Philadelphia is doing to release more open data to technologists, entrepreneurs and researchers in an effort to spur innovation.

Host Maiken Scott led a great discussion that also included Maria Walker from the Freedom Rings Partnership, and Juliana Reyes from Technically Philly.