Old tools & new challenges for governments
There is a common misconception that data-driven decision making and the use of complex algorithms are a relatively recent phenomenon in the public sector. In fact, making use of (relatively) large data sets and complex algorithms has been fairly common in government for at least the past few decades.
As we begin constructing ethical frameworks for how data and algorithms are used, it is important that we understand how governments have traditionally employed these tools. By doing so, we can more fully understand the challenges governments face when using larger data sets and more sophisticated algorithms and design ethical and governance frameworks accordingly.
I’ve had the opportunity recently to talk to people in several different city governments that are facing a common challenge — how to liberate operational data from a legacy system.
This is a challenge that many city governments face, and cities that have already gone down this road offer some common lessons for those still trying to figure out the right approach.
The following suggestions are crafted from my own experience as a municipal government official charged with making data more widely available, and those of people in similar positions that I’ve had a chance to speak with.
We live in a time when people outside of government have better tools to build things with and extract insights from government data than governments themselves.
These tools are more plentiful, more powerful, more flexible, and less expensive than pretty much everything government employees currently have at their disposal. Governments may have existing relationships with huge tech companies like Microsoft, IBM, Esri, and others that offer an array of different data tools, but it doesn't really matter.
In the race for better data tools, the general public isn't just beating the public sector; it has already won the race and is taking a Jenner-esque victory lap.
This isn’t a new trend.
Last August, a study from the Century Foundation identified cities in Upstate New York as places with some of the highest concentrations of poverty for African American and Hispanic populations anywhere in the nation. The problem is particularly acute in the City of Syracuse, which holds the distinction of having the highest level of poverty concentration among African American and Hispanic populations of the one hundred largest metropolitan areas in the U.S.
This problem isn’t Syracuse’s alone – the study shows that Rochester and Buffalo also have serious problems with concentrated poverty. But the Salt City is an unfortunate standout in this report. In addition to having the highest concentrations of poverty among African Americans and Hispanics, when looking at concentrated poverty among non-Hispanic whites “…Detroit, Fresno, and Syracuse are the only metropolitan areas on all three lists.”
The Century Foundation’s findings echo those of an earlier study with a similar scope conducted by CNY Fair Housing, Inc., which found that the Syracuse area is “one of the worst scoring cities in the country when looking at equality of opportunity based on race and ethnicity.” Given what we know about how concentrated poverty affects the life outcomes of people who live in it, it’s hard to imagine a more serious drag on the growth and well-being of our region than deliberately forcing people to live in places where they are surrounded by poverty and giving them few options for getting out.
But that’s exactly what we do.
Getting a speeding ticket in the State of New York can be a traumatic – and expensive – experience.
Drivers convicted of speeding often face penalties and fines, and repeated or excessive offenses can result in the loss of a license. But in some places in New York State, drivers issued a speeding ticket may see a very different outcome than one would typically expect: a smaller fine, the avoidance of long-term penalties, and – oddly enough – additional money in local government coffers.
Last year, I wrote a post detailing some data science techniques that used a data set from the State of New York on traffic violations. This is an incredibly rich data set and it’s awesome that the state makes this available as open data. However, the issuance of traffic tickets only tells part of a larger story – since the release of this initial data set, the state has now released data on traffic ticket convictions. Comparing both of these data sets allows us to see the full picture of what happens when a traffic ticket is issued in the State of New York, from issuance all the way through to adjudication.
An examination of these data sets shows that – in certain areas of the state – tickets issued for speeding are much more likely to be negotiated down to a lesser offense, one that allows drivers to avoid a penalty from the Department of Motor Vehicles (DMV) and actually provides a backdoor benefit for local governments.
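As a sketch of what that comparison might look like, the snippet below joins a count of speeding tickets issued with a count of speeding convictions, by court. The file names, columns, and values here are illustrative assumptions for the sake of the example, not the state's actual open data schema.

```shell
# Illustrative sketch only: file names, columns, and values are made up,
# not the actual layout of the NY State open data sets.
cat > issued.csv <<'EOF'
court,speeding_tickets_issued
TOWN OF X,100
TOWN OF Y,80
EOF

cat > convictions.csv <<'EOF'
court,speeding_convictions
TOWN OF X,90
TOWN OF Y,20
EOF

# Strip each header, sort on the join key (join requires sorted input),
# then merge the two files on the court name.
sed '1d' issued.csv | sort > issued.sorted
sed '1d' convictions.csv | sort > convictions.sorted
join -t',' issued.sorted convictions.sorted
```

A court where speeding convictions lag far behind the speeding tickets issued (Town of Y above, with 20 convictions against 80 tickets) is one where charges are likely being negotiated down to lesser offenses.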
When it comes to deriving useful results about the operation of government from open data sets, we have an enormous array of tools at our disposal. Often, we do not need sophisticated or expensive tools to produce useful results.
In this post, I want to use command line tools that are available on most laptops, along with others that can be downloaded for free, to derive meaningful insights from a real government open data set. The following examples will leverage *nix-based tools like sed, as well as open source tools that can be invoked from the command line.
The data used in this post is from the NY State Open Data Portal for traffic tickets issued in New York State from 2008 – 2012.
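Before working with the full data set, it can help to see the general pattern on a toy file. The sketch below assumes a simplified CSV; the real data set's column names and layout may differ.

```shell
# Toy sample resembling (but not matching) the real traffic ticket data.
cat > tickets_sample.csv <<'EOF'
Violation Year,Violation Description,Court Type
2012,SPEED IN ZONE,Town
2012,UNINSPECTED VEHICLE,Town
2011,SPEED IN ZONE,Village
EOF

# Use sed to drop the header row, then count speeding tickets per year
# with nothing more exotic than grep, cut, sort, and uniq.
sed '1d' tickets_sample.csv \
  | grep 'SPEED' \
  | cut -d',' -f1 \
  | sort \
  | uniq -c
# -> one speeding ticket in 2011, two in 2012
```

The same pipeline applies unchanged to a file with millions of rows; only the file name needs to change.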