The Freedom of Information Act, passed in 1966 to increase trust in government by encouraging transparency, has always been a pain in the ass. You write to an uncaring bureaucracy, you wait for months or years only to be denied or redacted into oblivion, and even if you do get lucky and extract some useful information, the world has already moved on to other topics. But for more and more people in the past few years, FOIA is becoming worth the trouble.
— The Secret to Getting Top Secret Secrets, by Jason Fagone.
I’ve always thought that the FOIA process was an important part of a healthy open data program. That may seem like an obvious thing to say, but there are a lot of people involved in the open data movement who either have limited exposure to FOIA or just enough exposure to truly loathe it.
In addition, the people inside government who are responsible for responding to FOIA requests may have very different feelings about releasing data than those who are part of an open data program.
There are lots of reasons why, for advocates of open data, the FOIA process is suboptimal. A number of them are discussed in a recent blog post by Chris Whong, an open data advocate in New York City and a co-captain of the NYC Code for America Brigade, who FOIA’d the NYC Taxi & Limousine Commission for bulk taxi trip data.
Chris’ post details many of the things that open data advocates dislike about the FOIA process. It’s an interesting read, especially if you don’t know how the FOIA process works.
However, another, more serious shortcoming of the FOIA process became obvious almost immediately after the taxi trip data was posted for wider use. It turns out that the Taxi & Limousine Commission had not done a sufficient job depersonalizing the data, and the hashing method used to obscure taxi drivers’ license numbers and medallion numbers was easy to reverse with moderate effort.
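To make the weakness concrete, here is a minimal sketch of why obscuring a short identifier with a plain, unkeyed hash offers little protection: when the set of possible values is small, anyone can hash every candidate and look the published values up. The identifier format (a digit, a letter, two digits), the choice of MD5, and all function names below are illustrative assumptions, not a description of the Commission's actual files or process.

```python
import hashlib
import hmac
import itertools
import secrets
import string


def naive_pseudonym(identifier: str) -> str:
    """Obscure an identifier with an unkeyed MD5 hash (the weak approach)."""
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


def candidate_ids():
    """Yield every ID in the assumed format: digit, letter, digit, digit."""
    parts = (string.digits, string.ascii_uppercase, string.digits, string.digits)
    for d1, letter, d2, d3 in itertools.product(*parts):
        yield f"{d1}{letter}{d2}{d3}"


def build_lookup(candidates):
    """Hash every candidate once and map each digest back to its original."""
    return {naive_pseudonym(c): c for c in candidates}


def keyed_pseudonym(identifier: str, key: bytes) -> str:
    """Obscure an identifier with HMAC-SHA256 under a secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The whole identifier space is only 10 * 26 * 10 * 10 = 26,000 values,
# so building a complete reverse lookup table is trivial.
lookup = build_lookup(candidate_ids())

published = naive_pseudonym("5X41")  # what would appear in the released file
recovered = lookup[published]        # the original ID, recovered by lookup

# With a secret key that is never published, an outsider cannot rebuild
# the table, while equal IDs still map to equal pseudonyms within a release.
key = secrets.token_bytes(32)
safer = keyed_pseudonym("5X41", key)
```

A keyed hash is only one option, and it protects the identifier column alone. Trip times and locations can still single people out, which is the kind of issue a broader review is meant to catch.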
It’s obvious that the Taxi Commission tried to obscure this personal data in the files it released and to also make sure the data was as usable as possible by the person who requested it. Striking this balance can be tricky, and it’s actually not uncommon for data released through FOIA requests to have information that may be viewed as sensitive in hindsight.
I think one of the reasons this happens with data released through FOIA is that the process is not usually coupled tightly enough with the open data review process. I think we can make FOIA better (and, by extension, make the open data process better) by running more FOIA requests through the vetting and review process used to release open data.
Outcome vs. Process
In my experience, there is often very little connection between the process for responding to FOIA requests and the open data release process. Beyond reviewing FOIA requests in the aggregate to see if there are opportunities for bulk data releases, the FOIA process and the open data release process often happen independently of one another. This is certainly the case in the City of Philadelphia.
In Philly, open data releases are coordinated by the Chief Data Officer in the Office of Innovation and Technology. FOIA requests – or Right to Know Requests as they are known in the Commonwealth of Pennsylvania – are handled by staff in the Law Department, or personnel that have been identified as Right to Know Officers for their specific department.
These requests are almost always treated as one-off tasks. Even though the same data may be requested again later, I’ve never seen the people working on FOIA requests in Philly try to make their response work repeatable.
The problem with a bifurcated approach to data releases like this is that it forces people to think of the work to respond to FOIA requests as disposable. Something that happens once – an outcome, instead of a process. Open data done correctly is about establishing a process – one that includes opportunities for review and feedback.
Toward Better FOIA Releases
Because FOIA is viewed as a one-and-done task, there is no opportunity to release data iteratively. If the release of NYC taxi trip data had been viewed as a process (particularly a collaborative one), the Taxi & Limousine Commission could have opted to be conservative in its initial release and then enhanced future releases based on actual feedback from real consumers of the data.
In Philadelphia, we employed a group called the Open Data Working Group to help review and vet proposed data releases. This interdisciplinary group, drawn from different city departments, provided feedback and input on a number of important data releases that required depersonalization or redaction of sensitive data – crime incidents, complaints filed against active-duty police officers, etc.
Additionally, part of our release process involved reaching out to select outside data consumers to get feedback and help identify issues prior to broader release. Because we used GitHub for many of our data releases, we could set up private repos for our planned data releases and ask selected experts to help us vet and review by adding them as collaborators prior to making these data repos public.
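For anyone who wants to script that reviewer step, here is a rough sketch of what adding a collaborator to a private repo can look like using GitHub's REST API endpoint for repository collaborators. This is not the tooling we used in Philadelphia. The organization, repo, and user names are hypothetical, the token is a placeholder, and the sketch only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_invite_request(owner, repo, username, token, permission="pull"):
    """Build (but do not send) a request inviting a reviewer to a repo.

    "pull" asks for read-only access. GitHub documents the permission
    field as applying to organization-owned repositories.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/collaborators/{username}"
    body = json.dumps({"permission": permission}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical names throughout; substitute a real org, repo, user, and token.
request = build_invite_request(
    owner="example-city",
    repo="draft-crime-incidents",
    username="outside-reviewer",
    token="TOKEN_PLACEHOLDER",
)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the request construction separate from sending makes it easy to log who was invited to review which draft release, which is useful when the review itself needs to be repeatable.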
Getting to Alignment
I think for a lot of amateurs, their alignment is always out.
— Karrie Webb, professional golfer
When it comes to data releases, there is no substitute for experience – that’s why integrating FOIA releases into an existing open data release process can be so beneficial. Leveraging the process for reviewing open data releases can improve the quality of FOIA releases and bring these two critical elements of the open data process into closer alignment.
I’m hopeful that cities, particularly Philadelphia, will begin to see the merit of better aligning FOIA responses and open data releases.