Civic Innovations

Technology, Government Innovation, and Open Data


5 Essential Open Data Tools

Every data wrangler has their own list of favorites – the go to tools that they use when they need to work with data.

If you need to clean, transform, or mashup data or if you are working with a data set that will form the basis for an application, here is a list of tools that can make life easier for you.

  • OpenRefine – I don’t think there is a better tool for cleaning messy data than OpenRefine. One of my favorite features is the ability to add new columns to a data set based on data in an external web service.
  • jq – I see a lot of JSON in my job, and its exceptionally easy to use JSON data with a tool like this one. For example, here is a simple jq recipe for extracting a list of licensed pawn shops in Philadelphia to a CSV file.
  • csvkit – CSV is another format I see almost everyday, and using csvkit makes it simple. My favorite utility – though I don’t use it often – is csvsql. use this handy utility to generate SQL insert statements and easily create a relational database from a CSV file.
  • Unix shell – jq and csvkit are both command line tools, and the Unix shell is the place where I spend a lot of time working with data. Without getting into a Windows vs. *nix war, there is simply no better collection of utilities for working with text files than those that can accessed via the shell. Tools like curl, grep, sed, awk, cut and a host of others are enormously useful on their own, or in combination with tools like jq and csvkit.
  • CartoDB – pretty much the easiest way to create a web-based map from an open data set. There’s even an API for building apps on top of the data you have in your CartoDB account. Enough said.

Note, my background is in software development so the list of favorites above probably reflects my own professional biases. Someone who works primarily as a data scientist might have a completely different list of favorite tools.

What’s your favorite tool for working with data?

3 responses to “5 Essential Open Data Tools”

  1. Not sure how I could have forgotten about ScraperWiki.

  2. Newest kid on the web scraper block is kimonolabs.com – Check out the jaw dropping video.
    Also jsonlint.com for quick JSON validation

  3. I use vim, Ruby, and D3.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

About Me

I am the former Chief Data Officer for the City of Philadelphia. I also served as Director of Government Relations at Code for America, and as Director of the State of Delaware’s Government Information Center. For about six years, I served in the General Services Administration’s Technology Transformation Services (TTS), and helped pioneer their work with state and local governments. I also led platform evangelism efforts for TTS’ cloud platform, which supports over 30 critical federal agency systems.

%d bloggers like this: