For now, idle exploration around making it easier to use the pandas library to analyze Census data.
A while ago, Hunter Owens asked if we knew about anyone using the pandas data analysis package with the Census Reporter API. I whipped up some example code in a gist and went on with things.
Recently I started fooling around with it a little more and decided to put it on GitHub in case anyone else was interested. For a brief moment I considered trying to port Ezra Glenn's acs.R package, but I quickly realized that acs.R is an enormous accomplishment and, honestly, I don't do enough data analysis on a routine basis to be motivated to take that on.
It currently uses the Census Reporter API for data, but it might make sense to use the official Census API, since right now Census Reporter (CR) only has one year's worth of data.
So far, there's really one method, get_dataframe. Here's how it works:
df = get_dataframe(tables='B01003', geoids='040|01000US', col_names=True, geo_names=True, include_moe=True)
df.head()
| | name | Total | B01003001_moe |
|---|---|---|---|
| 04000US01 | Alabama | 4833722 | 0 |
| 04000US02 | Alaska | 735132 | 0 |
| 04000US04 | Arizona | 6626624 | 0 |
| 04000US05 | Arkansas | 2959373 | 0 |
| 04000US06 | California | 38332521 | 0 |
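
Once you have a frame shaped like this, ordinary pandas operations apply. Here is a minimal sketch that rebuilds the five rows above by hand, so it runs without this library or a network connection. The column names are copied from the sample output; the `share` column is only an illustration and is not something get_dataframe produces.

```python
import pandas as pd

# Rows copied from the sample output above; built by hand so no API call is needed.
df = pd.DataFrame(
    {
        "name": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        "Total": [4833722, 735132, 6626624, 2959373, 38332521],
        "B01003001_moe": [0, 0, 0, 0, 0],
    },
    index=["04000US01", "04000US02", "04000US04", "04000US05", "04000US06"],
)

# Sort by total population, largest first.
by_population = df.sort_values("Total", ascending=False)

# Illustrative only: each row's share of the combined total of these five rows.
df["share"] = df["Total"] / df["Total"].sum()
```

Because the geoid is the index, a single geography can be looked up with `df.loc["04000US06"]`.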
As the syntax suggests, you can pass multiple tables. You really should use a list in that case, but if you pass a single string, it adapts.
The same goes for geoids: pass a string or a list of strings. As the example demonstrates, you can select a group of related geographies using Census Reporter's syntax of sumlev|container-geoid.
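
To make the string-or-list behavior and the sumlev|container-geoid syntax concrete, here is a rough sketch. The helper names `as_list` and `children_of` are hypothetical and are not part of this library; they only illustrate one way such arguments could be normalized and built, not how get_dataframe actually does it.

```python
def as_list(value):
    """Hypothetical helper: wrap a single string in a list, copy any other iterable."""
    if isinstance(value, str):
        return [value]
    return list(value)


def children_of(sumlev, container_geoid):
    """Hypothetical helper: build a 'sumlev|container-geoid' expression."""
    return f"{sumlev}|{container_geoid}"


# A single table ID passed as a string becomes a one-item list.
tables = as_list("B01003")

# Mix a group expression (all states in the US) with one explicit geoid.
geoids = as_list([children_of("040", "01000US"), "04000US06"])
```

Here "040" is the summary level for states and "01000US" is the geoid for the United States, matching the call shown earlier.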
I'm open to input and pull requests. Who knows where this will go.