For the NICAR 2016 conference in Denver, Census Reporter's Joe Germuska participated in a panel, "The way we look next: Mining past and future census data to predict diversity in race, income and aging."
Joe's section was on "Using Indices for Complex Comparison" (slides).
Much of the talk was intended to demonstrate how humans can use Census Reporter to compute the USA Today Diversity index easily. The slides include a walkthrough of using Microsoft Excel to do the work. This repository shows how you can do it with Python.
The notes below cover material that is also in the slideshow, apart from the Excel demo, starting with some background on the index:
- Created by Phil Meyer (UNC) and Shawn McIntosh (USAT) in 1991
- Reviewed in 2001 after Census changed race question. Equation essentially unchanged.
Diversity = 1 - ((WhitePct² + BlackPct² + AmericanIndianPct² + AsianPct² + NativeHawaiianPct²) * (HispanicPct² + NonHispanicPct²))
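As a rough Python sketch of the formula above, assuming each share is expressed as a proportion between 0 and 1 rather than a percentage between 0 and 100 (the function and parameter names are illustrative and are not taken from the slides):

```python
def diversity_index(white, black, american_indian, asian,
                    native_hawaiian, hispanic, non_hispanic):
    """Return the diversity index for one place.

    Assumption: every argument is a proportion of the total
    population (0 to 1), not a percentage (0 to 100).
    """
    # Sum of squared shares for the five race categories in the formula.
    race_term = sum(share ** 2 for share in
                    (white, black, american_indian, asian, native_hawaiian))
    # Sum of squared shares for the two ethnicity categories.
    ethnicity_term = hispanic ** 2 + non_hispanic ** 2
    return 1 - race_term * ethnicity_term


# Made-up shares for illustration only, not real Census figures.
example = diversity_index(0.5, 0.5, 0.0, 0.0, 0.0, 0.5, 0.5)
print(example)
```

With those invented shares the race term is 0.5 and the ethnicity term is 0.5, so the result is 0.75. A place where everyone falls in a single race category and a single ethnicity category would score 0.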
The relevant data for these is in two tables:
- B02001 - Race
- B03003 - Hispanic or Latino Origin
Census Reporter makes it easy to get Census data for a variety of geographies. Each table has an overview page on Census Reporter, where you can enter the geography or geographies which you're analyzing.
If you work on this stuff for a while, you'll become familiar with the key geoids (geographic identifiers) for the places you study most. You can learn some more about geoids and other aspects of how the Census Bureau deals with geography on Census Reporter's geography help page.
If you work on this stuff for a while, you might also just start "hacking" the URLs instead of going through the Census Reporter GUI. For example, say you've gotten the page for B02001 for all states in the US:
http://censusreporter.org/data/table/?table=B02001&geo_ids=040|01000US&primary_geo_id=01000US
To go from that page to B03003 - Hispanic or Latino Origin, you just have to change the value for table:
http://censusreporter.org/data/table/?table=B03003&geo_ids=040|01000US&primary_geo_id=01000US
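If you find yourself editing these URLs often, a small helper can do the substitution for you. This is only a sketch: `table_url` is a hypothetical name, and the defaults simply reproduce the "all states in the US" query string shown above.

```python
BASE_URL = "http://censusreporter.org/data/table/"


def table_url(table_id, geo_ids="040|01000US", primary_geo_id="01000US"):
    """Build a Census Reporter data page URL by filling in the table ID.

    The defaults mirror the example URLs above; pass other geoids
    to look at different places.
    """
    return (f"{BASE_URL}?table={table_id}"
            f"&geo_ids={geo_ids}&primary_geo_id={primary_geo_id}")


# Print the page URL for each of the two tables used by the index.
for table_id in ("B02001", "B03003"):
    print(table_url(table_id))
```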
You can download the data from those pages in many formats, including Excel. Note that the Excel files do not include verbal labels for columns, because some of the column names are extremely long, or are nested so that making the individual column names clear would be clumsy. The full information about column names is included in a JSON file which is part of the download, but for your reference, these are the relevant columns for computing the diversity index:
- B02001002 White alone
- B02001003 Black or African American alone
- B02001004 American Indian and Alaska Native alone
- B02001005 Asian alone
- B02001006 Native Hawaiian and Other Pacific Islander alone
- B03003002 Not Hispanic or Latino
- B03003003 Hispanic or Latino
Since you need to compute percentages, note that, as is standard, the first column in each table is the "total" column, which serves as the denominator. The two totals should be the same, because the universe for both tables is the same. If you find that they aren't, you've probably gathered data from two different ACS releases. (That's very unlikely if you get data from Census Reporter for the same places at the same time.)
- B02001001 Total population
- B03003001 Total population
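Putting the column IDs and the two total columns together, one possible way to go from raw counts to the index in Python looks like this. It assumes you have already loaded one place's values into a dict keyed by column ID. The function name and the sample counts are invented for illustration, and this is a sketch rather than the repository's actual code.

```python
RACE_COLUMNS = ["B02001002", "B02001003", "B02001004",
                "B02001005", "B02001006"]
ETHNICITY_COLUMNS = ["B03003002", "B03003003"]


def diversity_from_counts(row):
    """Compute the diversity index from raw counts for one place."""
    race_total = row["B02001001"]
    ethnicity_total = row["B03003001"]
    # Both tables share a universe, so the denominators should match.
    if race_total != ethnicity_total:
        raise ValueError("Totals differ; data may be from different ACS releases")
    if race_total == 0:
        raise ValueError("Total population is zero; shares are undefined")
    # Convert each count to a share of the total, then square and sum.
    race_term = sum((row[col] / race_total) ** 2 for col in RACE_COLUMNS)
    ethnicity_term = sum((row[col] / ethnicity_total) ** 2
                         for col in ETHNICITY_COLUMNS)
    return 1 - race_term * ethnicity_term


# Hypothetical counts for a single place, not real Census data.
sample_row = {
    "B02001001": 1000, "B02001002": 600, "B02001003": 200,
    "B02001004": 10, "B02001005": 100, "B02001006": 5,
    "B03003001": 1000, "B03003002": 800, "B03003003": 200,
}
print(round(diversity_from_counts(sample_row), 3))
```

Raising an error when the totals disagree is a design choice for this sketch; you might prefer to log a warning and continue.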
A strictly verbal explanation of the process with Excel is tedious. See the slides for an illustrated walkthrough.