Parallelize precinct matching by sharding data into counties.
-
Run all the bash scripts in bash_scripts/:
bash bash_scripts/setup-data-dirs.sh
bash bash_scripts/setup-python-env.sh
-
Pivot and validate precinct-level election results in pivot_results.ipynb.
-
Acquire and validate the precinct-level election shapefile.
-
Load and inspect the precinct election results and the precinct election shapefile in setup.ipynb. Ensure that there is textual data for matching, or this framework will not be very effective. That is, for most precincts the two files should contain varied forms of the same name (e.g. LEHIGH TWP DIST PENNSVILLE matches LEHIGH District PENN).
-
Next, the work can be divided among team members by assigning counties. Matching for each county should be done in the matching/{$county-id} folder. A shared spreadsheet helps to identify bottlenecks and track overall progress.
-
Once all counties have been matched, run merge.ipynb to combine the exported county shapefiles into a statewide precinct-level election shapefile.
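
The county sharding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the repository's own code: the `county_id` and `precinct` field names, the sample rows, and the `shard_by_county` helper are all hypothetical.

```python
from collections import defaultdict


def shard_by_county(records, county_key="county_id"):
    """Group precinct records into one list per county."""
    shards = defaultdict(list)
    for record in records:
        shards[record[county_key]].append(record)
    return dict(shards)


# Hypothetical sample rows; real rows would come from the pivoted results.
results = [
    {"county_id": "001", "precinct": "LEHIGH TWP DIST PENNSVILLE"},
    {"county_id": "001", "precinct": "ALLEN TWP"},
    {"county_id": "003", "precinct": "BETHEL WARD 1"},
]

shards = shard_by_county(results)

# One line per county, e.g. as a starting point for a progress spreadsheet.
for county_id, rows in sorted(shards.items()):
    print(f"matching/{county_id}: {len(rows)} precinct(s) to match")
```

Each shard can then be assigned to one team member, and the per-county counts give a rough sense of where the workload is concentrated.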
-
Run bash bash_scripts/setup-python-env.sh to initialize your virtual environment and launch JupyterLab.
-
Navigate to matching/{$county-id}/{$county-id} Precinct Matching.ipynb, where {$county-id} is the county that you want to work on.
-
Click {$county-id} Precinct Matching.ipynb to launch the notebook.
-
Select venv (precinct-matching) as your kernel (top right in the UI). This ensures you are using an environment with all the requisite dependencies.
-
Use Shift+Enter to run the cells until you reach the last cell. Review the output of that cell to see the current state of your matches. Adjust that code as necessary to make more matches.
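
As a rough idea of what adjusting the matching code might involve, the sketch below normalizes precinct names and then picks the closest candidate using Python's standard `difflib` module. It does not reproduce the notebook's actual code: the `ABBREVIATIONS` table, the `normalize` and `best_match` helpers, the sample candidate list, and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical abbreviation table; extend it for the county being matched.
ABBREVIATIONS = {"TWP": "TOWNSHIP", "DIST": "DISTRICT"}


def normalize(name):
    """Uppercase a precinct name and expand known abbreviations."""
    tokens = name.upper().split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def best_match(name, candidates, cutoff=0.6):
    """Return the candidate most similar to name, or None if none reach the cutoff."""
    # Map each normalized candidate back to its original spelling.
    normalized = {normalize(candidate): candidate for candidate in candidates}
    hits = difflib.get_close_matches(
        normalize(name), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None


# Hypothetical names as they might appear in the shapefile.
shapefile_names = ["LEHIGH District PENN", "ALLEN Township"]

print(best_match("LEHIGH TWP DIST PENNSVILLE", shapefile_names))
```

Normalizing case and abbreviations before comparing matters here because `difflib` compares characters literally, so `DIST` and `District` would otherwise share very little. Names that still return `None` are candidates for manual review or a lower cutoff.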