This repository houses a scraping engine for the UCPD's Incident Report webpage. The scraped data is stored in Google Cloud Platform's Datastore, and the engine is run using Heroku's Dyno functionality.
- Scrape the UCPD Incident Report webpage every weekday morning, pulling all incidents from the latest reported incident date in the Google Datastore to the current day.
- Upload all stored UCPD incidents to the Chicago Maroon's Google Drive every Saturday morning.
- Ethical Issues of Crime Mapping: Link
I'd like to thank @kdumais111 and @FedericoDM for their incredible help in getting the scraping architecture in place, as well as @ehabich for adding a bit of testing validation to the project. Thanks, y'all! <3
- Python version: `^3.11`
- Poetry
- Census API Key stored in the environment variable: `CENSUS_API_KEY`
- Google Cloud Platform service account, with the location of the `service_account.json` file stored in the environment variable: `GOOGLE_APPLICATION_CREDENTIALS`
- Google Cloud Platform project ID stored in the environment variable: `GOOGLE_CLOUD_PROJECT`
- Google Maps API key stored in the environment variable: `GOOGLE_MAPS_API_KEY`
- Google Drive Folder ID stored in the environment variable: `GOOGLE_DRIVE_FOLDER_ID`
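
Since every command depends on the environment variables listed above, it can help to check them before running anything. The following is a minimal sketch, not code from this repository: the variable names come from the list above, while `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names introduced for illustration.

```python
import os

# Names taken from the requirements list above.
REQUIRED_ENV_VARS = (
    "CENSUS_API_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_MAPS_API_KEY",
    "GOOGLE_DRIVE_FOLDER_ID",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: `environ` defaults to the real process
    environment, but any mapping can be passed in for testing.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Usage with an explicit mapping, so the result does not depend on your shell.
example_env = {"CENSUS_API_KEY": "abc", "GOOGLE_CLOUD_PROJECT": "my-project"}
print(missing_env_vars(example_env))
```

Calling `missing_env_vars()` with no argument checks the current process environment instead; an empty list means all five variables are set.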
- Any modules should be added via the `poetry add [module]` command.
  - Example: `poetry add black`
- `make lint`: Runs `pre-commit` on the codebase.
- `make seed`: Save incidents starting from January 1st of 2011 and continuing until today.
- `make update`: Save incidents starting from the most recently saved incident until today.
- `make build-model`: Build a predictive XGBoost model based off of locally saved incident data and save it in the `data` folder.
- `make categorize`: Categorize stored, 'Information' labeled incidents using the locally saved predictive model.
- `make download`: Download all incidents into a locally stored file titled `incident_dump.csv`.
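
The difference between `make seed` and `make update` comes down to where the scrape window starts. The sketch below illustrates that idea under stated assumptions: it is not the repository's actual implementation, the names `SEED_START`, `scrape_window`, and `days_in_window` are hypothetical, and it assumes the window includes the most recently saved date itself so that late-reported incidents for that day are picked up again.

```python
from datetime import date, timedelta

# Start date from the `make seed` description above.
SEED_START = date(2011, 1, 1)


def scrape_window(latest_saved, today):
    """Return the inclusive (start, end) dates to scrape.

    latest_saved: date of the most recently stored incident,
    or None when nothing is stored yet (the seed case).
    """
    start = SEED_START if latest_saved is None else latest_saved
    if start > today:
        raise ValueError("latest saved incident is dated after today")
    return start, today


def days_in_window(start, end):
    """Yield each date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# Update case: resume from the last saved date.
start, end = scrape_window(date(2024, 2, 28), date(2024, 3, 1))
for day in days_in_window(start, end):
    print(day.isoformat())
```

Passing `None` as `latest_saved` reproduces the seed behavior, starting from January 1st of 2011.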