Data sources mirror

PDAP metadata made easier to access and work with, beginning with data sources and agencies.

What this is

This mirrors our Data Sources from Airtable to our PostgreSQL database. Airtable is the entry point for our data, but using Postgres allows us to make an API that's decoupled from Airtable.
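Mirroring into Postgres means writing the same Airtable records again and again, so each write has to insert new rows and overwrite existing ones. The following is a minimal sketch of that write, assuming each mirrored table has a unique airtable_uid column holding the Airtable record ID and that rows are written through a DB-API driver using %s placeholders (psycopg2, for example). build_upsert_sql is a hypothetical helper, not necessarily what database_logic.py does.

```python
from typing import Sequence


def build_upsert_sql(
    table: str,
    columns: Sequence[str],
    key: str = "airtable_uid",
    placeholder: str = "%s",
) -> str:
    """Build an INSERT ... ON CONFLICT statement keyed on `key`.

    Table and column names are interpolated directly, so they must come from
    trusted code, never from Airtable cell values. Row values are passed
    separately as query parameters by the caller.
    """
    if key not in columns:
        raise ValueError(f"key column {key!r} must be one of the inserted columns")
    column_list = ", ".join(columns)
    value_list = ", ".join([placeholder] * len(columns))
    non_key = [col for col in columns if col != key]
    if non_key:
        # On a key collision, overwrite every non-key column with the incoming value.
        conflict_action = "DO UPDATE SET " + ", ".join(
            f"{col} = excluded.{col}" for col in non_key
        )
    else:
        conflict_action = "DO NOTHING"  # Nothing to update besides the key itself.
    return (
        f"INSERT INTO {table} ({column_list}) VALUES ({value_list}) "
        f"ON CONFLICT ({key}) {conflict_action}"
    )
```

A call would look roughly like cursor.execute(build_upsert_sql("agencies", ["airtable_uid", "name"]), ("rec123", "Example PD")). The table and column names here are only illustrative. Because the statement is an upsert, rerunning the mirror over rows it has already copied just rewrites them and does not fail on duplicate keys.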

This script runs daily from the Automation Manager, using the Dockerfile and Jenkinsfile to pull the latest changes, install any requirements, and run the script from a container environment.

Installation

1. Clone this repository and navigate to the root directory.

git clone https://github.com/Police-Data-Accessibility-Project/data-sources-mirror.git
cd data-sources-mirror

2. Create a virtual environment.

If you don't already have virtualenv, install the package:


pip install virtualenv

Then run the following command to create a virtual environment:


virtualenv -p python3.9 mirror_venv

3. Activate the virtual environment.


source mirror_venv/bin/activate

4. Install dependencies.


pip install -r requirements.txt

5. Create a file named .env in the repository root containing secrets for DO_DATABASE_URL, AIRTABLE_BASE_ID, and AIRTABLE_TOKEN.

The app needs DO_DATABASE_URL, AIRTABLE_BASE_ID, and AIRTABLE_TOKEN values for PDAP's Data Sources database on DigitalOcean and the Airtable base. Reach out to the PDAP team by email or make noise in Discord if you'd like access to these keys.

DO_DATABASE_URL=postgres://data_sources_app:<password>@db-postgresql-nyc3-38355-do-user-8463429-0.c.db.ondigitalocean.com:25060/defaultdb
AIRTABLE_TOKEN=<airtable_token>
AIRTABLE_BASE_ID=<airtable_base_id>

6. Run the mirror script.


python3 mirror.py
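The secrets from step 5 can be checked up front so that a missing value fails fast. The following is a minimal sketch that assumes python-dotenv is available to read .env; without it, the sketch falls back to the existing environment. load_config and MirrorConfig are hypothetical names, not part of mirror.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# The three secrets the installation steps ask for in .env.
REQUIRED_VARS = ("DO_DATABASE_URL", "AIRTABLE_BASE_ID", "AIRTABLE_TOKEN")


@dataclass(frozen=True)
class MirrorConfig:
    database_url: str
    airtable_base_id: str
    airtable_token: str


def load_config(env: Optional[Mapping[str, str]] = None) -> MirrorConfig:
    """Read the mirror's secrets, raising early if any are missing or empty."""
    if env is None:
        try:
            # Assumption: python-dotenv is installed; load_dotenv() copies .env
            # into os.environ without overriding variables that are already set.
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            pass  # Fall back to whatever is already in the environment.
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return MirrorConfig(
        database_url=env["DO_DATABASE_URL"],
        airtable_base_id=env["AIRTABLE_BASE_ID"],
        airtable_token=env["AIRTABLE_TOKEN"],
    )
```

Calling load_config() at the start of a run produces one clear error for a missing .env entry, rather than a failed connection partway through. Tests can pass a plain dict instead of touching the real environment.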

Update frequency

Hourly

How it works

We're taking advantage of a Python wrapper around the Airtable API to provide flat data joinable on an agency's airtable_uid. The script is scheduled to run via crontab on the Automation Manager droplet in DigitalOcean, on an hourly basis. On each run, the latest code from the main branch of the repo is fetched and current requirements are installed in order to make sure the script is up to date on the droplet. During execution, only rows from each table that are new or have been updated in the last two hours are fetched to update our Postgres database.
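The "new or updated in the last two hours" filter can be written as an Airtable formula and sent through the REST API's filterByFormula parameter. Most wrappers expose this; pyairtable, for example, accepts a formula argument on Table.all. The following is a minimal sketch only. recent_changes_formula is a hypothetical helper, and the repository's actual filter may be built differently.

```python
def recent_changes_formula(lookback_hours: int = 2) -> str:
    """Return an Airtable formula matching records created or modified in the
    last `lookback_hours` hours. NOW() is evaluated by Airtable at query time."""
    if lookback_hours <= 0:
        raise ValueError("lookback_hours must be positive")
    cutoff = f"DATEADD(NOW(), -{lookback_hours}, 'hours')"
    # Check both timestamps so brand-new rows and edited rows are included.
    return (
        f"OR(IS_AFTER(CREATED_TIME(), {cutoff}), "
        f"IS_AFTER(LAST_MODIFIED_TIME(), {cutoff}))"
    )
```

On an hourly schedule, a two-hour window makes consecutive runs overlap, so a late or failed run is less likely to miss an edit. Rows fetched twice cause no harm as long as the database writes are upserts.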

The intent is to do as little transformation as we can, erring on the side of being true to what the API wrapper returns, while also being transparent about how we're doing things.
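Airtable's REST API returns each record as an object with an id, a createdTime, and a fields mapping. The following is a minimal sketch that flattens this shape into a single row with as little transformation as possible. It assumes the record ID is the value stored in airtable_uid. flatten_record is a hypothetical helper, and any renaming of Airtable field names to Postgres column names is deliberately left out.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], uid_column: str = "airtable_uid") -> Dict[str, Any]:
    """Lift an Airtable record's fields to the top level and attach its record ID.

    Field values pass through unchanged (lists stay lists, such as linked-record
    IDs), so the row stays close to what the API returned.
    """
    row = dict(record.get("fields", {}))  # Shallow copy: the input is not mutated.
    # The API's record ID takes precedence over any same-named field.
    row[uid_column] = record["id"]
    return row
```

Leaving values untouched keeps the mirror easy to audit against Airtable. Normalization, if it is ever needed, can happen downstream in SQL.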

Some caveats:

As with all data, this is human-entered, human-devised, and subject to imperfections. Corrections, suggestions, and questions are welcome.
We'd love to have more fields filled out, or new records added, related to any agencies or data sources.

