This document describes how to install the Machine code for local development, and demonstrates two ways to use it: running a single source and running a complete batch set. If you’re editing a lot of sources and want to do it quickly without waiting for a remote Github-based continuous integration service, you may want to use run single sources locally. If you're working on the queuing and job control portions of Machine code, you may want to run complete batch sets on test data.
Run a single source without installing Python or other packages locally using OpenAddresses from Docker Hub.
-
Get the latest OpenAddresses image from Docker Hub:
docker pull openaddr/machine
-
Download a source from OpenAdresses/openaddresses on Github. Berkeley, California is a small, reliable source that’s good to test with:
curl -o us-ca-berkeley.json \ -L https://github.com/openaddresses/openaddresses/raw/master/sources/us/ca/berkeley.json
-
Using Docker, run
openaddr-process-one
to process the source:docker run --volume `pwd`:/vol openaddr/machine \ openaddr-process-one -v vol/us-ca-berkeley.json vol
-
Look in the directory
us-ca-berkeley
for address output, logs, and other files.
You can edit a local copy of OpenAddresses code with working tests by installing everything onto a local virtual machine using Docker. This process should take 5-10 minutes depending on download speed.
-
Download and install Docker. On Mac OS X, use Docker for Mac. On Ubuntu, run
apt-get install docker.io
or follow Docker’s own directions. -
Build the required image, which includes binary packages like GDAL and Postgres.
docker build -f Dockerfile-machine -t openaddr/machine:`cut -f1 -d. openaddr/VERSION` .
-
Run everything in detached mode:
docker-compose up -d
Run
docker ps -a
to see output like this:IMAGE STATUS NAMES ... openaddr/machine ... Exited (0) 44 seconds ago ... openaddressesmachine_machine_1 mdillon/postgis Up 45 seconds openaddressesmachine_postgres_1
-
Connect to the OpenAddresses image
openaddr/machine
with a bash shell and the current working directory mapped to/vol
:docker-compose run machine bash
-
Build the OpenAddresses packages using virtualenv and pip. The
-e
flag topip install
ensures that your local copy of OpenAddresses is used, so that you can test changes to the code made in your own editor:pip3 install virtualenv virtualenv -p python3 --system-site-packages venv source venv/bin/activate pip3 install -e file:///vol
You should now be able to make changes and test them. If you exit the Docker container, changes made in step 5 above will be lost. Use Docker commit or similar if you need to save them.
Run unit tests:
python3 /vol/test.py
Run a batch set of address data to populate machine
with sample data. These instruction show how to run a set of small-scale testing
data from the repository openaddresses/minimal-test-sources
.
This process should take less than 10 minutes.
-
After preparing a virtual machine and running tests, a new local
openaddr
Postgres database will exist with this connection string:postgres://openaddr:openaddr@localhost/openaddr
Three other pieces of information are needed:
- An empty Amazon S3 bucket in the US Standard region to store data.
- Amazon Web Services credentials stored where
boto
can find them. - A personal access token to access Github’s API.
-
In a terminal window, run
openaddr-enqueue-sources
with the information above and leave it open and running:openaddr-enqueue-sources --verbose \ --owner openaddresses --repository minimal-test-sources \ --database-url {Connection String} \ --github-token {Github Token} \ --bucket {Amazon S3 Bucket Name}
-
In a second terminal window, run a single worker to processed the queued sources one after another, then run the dequeuer to pass them back. Note both of these programs do not exit, they merely block waiting for work. You can manually abort them with
Ctrl-C
once the work is completed.openaddr-ci-worker --verbose \ --database-url {Connection String} \ --bucket {Amazon S3 Bucket Name} env DATABASE_URL={Connection String} \ GITHUB_TOKEN={Github Token} \ openaddr-ci-run-dequeue
-
Back in the first terminal window, you should have seen
openaddr-enqueue-sources
complete and exit. You can now run the Webhooks web application and leave it running to see the results of the batch set in a web browser:env DATABASE_URL={Connection String} \ GITHUB_TOKEN={Github Token} \ AWS_S3_BUCKET={Amazon S3 Bucket Name} \ python3 run-debug-webhooks.py
-
In the second terminal window, try collecting address data into downloadable archives:
openaddr-collect-extracts --verbose \ --owner openaddresses --repository minimal-test-sources \ --database-url {Connection String} \ --bucket {Amazon S3 Bucket Name}