An API for the Planetary Data System Small Bodies Node survey data search tool: catch comets and asteroids in wide-field sky survey data.
CATCH-APIs provides a REST API service that enables users to search for potential observations of comets and asteroids in wide-field sky survey data. CATCH can also be used to search for any fixed position on the sky. The API is designed for use by the Planetary Data System Small Bodies Node (SBN) at the University of Maryland, but with modifications it could be deployed elsewhere with other data sets. SBN is the primary archive for the Near-Earth Asteroid Tracking (NEAT) survey, the Asteroid Terrestrial-impact Last Alert System (ATLAS), the Catalina Sky Survey (CSS), Spacewatch, and the Lowell Observatory Near-Earth-Asteroid Search (LONEOS). CATCH-APIs is one of the primary methods for users to discover scientifically interesting data in those data sets.
The API uses the following:
- The catch and sbsearch Python libraries, which define how survey metadata is stored and execute the actual searches on the database.
- The s2geometry C++ library is used for spatial indexing on the Celestial Sphere.
- Target ephemerides are generated by Horizons at NASA JPL via astroquery (see the sketch below).
- SQLAlchemy and PostgreSQL store user queries and survey metadata.
- Metadata ingestion uses pds4_tools.
- flask, connexion, openapi, and gunicorn for the API experience and documentation.
- redis for the job queue and user task messaging.
- docker is used for cross-platform development and deployment.
- A single observation table holds all observations from all surveys. This allows for an approximate search that ignores parallax and searches all surveys in one go, which is fine for objects a few au away or more.
- The S2 library cells used for indexing range from 3 arcmin to 1 deg.
- Ephemerides may create loops on the sky (e.g., during retrograde motion); S2 handles this correctly, even when the ephemeris is padded with some uncertainty.
- Ephemerides are split into segments, and the database is queried for each. This improves performance because the points in each segment are spatially correlated.
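As a sketch of the ephemeris step noted above, a target ephemeris can be fetched from JPL Horizons with astroquery; the target, observatory code, and date range below are arbitrary examples, and the service derives its own query ranges from the survey coverage:

```python
from astroquery.jplhorizons import Horizons

# Query JPL Horizons for asteroid 2024 YR4 as seen from Palomar (observatory
# code 675). The date range and step size are arbitrary examples; CATCH builds
# its ephemeris requests to match each survey's temporal coverage.
obj = Horizons(id="2024 YR4", location="675",
               epochs={"start": "2025-01-01", "stop": "2025-03-01", "step": "1d"})
eph = obj.ephemerides()
print(eph["datetime_str", "RA", "DEC"])
```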
These APIs wrap the functionality provided by the catch and sbsearch libraries.
- A user submits a query for a comet or asteroid, e.g., '65P', to the `/catch` route, e.g., `/catch?target=65P&sources=neat_palomar_tricam&uncertainty_ellipse=true&padding=0&cached=true`.
- If the object has been found previously, and if the query option `cached` is set to true, then the response will indicate that the scientific data can be accessed immediately by passing the stated `job_id` to the `/caught/:job_id` route; e.g., the URL in the `results` field:

  ```json
  {
    "job_id": "c5746cfca71340419060560d95e4a1e3",
    "message": "Enqueued search. Listen to task messaging stream until job completed, then retrieve data from results URL.",
    "message_stream": "http://catch-api.astro.umd.edu/stream",
    "query": {
      "cached": false,
      "padding": 0,
      "sources": ["atlas_haleakela"],
      "start_date": null,
      "stop_date": null,
      "target": "2024 YR4",
      "type": "ASTEROID",
      "uncertainty_ellipse": false
    },
    "queued": true,
    "results": "http://catch-api.astro.umd.edu/caught/c5746cfca71340419060560d95e4a1e3",
    "version": "3.0.0"
  }
  ```
- If the object has not been previously found, or if the user set `cached` to false, then a new job will be posted to the CATCH APIs queue (implemented with redis). The response to the user will have `queued = true`.
- A separate worker process will claim that job and execute the query with the underlying catch and sbsearch libraries to generate the scientific data for that comet or asteroid.
- During this job, the worker will post status messages back to the redis queue. These messages are available to the user via the `/stream` route (see the `message_stream` field in the above JSON response). See the user messaging stream section below for more details.
- Once complete, the worker will save the results to the database and publish a `"status": "success"` message to the message stream. Because the search takes an unknown amount of time to complete, one can subscribe to the server-sent-event route at `/stream` and monitor the status messages labeled with their job prefix.
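Putting the workflow together, here is a minimal client sketch using the Python requests library; the base URL is taken from the example response above, and error handling is omitted:

```python
import requests

BASE = "http://catch-api.astro.umd.edu"  # base URL from the example response above

# Submit a moving-target query; with cached=true, previously computed results
# are returned without re-running the search.
params = {"target": "65P", "sources": "neat_palomar_tricam", "cached": "true"}
job = requests.get(f"{BASE}/catch", params=params).json()

if job["queued"]:
    # A new search was enqueued: follow the SSE stream at job["message_stream"]
    # until a message with "status": "success" arrives for this job's prefix,
    # then fetch job["results"].
    print("Enqueued job", job["job_id"], "- monitor", job["message_stream"])
else:
    # Cached results are available immediately from the results URL.
    data = requests.get(job["results"]).json()
    print("Retrieved results from", job["results"])
```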
Note: there are two types of "worker" to think about in this code base. There are the gunicorn workers that handle the HTTP requests within the "apis" service, and there are the workers that accept tasks from the redis queue and carry out the computationally expensive workload. We try wherever possible to label the latter kind of redis-queue ('RQ') worker as a "woRQer".
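For illustration, the enqueue/claim pattern described above can be sketched with the python-rq library; the queue name and task function here are hypothetical stand-ins for the project's actual task code:

```python
from redis import Redis
from rq import Queue, Worker

def run_catch_query(target, sources):
    """Hypothetical stand-in for the real task, which calls the catch/sbsearch libraries."""
    ...

redis_conn = Redis()  # connects to localhost:6379 by default
queue = Queue("catch-queries", connection=redis_conn)  # hypothetical queue name

# API side (gunicorn worker): enqueue the expensive search and respond with
# queued = true right away.
job = queue.enqueue(run_catch_query, "2024 YR4", ["atlas_haleakela"])
print("enqueued job", job.get_id())

# woRQer side: a separate process claims jobs from the queue and executes them.
# In practice this runs as its own service (e.g., started with `rq worker`).
Worker([queue], connection=redis_conn).work()
```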
CATCH searches generally take more than a few seconds to complete. After a search is enqueued, the user is notified of progress via the `/stream` route. The `/stream` route implements messaging using server-sent events (SSE).
The messages sent by CATCH APIs are JSON-formatted text, e.g.:

```
data: {"job_prefix": "c5746cfc", "text": "Starting moving target query.", "elapsed": 0.0, "status": "running"}
data: {"job_prefix": "c5746cfc", "text": "ATLAS Hawaii, Haleakela: Query from 2018-12-24 to 2022-01-10.", "elapsed": 0.3, "status": "running"}
data: {"job_prefix": "c5746cfc", "text": "ATLAS Hawaii, Haleakela: Searching 374.8 deg over 1115.0 days", "elapsed": 6.1, "status": "running"}
data: {"job_prefix": "c5746cfc", "text": "ATLAS Hawaii, Haleakela: Caught 479 observations", "elapsed": 10.2, "status": "running"}
data: {"job_prefix": "c5746cfc", "text": "Task complete.", "elapsed": 17.3, "status": "success"}
```
Each message is labeled with the first eight characters of the user's `job_id` to prevent data piracy. A user would monitor the stream for their own messages to track the status of their query via the `status` field, which can take the values:

- `queued`: the query has been received and is waiting for a woRQer to execute it.
- `running`: a woRQer is executing the query.
- `success`: the query terminated without error.
- `error`: the query terminated with an error.

In all cases, the `text` field explains the status in a human-readable format.
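A minimal sketch of such a client, parsing the SSE `data:` lines by hand with the requests library (a dedicated SSE client library could be used instead; the stream URL comes from the `message_stream` field above):

```python
import json
import requests

STREAM_URL = "http://catch-api.astro.umd.edu/stream"  # the message_stream field above
job_prefix = "c5746cfc"  # first eight characters of the job_id

# Follow the SSE stream and print status messages for this job until it finishes.
with requests.get(STREAM_URL, stream=True) as stream:
    for line in stream.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        message = json.loads(line[len("data:"):])
        if message.get("job_prefix") != job_prefix:
            continue  # a message for someone else's job
        print(message["status"], "-", message["text"])
        if message["status"] in ("success", "error"):
            break  # done; results can now be fetched from the /caught route
```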
Internally, messages are passed from the woRQers to the main thread running the `/stream` route with a redis stream using XADD/XREAD. The redis stream preserves a limited message history so that a user's connection can be interrupted without the status of the query being immediately lost.
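As an illustration of that pattern (not the project's actual code), the woRQer and the `/stream` route could exchange messages with redis-py roughly as follows; the stream name, history length, and field layout are assumptions:

```python
import json
import redis

r = redis.Redis()  # localhost:6379 by default
STREAM = "catch-task-messages"  # hypothetical stream name

# woRQer side: append a status message, keeping only a limited history.
r.xadd(STREAM,
       {"message": json.dumps({"job_prefix": "c5746cfc", "text": "Task complete.",
                               "elapsed": 17.3, "status": "success"})},
       maxlen=1000, approximate=True)

# /stream side: block waiting for entries newer than the last ID seen and relay
# them to the connected SSE clients.
last_id = "$"  # only messages added after we start listening
for stream_name, messages in r.xread({STREAM: last_id}, block=5000):
    for message_id, fields in messages:
        last_id = message_id
        print(fields[b"message"].decode())
```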
CATCH-APIs may be developed and run using docker. To develop locally:
- Install `docker` and `docker-compose-v2` on your machine.
- Clone this repo.
- Copy .env-template to .env and edit.
- CATCH requires a Postgres database populated with survey image metadata. This data can be harvested from the original source files, or restored from a copy of a prior database.
  - If starting from a blank database, see Adding New Data.
  - If starting from a copy of a prior database, we recommend using a file generated by the `pg_dump` program. Please contact MKelley or D-W-D for the data. Copy that file within your clone of this repo to `.pg-init-data/some-name.backup`.
- Run docker compose:
  - There are two ways to start all of the relevant containers: dev mode and prod mode. In development mode, the code base for the apis will be "bind-mounted" into the container so that changes made to the code on your machine are reflected instantly in the running application (the API and woRQer processes are run via nodemon in dev mode). In prod mode, the code base is installed from github, so changes will not be picked up dynamically at run time.
  - To run everything in development mode, run `docker compose -f docker-compose.dev.yml up --build`.
  - To stop everything, enter CTRL+C once to stop processes gracefully; sometimes this might fail to properly shut down everything, in which case you can swap `... up --build` with `... down` to retry shutting everything down.
    - Also, when you bring docker compose systems up/down, it's sometimes helpful to remove the stopped containers with `docker container prune`.
  - To run everything in prod mode, run `docker compose -f docker-compose.prod.yml up --build`.
- DB setup:
  - The CATCH tool requires a postgres DB populated with initial data. In previous implementations, this DB was provided as a separate docker container. Now, however, we have separated the DB into its own postgres service on AWS RDS. If you want to develop this code, you therefore need to connect to the development DB on our group's AWS instance (contact DWD or MSK for access), or set up your own postgres instance and populate it with data.
- Once everything is running in dev mode, visit http://localhost:5000/ui to see the swagger interface, and in a separate tab open http://localhost:5000/stream.
- `cp .env-template .env` and edit to suit your needs.
- Build a virtual environment for development with `_build_venv` (run `deactivate` first if already in a virtual environment), then activate it with `source _activate`.
- Follow catch database setup instructions, or start from a previous database copy (see Docker notes).
- Run the webapp: `_entrypoint_webapp`
- Run the task workers (woRQers): `_entrypoint_woRQer`
- Once everything is running in dev mode, visit http://localhost:5000/ui to see the swagger interface, and in a separate tab open http://localhost:5000/stream.
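As an optional smoke test of a local deployment (assuming the default port 5000 noted above), a query can be submitted from Python; the target and survey source are examples and must exist in your local database for any data to be returned:

```python
import requests

# Query the local dev deployment and confirm the API responds.
r = requests.get("http://localhost:5000/catch",
                 params={"target": "65P", "sources": "neat_palomar_tricam", "cached": "true"})
r.raise_for_status()
print("queued:", r.json().get("queued"))
```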
Whether starting from a blank database or a working copy, you will probably want to add some new data.
- Follow catch library documentation for adding a new data source.
- Add the source table name to the API's list of allowed sources: `services/apis/src/api/openapi.yaml` at `paths.catch.get.parameters[name=sources].schema.items.enum`.
- Optionally update tests, e.g., `tests/test_query_moving.py`.
Unit tests and tests on a live database are available. Requirements are listed in `requirements.dev.txt`. In addition, a running redis server on port 6379 is required.
Unit tests are based on a temporary postgres instance and simulated data. First, install the testing requirements using the [test] option:
pip install .[test]
Then run the tests with pytest:
pytest tests
Pro tip! Use `--capture=tee-sys` to see CATCH and CATCH APIs console output produced during tests.
Live database tests are in the `live-tests` directory. They require a configured `.env` file with `DEPLOYMENT_TIER=LOCAL`. The queries rely on Catalina, Spacewatch, NEAT, PanSTARRS, and SkyMapper data.
pytest live-tests