Creates bagit archives of sensor data retrieved from ERDDAP.
First, set environment variables as necessary for thebagit metadata. Variables are as follows:
BAGIT_BAG_GROUP_IDENTIFIER
BAGIT_CONTACT_EMAIL
BAGIT_CONTACT_NAME
BAGIT_CONTACT_PHONE
BAGIT_ORGANIZATION_ADDRESS
BAGIT_SOURCE_ORGANIZATION
Any that are not set will default to emptry string in the metadata.
Bagitify supports multiple instances of any of these, accomplished by suffixing the variable names with a string of your choice. A number or something descriptive of the reason for the duplication probably makes the most sense, but it is basically arbitrary.
For example, setting both BAGIT_CONTACT_PHONE
and BAGIT_CONTACT_PHONE_2
would result in
two phone numbers being saved in the bagit metadata.
To actually call the program, run
./bagitify/bagitify.py [-d DIRECTORY] [-s START] [-e END] [-v] [-f] <tabledap_url>`
-d
allows the user to specify the directory to create the bagit archive in. If not set,
the default is a directory in ./bagit_archives
with an autogenerated name like
edu_usf_marine_comps_2022-05_2022-09_bagit
based on the tabledap url, start, and end dates.
-s
and -e
specify start and end dates. Expected datetime format is %Y-%m-%d
or %Y-%m-%dT%H:%M:%SZ
.
For example, 2022-05-01
and 2022-05-01T00:00:00Z
are valid. If a start or end is not set, it defaults to
the start or end of the data in ERDDAP, repsectively. In any case, it will be internally
rounded to the first day of the month for the start, and the first day of the next month for the end.
-v
activates verbose mode, where additional information on operations is printed.
-f
activates force mode, where existing files are deleted and redownloaded.
Finally, tabledap_url
is an ERDDAP tabledap url such as https://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550.html
Putting it all together, bagitify might be run like so:
python bagitify/bagitify.py -s 2022-05-01 -e 2022-08-01 https://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550.html
By default, monthly netCDF files which have already been downloaded are skipped/not redownloaded, unless the end of the month is after the modification date of the local file. This behavior ensures that the netCDF file for the current month gets updated as new data is collected.
To force existing files to be deleted and redownloaded, set the -f
or --force
argument.
The Docker version works essentially the same way, though the variables will need to be set through the docker command, and it will be important to bind mount the place it will be writing to so that you can get the results. For example:
docker build -t bagitify .
docker run \
-e BAGIT_BAG_GROUP_IDENTIFIER="bgi" \
-e BAGIT_CONTACT_EMAIL="[email protected]" \
-e BAGIT_CONTACT_NAME="John Doe" \
-e BAGIT_CONTACT_PHONE="(123) 456-7890" \
-e BAGIT_ORGANIZATION_ADDRESS="123 Fake Street, Some Town, AK 12345" \
-e BAGIT_SOURCE_ORGANIZATION="Fake Org" \
-v ./ncei-archives:/srv/bagitify/bagit_archives \
bagitify -s 2022-05-01 -e 2022-08-01 -v https://erddap.secoora.org/erddap/tabledap/edu_usf_marine_comps_1407d550.html
An example Docker Compose file is also provided in this repository at docker-compose.yml
.
The mounted data directory must be writable by the user in the bagitify container,
which is user is 57439 by default but can be set via the BAGITIFY_USER
environment variable.
Example to run using Docker Compose using your user id:
mkdir -p ./ncei-archives
BAGITIFY_USER=$(id -u) docker compose run bagitify
Environment variables can also be managed using --env-file
in docker run
.
Contributions via pull request are welcome. Please add tests for any new features and fix any
formatting issues identified by flake8
prior to submitting.