![Push](https://github.com/opendatacube/datacube-alchemist/workflows/Push/badge.svg)
[![codecov](https://codecov.io/gh/opendatacube/datacube-alchemist/branch/main/graph/badge.svg?token=8dsJGc99qY)](https://codecov.io/gh/opendatacube/datacube-alchemist)

## Purpose

Datacube Alchemist is a command line application for performing Dataset to Dataset transformations in the context
of an Open Data Cube system.
Features

* Configurable thumbnail generation
* Pass any command line options as Environment Variables

## Installation

You can build the docker image locally with Docker or Docker Compose. The commands are
`docker build --tag opendatacube/datacube-alchemist .` or `docker-compose build`.

There's a Python setup file, so you can do `pip3 install .` in the root folder. You will
need to ensure that the Open Data Cube and all its dependencies happily install though.

## Usage

### Development environment

To start the workspace and run an example, you can do the following:

* `make wofs-one` or `make fc-one` will process a single Water Observations from Space or
  Fractional Cover scene and output the results to the `./examples` folder in this project directory.

## Production Usage

Datacube Alchemist is used in production by the [Digital Earth Australia](https://dea.ga.gov.au/) and [Digital Earth
Africa](https://www.digitalearthafrica.org/) programs.

### Queues

To run jobs from an SQS queue, good practice is to create a deadletter queue
as well as a main queue. Jobs (messages) are picked up off the main queue and, if they complete
successfully, they're deleted. If they fail, they're not deleted, and they become visible on the
main queue again after a defined amount of time (the visibility timeout). Once this has happened
more than a defined number of times, the message is moved to the deadletter queue. In this way,
you can track work completion.

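For illustration, here is a minimal sketch of creating such a pair of queues with boto3 (the queue
names, visibility timeout and `maxReceiveCount` below are placeholder values, not anything
datacube-alchemist requires):

```python
import json

import boto3

sqs = boto3.client("sqs")

# Create the deadletter queue first so its ARN can be referenced.
deadletter_url = sqs.create_queue(QueueName="alchemist-example-deadletter")["QueueUrl"]
deadletter_arn = sqs.get_queue_attributes(
    QueueUrl=deadletter_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: unacknowledged messages become visible again after the visibility
# timeout, and move to the deadletter queue after maxReceiveCount failed receives.
sqs.create_queue(
    QueueName="alchemist-example",
    Attributes={
        "VisibilityTimeout": "3600",
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": deadletter_arn, "maxReceiveCount": "5"}
        ),
    },
)
```
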
## Commands

Note that the `--config-file` can be a local path or a URI.

### datacube-alchemist run-one

<!-- [[[cog
# Regenerate this embedded CLI documentation by running `python -m cogapp -r README.md`
import cog
from datacube_alchemist import cli
from click.testing import CliRunner
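# NOTE: the helper originally defined at this point is not shown here; the
# following is an assumed sketch of the print_help() helper used by the cog
# blocks below, not necessarily the exact original implementation.
# `cli.cli` is assumed to be the top-level click group.
runner = CliRunner()

def print_help(command):
    # Run `datacube-alchemist <command> --help` in-process and embed the output
    # in the README as a fenced code block.
    result = runner.invoke(cli.cli, [command, "--help"])
    help_text = result.output.replace("Usage: cli", "Usage: datacube-alchemist")
    cog.out("```\n{}```".format(help_text))

print_help("run-one")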
]]] -->
<!-- [[[end]]] -->

### datacube-alchemist add-to-queue

Search for Datasets and enqueue Tasks into an AWS SQS Queue for later processing.


The `--limit` option caps the total number of datasets enqueued, whereas `--product-limit` caps
the number of datasets per product, for the case where you have multiple input products.


### datacube-alchemist redrive-to-queue

Redrives messages from an SQS queue.

All the messages in the specified queue are re-transmitted to either their original queue
or the specified TO-QUEUE.

Be careful when manually specifying TO-QUEUE, as it's easy to mistakenly push tasks to the
wrong queue, e.g. one that will process them with an incorrect configuration file.

<!-- [[[cog
print_help("redrive-to-queue")
]]] -->
<!-- [[[end]]] -->

```
datacube-alchemist redrive-to-queue \
    --to-queue example-to-queue
```

### datacube-alchemist add-missing-to-queue

Search for datasets that don't have a corresponding target product dataset and add them to the queue.

If a predicate is supplied, datasets which do not match are filtered out.

The predicate is a Python expression that should return True or False; the dataset being
evaluated is available to it as the variable `d`.

Example predicates:
- `d.metadata.gqa_iterative_mean_xy <= 1`
- `d.metadata.gqa_iterative_mean_xy and ('2022-06-30' <= str(d.center_time.date()) <= '2023-07-01')`
- `d.metadata.dataset_maturity == "final"`

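For illustration only, a predicate could be applied to a dataset roughly like this (an assumed
sketch; the real filtering inside datacube-alchemist may be implemented differently):

```python
def matches(dataset, predicate_expression):
    # The dataset being tested is exposed to the expression as the variable `d`.
    return bool(eval(predicate_expression, {"d": dataset}))

# Usage (hypothetical): keep only datasets whose maturity is "final"
# kept = [d for d in all_datasets if matches(d, 'd.metadata.dataset_maturity == "final"')]
```
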
<!-- [[[cog
print_help("add-missing-to-queue")
]]] -->
```
Usage: datacube-alchemist add-missing-to-queue [OPTIONS]

  Search for datasets that don't have a target product dataset and add them to
  the queue

  If a predicate is supplied, datasets which do not match are filtered out.

  Example predicate: - 'd.metadata.gqa_iterative_mean_xy <= 1'

Options:
  --predicate TEXT        Python predicate to filter datasets. Dataset is
                          available as "d"
  -c, --config-file TEXT  The path (URI or file) to a config file to use for the
                          job  [required]
  -q, --queue TEXT        Name of an AWS SQS Message Queue  [required]
  --dryrun, --no-dryrun   Don't actually do real work
  --help                  Show this message and exit.
```
<!-- [[[end]]] -->

## Configuration File

A YAML file with 3 sections:

- [`specification`](#specification-of-inputs) - Define the inputs and algorithm
- `output` - Output location and format options
- `processing` - Optimise CPU/Memory requirements

Datacube Alchemist requires a configuration file in YAML format to set up the Algorithm or Transformation,
the input Dataset(s), and details of the outputs, including metadata, destination and preview image generation.

The configuration file has 3 sections. `specification` sets up the input ODC product, data bands and
the configured algorithm to run. `output` sets where the output files will be written, how the preview image will be
created, and what extra metadata to include. `processing` can help configure the task's memory and CPU requirements.

### Specification (of inputs)

Defines the input data and the algorithm to process it.

**product** or **products:** [string or list] name(s) of the input ODC product(s) to load datasets from

**measurements:** [list] of measurement names to load from the input products

**measurement_renames:** [map] rename measurements from the input data before passing to the transform

**transform:** [string] fully qualified name of a Python class implementing the transform

**transform_url:** [string] Reference URL for the Transform, to record in the output metadata

**override_product_family:** Override part of the output metadata (this arguably belongs in the `output` section)

**basis:** ????

**transform_args:** [map] Named arguments to pass to the Transformer class


### Full example specification

```yaml
specification:
  products:
    - ga_ls5t_ard_3
    - ga_ls7e_ard_3
    - ga_ls8c_ard_3
  measurements: ['nbart_blue', 'nbart_green', 'nbart_red', 'nbart_nir', 'nbart_swir_1', 'nbart_swir_2', 'oa_fmask']
  measurement_renames:
    oa_fmask: fmask

  aws_unsigned: False
  transform: wofs.virtualproduct.WOfSClassifier
  transform_url: 'https://github.com/GeoscienceAustralia/wofs/'
  override_product_family: ard
  basis: nbart_green

  transform_args:
    dsm_path: 's3://dea-non-public-data/dsm/dsm1sv1_0_Clean.tiff'
```
### Transform Class Implementation
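As a rough sketch only (check the datacube and datacube-alchemist code for the exact contract), a
transform is a Python class whose fully qualified name goes in the `transform` field of the
specification. The sketch below assumes the `datacube.virtual` `Transformation` interface, where
`measurements()` declares the output bands and `compute()` does the work on an `xarray.Dataset`;
the class name, module path and NDVI logic are purely illustrative:

```python
from datacube.model import Measurement      # assumed import paths
from datacube.virtual import Transformation


class NDVITransform(Transformation):
    """Hypothetical transform: compute NDVI from the red and NIR input bands."""

    def measurements(self, input_measurements):
        # Declare the output band(s) this transform produces.
        return {
            "ndvi": Measurement(name="ndvi", dtype="float32", nodata=float("nan"), units="1")
        }

    def compute(self, data):
        # `data` is an xarray.Dataset containing the measurements listed in the
        # specification; return a Dataset holding the declared output measurements.
        ndvi = (data.nbart_nir - data.nbart_red) / (data.nbart_nir + data.nbart_red)
        return ndvi.astype("float32").to_dataset(name="ndvi")
```

Such a class would then be referenced from the specification as, for example,
`transform: mypackage.transforms.NDVITransform` (a hypothetical module path).
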
## License

Apache License 2.0
