Postgres ranges rebuild (#1017)
* Implement new schema and management - nothing is using it yet!

* Start getting update process onto new schema - incomplete work in progress.

Includes some product->layer name refactoring.

* Writing to the new layer-based schema, with batch caching.

* Reading from the new layer range table.  More product->layer renaming.

* Passing mypy, failing tests.

* Passing unit tests, server initialising. Integration tests still failing.

* Passing integration tests.

* make datacube/env handling more generic (one step closer to multi-db) and passing mypy.

* Passing all tests.

* Add new tests and fix broken tests.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lintage.

* lintage.

* Don't rely on DEA Explorer

* Update db to postgres 16 and use DB_URL

* Revert main docker-compose.yaml

* Need port as well.

* Fix nodb test fixture for GH

* Oops - used non-raw github link.

* Fix ows-update call in GHA test prep script.

* Update documentation.

* Fix spelling or add (non-)words to wordlist.

* Various fixes/cleanups found on self-review.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Make no_db in test_no_db_routes a proper fixture.

* Documentation edits

* Some cleanup in wms_utils.py

* Some cleanup in update_ranges_impl.py

* Make access in initialiser more consistent.

* Provide better examples of role granting in scripts and documentation.

* Fix inconsistent indentation.

* Typo

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
SpacemanPaul and pre-commit-ci[bot] authored May 10, 2024
1 parent c8214a7 commit d1ac065
Showing 87 changed files with 963 additions and 1,017 deletions.
3 changes: 2 additions & 1 deletion .env_simple
@@ -4,7 +4,8 @@
################
# ODC DB Config
# ##############
DB_HOSTNAME=postgres
ODC_DEFAULT_DB_URL=postgresql://opendatacubeusername:opendatacubepassword@postgres:5432/opendatacube
# Needed for docker db image.
DB_PORT=5432
DB_USERNAME=opendatacubeusername
DB_PASSWORD=opendatacubepassword
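
The per-part ``DB_*`` variables above are retained only for the docker db image; the datacube itself now connects via the single ``ODC_DEFAULT_DB_URL``. Below is a minimal sketch of assembling that URL from the legacy parts; the variable names follow this file, and the fallback database name is an assumption.

.. code-block:: python

    # Sketch only: build ODC_DEFAULT_DB_URL from the legacy DB_* variables.
    # Variable names follow .env_simple; the default database name is illustrative.
    import os
    from urllib.parse import quote
    from sqlalchemy.engine import make_url

    def build_odc_db_url() -> str:
        user = quote(os.environ["DB_USERNAME"])
        password = quote(os.environ["DB_PASSWORD"])  # quote() handles special characters
        host = os.environ.get("DB_HOSTNAME", "postgres")
        port = os.environ.get("DB_PORT", "5432")
        database = os.environ.get("DB_DATABASE", "opendatacube")
        url = f"postgresql://{user}:{password}@{host}:{port}/{database}"
        make_url(url)  # raises if the assembled URL is malformed
        return url

    if __name__ == "__main__":
        print(build_odc_db_url())
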
23 changes: 4 additions & 19 deletions CONTRIBUTING.rst
@@ -95,7 +95,7 @@ Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests (and should pass them - and all pre-existing tests!)
2. If the pull request adds or modifies functionality, the docs should be updated.
3. The pull request should work for Python 3.7+. Check the results of
3. The pull request should work for Python 3.10+. Check the results of
the github actions and make sure that your PR passes all checks and
does not decrease test coverage.

@@ -143,8 +143,9 @@ indexing and create db dump
# now go to ows container
docker exec -it datacube-ows_ows_1 bash
datacube-ows-update --schema --role <db_read_role>
datacube-ows-update --views
# Run this as a database superuser role
datacube-ows-update --schema --read-role <db_read_role> --write-role <db_write_role>
# Run this as the <db_write_role> user above
datacube-ows-update
exit
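
The schema step now takes separate read and write roles. Below is a hedged sketch of sanity-checking the result after the superuser run; the role names and connection URL are placeholders, and the privilege probed is an assumption about what ``--read-role``/``--write-role`` actually grant.

.. code-block:: python

    # Sketch only: confirm the ows schema exists and the roles can use it,
    # after `datacube-ows-update --schema --read-role ... --write-role ...`.
    # Role names and the connection URL are placeholders.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://superuser:secret@localhost:5432/opendatacube")
    with engine.connect() as conn:
        present = conn.execute(
            text("SELECT 1 FROM information_schema.schemata WHERE schema_name = 'ows'")
        ).scalar()
        print("ows schema present:", bool(present))
        for role in ("db_read_role", "db_write_role"):
            # has_schema_privilege errors if the role does not exist
            usage = conn.execute(
                text("SELECT has_schema_privilege(:role, 'ows', 'USAGE')"),
                {"role": role},
            ).scalar()
            print(f"{role} has USAGE on ows:", usage)
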
@@ -178,22 +178,6 @@ manually modify translation for `de` for `assert` test to pass, then create `ows
docker cp datacube-ows_ows_1:/tmp/translations datacube-ows/integrations/cfg/
Generating database relationship diagram
----------------------------------------

.. code-block:: console
docker run -it --rm -v "$PWD:/output" --network="host" schemaspy/schemaspy:snapshot -u $DB_USERNAME -host localhost -port $DB_PORT -db $DB_DATABASE -t pgsql11 -schemas wms -norows -noviews -pfp -imageformat svg
Merge relationship diagram and orphan diagram

.. code-block:: console
python3 svg_stack.py --direction=h --margin=100 ../wms/diagrams/summary/relationships.real.large.svg ../wms/diagrams/orphans/orphans.svg > ows.merged.large.svg
cp svg_stack/ows.merged.large.svg ../datacube-ows/docs/diagrams/db-relationship-diagram.svg
Links
-----

22 changes: 10 additions & 12 deletions README.rst
@@ -3,23 +3,20 @@ datacube-ows
============

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Linting/badge.svg
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ALinting
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ACode%20Linting

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Tests/badge.svg
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ATests

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Docker/badge.svg
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ADocker
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3ADockerfile%20Linting

.. image:: https://github.com/opendatacube/datacube-ows/workflows/Scan/badge.svg
:target: https://github.com/opendatacube/datacube-ows/actions?query=workflow%3A%22Scan%22

.. image:: https://codecov.io/gh/opendatacube/datacube-ows/branch/master/graph/badge.svg
:target: https://codecov.io/gh/opendatacube/datacube-ows

.. image:: https://img.shields.io/pypi/v/datacube?label=datacube
:alt: PyPI

Datacube Open Web Services
--------------------------

@@ -41,7 +38,7 @@ Features
System Architecture
-------------------

.. image:: docs/diagrams/ows_diagram.png
.. image:: docs/diagrams/ows_diagram1.9.png
:width: 700

Community
@@ -141,14 +138,14 @@ To run the standard Docker image, create a docker volume containing your ows con
-e AWS_DEFAULT_REGION=ap-southeast-2 \ # AWS Default Region (supply even if NOT accessing files on S3! See Issue #151)
-e SENTRY_DSN=https://[email protected]/projid \ # Key for Sentry logging (optional)
\ # Database connection URL: postgresql://<username>:<password>@<hostname>:<port>/<database>
-e ODC_DEFAULT_DB_URL=postgresql://cube:DataCube@172.17.0.1:5432/datacube \
-e ODC_DEFAULT_DB_URL=postgresql://myuser:mypassword@172.17.0.1:5432/mydb \
-e PYTHONPATH=/code # The default PATH is under env, change this to target /code
-p 8080:8000 \ # Publish the gunicorn port (8000) on the Docker
\ # container at port 8080 on the host machine.
--mount source=test_cfg,target=/code/datacube_ows/config \ # Mount the docker volume where the config lives
name_of_built_container

The image is based on the standard ODC container.
The image is based on the standard ODC container and expects an external database.

Installation with Conda
------------
@@ -157,7 +154,7 @@ The following instructions are for installing on a clean Linux system.

* Create and activate a conda python 3.10 environment::

conda create -n ows -c conda-forge python=3.8 datacube pre_commit postgis
conda create -n ows -c conda-forge python=3.10 datacube pre_commit postgis
conda activate ows

* install the latest release using pip install::
@@ -186,7 +183,7 @@ The following instructions are for installing on a clean Linux system.
# to create schema, tables and materialised views used by datacube-ows.

export DATACUBE_OWS_CFG=datacube_ows.ows_cfg_example.ows_cfg
datacube-ows-update --role ubuntu --schema
datacube-ows-update --write-role ubuntu --schema


* Create a configuration file for your service, and all data products you wish to publish in
@@ -253,8 +250,9 @@ Local Postgres database
| xargs -n1 -I {} datacube dataset add s3://deafrica-data/{}

5. Write an ows config file to identify the products you want available in ows, see example here: https://github.com/opendatacube/datacube-ows/blob/master/datacube_ows/ows_cfg_example.py
6. Run `datacube-ows-update --schema --role <db_read_role>` to create ows specific tables
7. Run `datacube-ows-update` to generate ows extents.
6. Run ``datacube-ows-update --schema --read-role <db_read_role> --write-role <db_write_role>`` as a database
superuser to create the ows-specific tables and views (a role-setup sketch follows this list).
7. Run ``datacube-ows-update`` as ``<db_write_role>`` to populate the ows extent tables.
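
A minimal sketch of creating the two roles referenced in steps 6 and 7; the names, password and connection URL are placeholders, and granting the read role to the write role is an assumption about the intended split (the grants on the ows schema itself are presumably applied by ``datacube-ows-update --schema``).

.. code-block:: python

    # Sketch only: create placeholder read/write roles as a Postgres superuser
    # before running datacube-ows-update.  All names and the URL are illustrative.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://postgres:postgres@localhost:5432/datacube")
    with engine.begin() as conn:
        conn.execute(text("CREATE ROLE db_read_role NOLOGIN"))
        conn.execute(text("CREATE ROLE db_write_role LOGIN PASSWORD 'changeme'"))
        # Assumption: the writer should also hold the reader's privileges.
        conn.execute(text("GRANT db_read_role TO db_write_role"))
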

Apache2 mod_wsgi
----------------
4 changes: 2 additions & 2 deletions check-code-all.sh
@@ -21,7 +21,7 @@ datacube product add https://raw.githubusercontent.com/GeoscienceAustralia/dea-c

# Geomedian for summary product testing

datacube product add https://explorer-aws.dea.ga.gov.au/products/ga_ls8c_nbart_gm_cyear_3.odc-product.yaml
datacube product add https://raw.githubusercontent.com/GeoscienceAustralia/dea-config/master/products/baseline_satellite_data/geomedian-au/ga_ls8c_nbart_gm_cyear_3.odc-product.yaml

# S2 multiproduct datasets
datacube dataset add https://dea-public-data.s3.ap-southeast-2.amazonaws.com/baseline/ga_s2bm_ard_3/52/LGM/2017/07/19/20170719T030622/ga_s2bm_ard_3-2-1_52LGM_2017-07-19_final.odc-metadata.yaml --ignore-lineage
@@ -44,7 +44,7 @@ datacube dataset add https://dea-public-data.s3.ap-southeast-2.amazonaws.com/der
datacube dataset add https://dea-public-data.s3.ap-southeast-2.amazonaws.com/derivative/ga_ls8c_nbart_gm_cyear_3/3-0-0/x17/y37/2021--P1Y/ga_ls8c_nbart_gm_cyear_3_x17y37_2021--P1Y_final.odc-metadata.yaml --ignore-lineage

# create material view for ranges extents
datacube-ows-update --schema --role $DB_USERNAME
datacube-ows-update --schema --write-role $DB_USERNAME
datacube-ows-update

# run test
4 changes: 2 additions & 2 deletions datacube_ows/cfg_parser_impl.py
@@ -117,12 +117,12 @@ def parse_path(path: str | None, parse_only: bool, folders: bool, styles: bool,
click.echo()
click.echo("Layers and Styles")
click.echo("=================")
for lyr in cfg.product_index.values():
for lyr in cfg.layer_index.values():
click.echo(f"{lyr.name} [{','.join(lyr.product_names)}]")
print_styles(lyr)
click.echo()
if input_file or output_file:
layers_report(cfg.product_index, input_file, output_file)
layers_report(cfg.layer_index, input_file, output_file)
return True


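
The renamed ``layer_index`` can be walked the same way outside the CLI. A small sketch, assuming ``DATACUBE_OWS_CFG`` points at a loadable configuration and that ``get_config`` is importable from ``datacube_ows.ows_configuration``.

.. code-block:: python

    # Sketch only: enumerate configured OWS layers via the renamed layer_index.
    # Assumes DATACUBE_OWS_CFG points at a valid configuration.
    from datacube_ows.ows_configuration import get_config

    cfg = get_config()
    for lyr in cfg.layer_index.values():
        print(f"{lyr.name} [{','.join(lyr.product_names)}]")
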
36 changes: 18 additions & 18 deletions datacube_ows/loading.py
@@ -125,37 +125,37 @@ def simple_layer_query(cls, layer: OWSNamedLayer,
class DataStacker:
@log_call
def __init__(self,
product: OWSNamedLayer,
layer: OWSNamedLayer,
geobox: GeoBox,
times: list[datetime.datetime],
resampling: Resampling | None = None,
style: StyleDef | None = None,
bands: list[str] | None = None):
self._product = product
self.cfg = product.global_cfg
self._layer = layer
self.cfg = layer.global_cfg
self._geobox = geobox
self._resampling = resampling if resampling is not None else "nearest"
self.style = style
if style:
self._needed_bands = list(style.needed_bands)
elif bands:
self._needed_bands = [self._product.band_idx.locale_band(b) for b in bands]
self._needed_bands = [self._layer.band_idx.locale_band(b) for b in bands]
else:
self._needed_bands = list(self._product.band_idx.measurements.keys())
self._needed_bands = list(self._layer.band_idx.measurements.keys())

for band in self._product.always_fetch_bands:
for band in self._layer.always_fetch_bands:
if band not in self._needed_bands:
self._needed_bands.append(band)
self.raw_times = times
if product.mosaic_date_func:
self._times = [product.mosaic_date_func(product.ranges["times"])]
if self._layer.mosaic_date_func:
self._times = [self._layer.mosaic_date_func(layer.ranges.times)]
else:
self._times = [
self._product.search_times(
self._layer.search_times(
t, self._geobox)
for t in times
]
self.group_by = self._product.dataset_groupby()
self.group_by = self._layer.dataset_groupby()
self.resource_limited = False

def needed_bands(self) -> list[str]:
@@ -185,7 +185,7 @@ def datasets(self, index: datacube.index.Index,
# Not returning datasets - use main product only
queries = [
ProductBandQuery.simple_layer_query(
self._product,
self._layer,
self.needed_bands(),
self.resource_limited)

@@ -194,10 +194,10 @@
# we have a style - lets go with that.
queries = ProductBandQuery.style_queries(self.style)
elif all_flag_bands:
queries = ProductBandQuery.full_layer_queries(self._product, self.needed_bands())
queries = ProductBandQuery.full_layer_queries(self._layer, self.needed_bands())
else:
# Just take needed bands.
queries = [ProductBandQuery.simple_layer_query(self._product, self.needed_bands())]
queries = [ProductBandQuery.simple_layer_query(self._layer, self.needed_bands())]

if point:
geom = point
@@ -338,14 +338,14 @@ def manual_data_stack(self,
d = self.read_data_for_single_dataset(ds, measurements, self._geobox, fuse_func=fuse_func)
extent_mask = None
for band in non_flag_bands:
for f in self._product.extent_mask_func:
for f in self._layer.extent_mask_func:
if extent_mask is None:
extent_mask = f(d, band)
else:
extent_mask &= f(d, band)
if extent_mask is not None:
d = d.where(extent_mask)
if self._product.solar_correction and not skip_corrections:
if self._layer.solar_correction and not skip_corrections:
for band in non_flag_bands:
d[band] = solar_correct_data(d[band], ds)
if merged is None:
@@ -383,7 +383,7 @@ def read_data(self,
measurements=measurements,
fuse_func=fuse_func,
skip_broken_datasets=skip_broken,
patch_url=self._product.patch_url,
patch_url=self._layer.patch_url,
resampling=resampling)
except Exception as e:
_LOG.error("Error (%s) in load_data: %s", e.__class__.__name__, str(e))
@@ -399,7 +399,7 @@ def read_data_for_single_dataset(self,
resampling: Resampling = "nearest",
fuse_func: datacube.api.core.FuserFunction | None = None) -> xarray.Dataset:
datasets = [dataset]
dc_datasets = datacube.Datacube.group_datasets(datasets, self._product.time_resolution.dataset_groupby())
dc_datasets = datacube.Datacube.group_datasets(datasets, self._layer.time_resolution.dataset_groupby())
CredentialManager.check_cred()
try:
return datacube.Datacube.load_data(
@@ -408,7 +408,7 @@ def read_data_for_single_dataset(self,
measurements=measurements,
fuse_func=fuse_func,
skip_broken_datasets=skip_broken,
patch_url=self._product.patch_url,
patch_url=self._layer.patch_url,
resampling=resampling)
except Exception as e:
_LOG.error("Error (%s) in load_data: %s", e.__class__.__name__, str(e))
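
With the renaming, a ``DataStacker`` is now built from an ``OWSNamedLayer`` rather than a "product". A rough usage sketch follows; the layer name, bounding box and date are illustrative, and it assumes an OWS configuration backed by an indexed datacube.

.. code-block:: python

    # Sketch only: drive a DataStacker from a configured layer.
    # Layer name, bounding box and date are illustrative values.
    import datetime
    from datacube import Datacube
    from odc.geo.geobox import GeoBox
    from datacube_ows.loading import DataStacker
    from datacube_ows.ows_configuration import get_config

    cfg = get_config()
    layer = cfg.layer_index["my_layer"]   # hypothetical layer name
    geobox = GeoBox.from_bbox((140.0, -36.0, 141.0, -35.0), crs="EPSG:4326", shape=(512, 512))
    stacker = DataStacker(layer, geobox, times=[datetime.datetime(2021, 1, 1)])

    dc = Datacube()
    datasets = stacker.datasets(dc.index)  # grouped datasets for the requested extent
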
10 changes: 5 additions & 5 deletions datacube_ows/mv_index.py
@@ -35,11 +35,11 @@ def get_sqlalc_engine(index: Index) -> Engine:

def get_st_view(meta: MetaData) -> Table:
return Table('space_time_view', meta,
Column('id', UUID()),
Column('dataset_type_ref', SMALLINT()),
Column('spatial_extent', Geometry(from_text='ST_GeomFromGeoJSON', name='geometry')),
Column('temporal_extent', TSTZRANGE())
)
Column('id', UUID()),
Column('dataset_type_ref', SMALLINT()),
Column('spatial_extent', Geometry(from_text='ST_GeomFromGeoJSON', name='geometry')),
Column('temporal_extent', TSTZRANGE()),
schema="ows")


_meta = MetaData()
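
The ``space_time_view`` table object is now schema-qualified with ``ows``. A short sketch of querying it directly through the same helpers, assuming an initialised ODC index and that the materialised view has been created by ``datacube-ows-update --schema``:

.. code-block:: python

    # Sketch only: read a few rows from ows.space_time_view via the module helpers.
    from sqlalchemy import MetaData, select
    from datacube import Datacube
    from datacube_ows.mv_index import get_sqlalc_engine, get_st_view

    dc = Datacube()
    engine = get_sqlalc_engine(dc.index)
    stv = get_st_view(MetaData())
    with engine.connect() as conn:
        for row in conn.execute(select(stv.c.id, stv.c.temporal_extent).limit(5)):
            print(row.id, row.temporal_extent)
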
21 changes: 12 additions & 9 deletions datacube_ows/ogc.py
@@ -179,14 +179,17 @@ def ogc_wcs_impl():
def ping():
db_ok = False
cfg = get_config()
with cfg.dc.index._db.give_me_a_connection() as conn:
results = conn.execute(text("""
SELECT *
FROM wms.product_ranges
LIMIT 1""")
)
for r in results:
db_ok = True
try:
with cfg.dc.index._db.give_me_a_connection() as conn:
results = conn.execute(text("""
SELECT *
FROM ows.layer_ranges
LIMIT 1""")
)
for r in results:
db_ok = True
except Exception:
pass
if db_ok:
return (render_template("ping.html", status="Up"), 200, resp_headers({"Content-Type": "text/html"}))
else:
@@ -202,7 +205,7 @@ def ping():
def legend(layer, style, dates=None):
# pylint: disable=redefined-outer-name
cfg = get_config()
product = cfg.product_index.get(layer)
product = cfg.layer_index.get(layer)
if not product:
return ("Unknown Layer", 404, resp_headers({"Content-Type": "text/plain"}))
if dates is None:
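
The ``/ping`` health check now probes ``ows.layer_ranges`` and treats any database error as down. A standalone sketch of the same check follows; the connection URL is illustrative, and, like the route, it reports down when the table exists but is empty.

.. code-block:: python

    # Sketch only: the liveness probe /ping now performs, run standalone.
    # The connection URL is illustrative.
    from sqlalchemy import create_engine, text

    def ows_ranges_ok(db_url: str) -> bool:
        try:
            engine = create_engine(db_url)
            with engine.connect() as conn:
                rows = conn.execute(text("SELECT * FROM ows.layer_ranges LIMIT 1"))
                return rows.first() is not None
        except Exception:
            return False

    print(ows_ranges_ok("postgresql://reader:secret@localhost:5432/opendatacube"))
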