Add initial redis cache support for reference datasets (#33)
* First pass at redis cache in reference reads

* Turn off cache

* Fix requirements, get cache working

* Add lots more documentation, do settings correctly

* Update kubernetes and circleci build arguments

* Remove rps specific kubernetes yaml

* Add redis cache to vdatum extension

* vdatum safety enhancements

* Make timeout configurable

* Cleanup reading settings config

* Add schema to readme

* Add back the memory dataset cache

* Fix cache timeout

* Bump redis fsspec cache req
mpiannucci authored May 27, 2024
1 parent 779cc65 commit 5e06f9f
Showing 19 changed files with 341 additions and 143 deletions.
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -28,7 +28,7 @@ jobs:
echo "export TAG=${TAG}" >> $BASH_ENV
echo "Building for TAG ${TAG}"
docker build -t ${ECR_REPO}:${TAG} .
docker build --build-arg="ROOT_PATH=/xreds/" -t ${ECR_REPO}:${TAG} .
- run:
name: Install Grype
@@ -76,4 +76,4 @@ workflows:
filters:
branches:
only:
- main
- main
9 changes: 6 additions & 3 deletions Dockerfile
@@ -15,7 +15,7 @@ COPY viewer/index.html ./index.html
COPY viewer/public ./public
COPY viewer/src ./src

ARG ROOT_PATH=/xreds/
ARG ROOT_PATH
ENV VITE_XREDS_BASE_URL=${ROOT_PATH}
RUN npm run build

@@ -64,8 +64,11 @@ COPY --from=0 /opt/viewer/dist ./viewer/dist

# Set the port to run the server on
ENV PORT 8090
ARG ROOT_PATH=/xreds/
ARG ROOT_PATH
ENV ROOT_PATH ${ROOT_PATH}

ARG WORKERS=1
ENV WORKERS ${WORKERS}

# Run the webserver
CMD ["sh", "-c", "gunicorn --workers=1 --worker-class=uvicorn.workers.UvicornWorker --log-level=debug --bind=0.0.0.0:${PORT} app:app"]
CMD ["sh", "-c", "gunicorn --workers=${WORKERS} --worker-class=uvicorn.workers.UvicornWorker --log-level=debug --bind=0.0.0.0:${PORT} app:app"]
106 changes: 96 additions & 10 deletions README.md
@@ -38,17 +38,17 @@ Build the react app

```bash
cd viewer/
yarn install
yarn build
npm install
npm run build
```

Run the following in the activated `virtualenv`:

```bash
datasets_mapping_file=./test.json python app.py
DATASETS_MAPPING_FILE=./test.json python app.py
```

Where `datasets_mapping_file` is the path to the dataset key value store specified in the previous section. You can now navigate to http://localhost:8090/docs to see the supported operations
Where `DATASETS_MAPPING_FILE` is the path to the dataset key value store as described [here](./README.md#specifying-datasets). You can now navigate to `http://localhost:8090/docs` to see the supported operations

## Running With Docker

@@ -60,16 +60,37 @@ The docker container for the app can be built with:
docker build -t xreds:latest .
```

Once built, it requires a few things to be run: The 8090 port to be exposed, and a volume for the datasets to live in, and the environment variable pointing to the dateset json file.
There are also build arguments available when building the docker image:

- `ROOT_PATH`: The root path the app will be served from. Defaults to `/xreds/`.
- `WORKERS`: The number of gunicorn workers to run. Defaults to `1`.

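For illustration, a build that overrides both arguments might look like this (the tag and worker count are illustrative):

```bash
# Serve the app under /xreds/ and run 4 gunicorn workers
docker build --build-arg ROOT_PATH=/xreds/ --build-arg WORKERS=4 -t xreds:latest .
```
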
Once built, it requires a few things to run: the `8090` port exposed, a volume for the datasets to live in, and an environment variable pointing to the dataset JSON file.

```bash
docker run -p 8090:8090 -e "datasets_mapping_file=/path/to/datasets.json" -v "/path/to/datasets:/opt/xreds/datasets" xreds:latest
docker run -p 8090:8090 -e "DATASETS_MAPPING_FILE=/path/to/datasets.json" -v "/path/to/datasets:/opt/xreds/datasets" xreds:latest
```

### Running with `docker compose`

There are a few `docker compose` examples to get started with:

#### Vanilla

```bash
docker compose --platform=linux/amd64 up -d
docker compose up -d
```

#### With Redis

```bash
docker compose -f docker-compose.redis.yml up -d
```

#### With NGINX Proxy

```bash
docker compose -f docker-compose.nginx.yml up -d
```

## Specifying Datasets
@@ -82,7 +103,13 @@ Datasets are specified in a key value manner, where the keys are the dataset ids
"path": "s3://nextgen-dmac/kerchunk/gfswave_global_kerchunk.json",
"type": "kerchunk",
"chunks": {},
"drop_variables": ["orderedSequenceData"]
"drop_variables": ["orderedSequenceData"],
"target_protocol": "s3",
"target_options": {
"anon": false,
"key": "my aws key"
"secret": "my aws secret"
}
},
"dbofs": {
"path": "s3://nextgen-dmac/nos/nos.dbofs.fields.best.nc.zarr",
@@ -91,6 +118,7 @@
"ocean_time": 1
},
"drop_variables": ["dstart"]

}
}
```
Expand All @@ -108,9 +136,67 @@ gfswave_global:
- orderedSequenceData
```
Currently `zarr`, `netcdf`, and [`kerchunk`](https://github.com/fsspec/kerchunk) dataset types are supported. This information should be saved a file and specified when running.
Currently `zarr`, `netcdf`, and [`kerchunk`](https://github.com/fsspec/kerchunk) dataset types are supported. This information should be saved in a file and specified when running via the environment variable `DATASETS_MAPPING_FILE`.

### Dataset Type Schema

#### kerchunk

```json
{
    "path": "s3://nextgen-dmac/kerchunk/gfswave_global_kerchunk.json",
    "type": "kerchunk",
    "chunks": {},
    "drop_variables": ["orderedSequenceData"],
    "remote_protocol": "s3", // default is s3
    "remote_options": {
        "anon": true // default is True
    },
    "target_protocol": "s3", // default is s3
    "target_options": {
        "anon": false // default is True
    },
    "extensions": { // optional
        "vdatum": {
            "path": "s3://nextgen-dmac-cloud-ingest/nos/vdatums/ngofs2_vdatums.nc.zarr", // fsspec path to vdatum dataset
            "water_level_var": "zeta", // variable to use for water level
            "vdatum_var": "mllwtomsl", // variable mapping to vdatum transformation
            "vdatum_name": "mllw" // name of the vdatum transformation
        }
    }
}
```
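
For orientation, an entry like the one above maps roughly onto the standard fsspec/xarray pattern for opening kerchunk references. This is a sketch assuming the usual `reference://` convention, not a copy of the actual xreds code; the path and options mirror the example entry:

```python
import xarray as xr

# Sketch: open a kerchunk reference file the conventional fsspec way.
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": "s3://nextgen-dmac/kerchunk/gfswave_global_kerchunk.json",
            "remote_protocol": "s3",  # where the referenced chunks live
            "remote_options": {"anon": True},
            "target_protocol": "s3",  # where the reference JSON itself lives
            "target_options": {"anon": True},
        },
    },
    drop_variables=["orderedSequenceData"],
)
```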

#### netcdf

```json
{
    "path": "http://www.smast.umassd.edu:8080/thredds/dodsC/models/fvcom/NECOFS/Forecasts/NECOFS_GOM7_FORECAST.nc",
    "type": "netcdf",
    "engine": "netCDF4", // default is netCDF4
    "chunks": {},
    "drop_variables": ["Itime", "Itime2"],
    "additional_coords": ["lat", "lon", "latc", "lonc", "xc", "yc"],
    "extensions": { // optional
        ... // Same as kerchunk options
    }
}
```

## Configuration Options

The following environment variables can be set to configure the app:

- `DATASETS_MAPPING_FILE`: The fsspec compatible path to the dataset key value store as described [here](./README.md#specifying-datasets).
- `PORT`: The port the app should run on. Defaults to `8090`.
- `ROOT_PATH`: The root path the app will be served from. Defaults to serving from the root.
- `DATASET_CACHE_TIMEOUT`: The time in seconds to cache the dataset metadata. Defaults to `600` (10 minutes).
- `EXPORT_THRESHOLD`: The maximum file size, in MB, allowed to be exported. Defaults to `500`.
- `USE_REDIS_CACHE`: Whether to use a redis cache for the app. Defaults to `False`.
- `REDIS_HOST`: [Optional] The host of the redis cache. Defaults to `localhost`.
- `REDIS_PORT`: [Optional] The port of the redis cache. Defaults to `6379`.
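
For example, a local run that enables the Redis cache might look like this (values are illustrative):

```bash
DATASETS_MAPPING_FILE=./datasets.json \
USE_REDIS_CACHE=true \
REDIS_HOST=localhost \
REDIS_PORT=6379 \
python app.py
```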

## Building and Deploying Docker Image
## Building and Deploying Public Docker Image

First follow the instructions above to build the docker image tagged `xreds:latest`. Then the `xreds:latest` image needs to be tagged and deployed to the relevant docker registry.

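The tag-and-push step typically looks like this (the registry URL is a placeholder):

```bash
# Hypothetical registry; substitute the real one
docker tag xreds:latest registry.example.com/xreds:latest
docker push registry.example.com/xreds:latest
```
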
9 changes: 5 additions & 4 deletions app.py
@@ -2,9 +2,10 @@
import xpublish

from fastapi.middleware.cors import CORSMiddleware

from xreds.config import settings
from xreds.plugins.export import ExportPlugin
from xreds.plugins.size_plugin import SizePlugin

from xreds.spastaticfiles import SPAStaticFiles
from xreds.dataset_provider import DatasetProvider
from xreds.plugins.subset_plugin import SubsetPlugin, SubsetSupportPlugin
@@ -20,13 +21,13 @@
datasets=None,
)

export_threshold = int(os.environ.get("EXPORT_THRESHOLD", 500))
export_threshold = settings.export_threshold

rest.register_plugin(DatasetProvider())
rest.register_plugin(SubsetSupportPlugin())
rest.register_plugin(SubsetPlugin())
rest.register_plugin(SizePlugin())
rest.register_plugin(ExportPlugin(export_threshold=export_threshold))
rest.register_plugin(ExportPlugin())

app = rest.app

@@ -39,7 +40,7 @@
)

app.mount("/", SPAStaticFiles(directory="./viewer/dist", html=True), name="viewer")
app.root_path = os.environ.get("ROOT_PATH")
app.root_path = settings.root_path


if __name__ == "__main__":
44 changes: 0 additions & 44 deletions deploy.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion docker-compose.nginx.yml
@@ -22,4 +22,4 @@ services:
environment:
- PORT=8091
- ROOT_PATH=:8090
- datasets_mapping_file=/opt/xreds/datasets/datasets.json
- DATASETS_MAPPING_FILE=/opt/xreds/datasets/datasets.json
29 changes: 29 additions & 0 deletions docker-compose.redis.yml
@@ -0,0 +1,29 @@
version: '3'

services:
  redis:
    container_name: redis
    image: redis:7-alpine
    volumes:
      - ./redis/redis.conf:/usr/local/etc/redis/redis.conf
    restart: on-failure
    ports:
      - "6380:6380"
    command: redis-server /usr/local/etc/redis/redis.conf
  xreds:
    container_name: xreds
    build: .
    volumes:
      - "./datasets:/opt/xreds/datasets"
    platform: linux/amd64
    ports:
      - "8090:8090"
    depends_on:
      - redis
    environment:
      - PORT=8090
      - DATASETS_MAPPING_FILE=/opt/xreds/datasets/datasets.json
      - EXPORT_THRESHOLD=600
      - USE_REDIS_CACHE=true
      - REDIS_HOST=redis
      - REDIS_PORT=6380
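
Once this stack is up, the non-default Redis port can be sanity-checked from the host, assuming `redis-cli` is installed:

```bash
redis-cli -p 6380 ping  # expected reply: PONG
```
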
8 changes: 5 additions & 3 deletions docker-compose.yml
@@ -2,13 +2,15 @@ version: '3'

services:
  xreds:
    image: xreds:latest
    platform: linux/amd64
    container_name: xreds
    build:
      context: .
    volumes:
      - "./datasets:/opt/xreds/datasets"
    platform: linux/amd64
    ports:
      - "8090:8090"
    environment:
      - PORT=8090
      - datasets_mapping_file=/opt/xreds/datasets/datasets.json
      - DATASETS_MAPPING_FILE=/opt/xreds/datasets/datasets.json
      - EXPORT_THRESHOLD=600
2 changes: 1 addition & 1 deletion nginx/nginx.conf
@@ -14,7 +14,7 @@ http {
inactive=24h max_size=2g;
server {
location / {
proxy_pass http://zms:8091;
proxy_pass http://xms:8091;
proxy_set_header Host $host;
proxy_buffering on;
proxy_cache STATIC;
5 changes: 5 additions & 0 deletions redis/redis.conf
@@ -0,0 +1,5 @@
port 6380
protected-mode no

# Save to disk every 60 seconds if at least 1 key has changed
save 60 1
2 changes: 2 additions & 0 deletions requirements.txt
@@ -38,3 +38,5 @@ xpublish-wms@git+https://github.com/xpublish-community/xpublish-wms@9574a71405e4
xpublish-edr@git+https://github.com/xpublish-community/xpublish-edr@019e53acd2e0ad5a1d909d1acfe9863f2e90e51b
opendap-protocol<1.2.0
xarray-subset-grid@git+https://github.com/asascience-open/xarray-subset-grid@81ce464b6357e7353deaaf350ad1be22295d238e
redis-fsspec-cache@git+https://github.com/mpiannucci/redis-fsspec-cache.git@c5f241f113964ec844dbdd48f9eae6119290d1fa
redis==5.0.4
30 changes: 27 additions & 3 deletions xreds/config.py
@@ -1,8 +1,32 @@
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    datasets_mapping_file: str
class Settings(BaseSettings):
    '''Settings for running xreds'''
    # fsspec compatible url path to the dataset mapping file
    # in either json or yml format
    datasets_mapping_file: str = ''

    # Root path for the service to mount at
    root_path: str = ''

settings = Settings()
    # Timeout for caching datasets in seconds
    dataset_cache_timeout: int = 10 * 60

    # Size threshold for exporting datasets to local files,
    # in MB
    export_threshold: int = 500

    # Whether to use redis to cache datasets when possible
    use_redis_cache: bool = False

    # Optional redis host name
    # If not provided, will default to localhost
    redis_host: str = "localhost"

    # Optional redis port number
    # If not provided, will default to 6379
    redis_port: int = 6379


settings = Settings()
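
These settings drive the Redis-backed dataset caching. The commit wires Redis into fsspec reads via `redis-fsspec-cache` (see requirements.txt); as a rough sketch of the TTL pattern these settings configure, not the actual xreds code (the key scheme and helper below are hypothetical):

```python
from typing import Callable

import redis

from xreds.config import settings

# Connect using the configured host/port (defaults: localhost:6379)
r = redis.Redis(host=settings.redis_host, port=settings.redis_port)

def cached_read(dataset_id: str, load: Callable[[], bytes]) -> bytes:
    """Return cached bytes for a dataset, reloading after the timeout."""
    key = f"dataset:{dataset_id}"  # hypothetical key scheme
    data = r.get(key)
    if data is None:
        data = load()  # expensive read, e.g. fetching a reference file
        # Expire the entry after dataset_cache_timeout seconds
        r.setex(key, settings.dataset_cache_timeout, data)
    return data
```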