Skip to content

Commit

Permalink
Added project workflows, based on Xee.
Browse files Browse the repository at this point in the history
Added CI, lint, and pypi publish workflows.
  • Loading branch information
alxmrs committed Mar 10, 2024
1 parent b2edd8d commit 93e8baa
Show file tree
Hide file tree
Showing 5 changed files with 269 additions and 16 deletions.
68 changes: 68 additions & 0 deletions .github/workflows/ci-build.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# Taken from Xee and minimally modified: https://github.com/google/Xee/blob/main/.github/workflows/ci-build.yml
#
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
name: ci

on:
# Triggers the workflow on push or pull request events but only for the main branch
push:
branches: [ main ]
pull_request:
branches: [ main ]
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

jobs:
build:
name: "python ${{ matrix.python-version }} tests"
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: [
"3.8",
"3.9",
"3.10",
"3.11",
"3.12",
]
permissions:
id-token: write # This is required for requesting the JWT.
steps:
- name: Cancel previous
uses: styfle/[email protected]
with:
access_token: ${{ github.token }}
if: ${{github.ref != 'refs/head/main'}}
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
- name: Install qarray
run: |
pip install -e .[test]
- uses: 'actions/checkout@v4'
- id: 'auth'
name: 'Authenticate to Google Cloud'
uses: 'google-github-actions/auth@v1'
with:
service_account: ${{ secrets.SERVICE_ACCOUNT }}
workload_identity_provider: ${{ secrets.WORKLOAD_IDENTITY_PROVIDER }}
- name: Run unit tests
run: |
pytest qarray
62 changes: 62 additions & 0 deletions .github/workflows/lint.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# Taken from Xee and minimally modified: https://github.com/google/Xee/blob/main/.github/workflows/lint.yml
#
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
name: lint

on:
# Triggers the workflow on push or pull request events but only for the main branch
push:
branches: [ main ]
pull_request:
branches: [ main ]
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

jobs:
build:
name: "python ${{ matrix.python-version }} lint"
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
steps:
- name: Cancel previous
uses: styfle/[email protected]
with:
access_token: ${{ github.token }}
if: ${{github.ref != 'refs/head/main'}}
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Get pip cache dir
id: pip-cache
run: |
python -m pip install --upgrade pip wheel
echo "::set-output name=dir::$(pip cache dir)"
- name: pip cache
uses: actions/cache@v2
with:
path: ${{ steps.pip-cache.outputs.dir }}
key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
- name: Install linter
run: |
pip install pyink
- name: Lint with pyink
run: |
pyink --check .
97 changes: 97 additions & 0 deletions .github/workflows/publish.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# Taken from Xee and minimally modified: https://github.com/google/Xee/blob/main/.github/workflows/publish.yml
#
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Publish to PyPi

on:
release:
types: [published]

jobs:
build-artifacts:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/[email protected]
with:
python-version: 3.9

- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install setuptools setuptools-scm wheel twine check-manifest
- name: Build tarball and wheels
run: |
git clean -xdf
git restore -SW .
python -m build --sdist --wheel .
- name: Check built artifacts
run: |
python -m twine check dist/*
pwd
- uses: actions/upload-artifact@v2
with:
name: releases
path: dist

test-built-dist:
needs: build-artifacts
runs-on: ubuntu-latest
steps:
- uses: actions/[email protected]
name: Install Python
with:
python-version: 3.9
- uses: actions/download-artifact@v2
with:
name: releases
path: dist
- name: List contents of built dist
run: |
ls -ltrh
ls -ltrh dist
- name: Publish package to TestPyPI
if: github.event_name == 'push'
uses: pypa/[email protected]
with:
user: __token__
password: ${{ secrets.TESTPYPI_TOKEN }}
repository_url: https://test.pypi.org/legacy/
verbose: true

- name: Check uploaded package
if: github.event_name == 'push'
run: |
sleep 3
python -m pip install --upgrade pip
python -m pip install --extra-index-url https://test.pypi.org/simple --upgrade de
python -c "import qarray; print(qarray.__version__)"
upload-to-pypi:
needs: test-built-dist
if: github.event_name == 'release'
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v2
with:
name: releases
path: dist
- name: Publish package to PyPI
uses: pypa/[email protected]
with:
user: __token__
password: ${{ secrets.PYPI_TOKEN }}
verbose: true
58 changes: 42 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,21 +33,21 @@ df = qr.read_xarray(ds)
df.head()
```

Succinctly, we "pivot" Xarray Datasets to treat them like
tables so we can run SQL queries against them.
Succinctly, we "pivot" Xarray Datasets to treat them like tables so we can run
SQL queries against them.

## Why build this?

A few reasons:

* Even though SQL is the lingua franca of data, scientific datasets are often
inaccessible to non-scientists.
* Joining tabular data with raster data is common yet difficult. It
could be easy.
* There are many cloud-native, Xarray-openable datasets,
* Joining tabular data with raster data is common yet difficult. It could be
easy.
* There are many cloud-native, Xarray-openable datasets,
from [Google Earth Engine](https://github.com/google/Xee)
to [Pangeo Forge](https://pangeo-forge.org/). Wouldn’t it be great if these
were also SQL-accessible? How can the bridge be built with minimal effort?
were also SQL-accessible? How can the bridge be built with minimal effort?

This is a light-weight way to prove the value of the interface.

Expand All @@ -63,14 +63,14 @@ That's it!

## Why does this work?

Underneath Xarray, Dask, and Pandas, there are NumPy arrays. These are
paged in chucks and represented contiguously in memory. It is only a
matter of metadata that breaks them up into ndarrays. `to_dataframe()`
just changes this metadata (via a `ravel()`/`reshape()`), back into a
column amenable to a DataFrame.
Underneath Xarray, Dask, and Pandas, there are NumPy arrays. These are paged in
chucks and represented contiguously in memory. It is only a matter of metadata
that breaks them up into ndarrays. `to_dataframe()`
just changes this metadata (via a `ravel()`/`reshape()`), back into a column
amenable to a DataFrame.

There is added overhead from duplicating dimensions as columns, which
we see as worth the convenience of DataFrames.
There is added overhead from duplicating dimensions as columns, which we see as
worth the convenience of DataFrames.

## What are the current limitations?

Expand All @@ -87,10 +87,36 @@ supported ([geopandas/dask-geopandas#72](https://github.com/geopandas/dask-geopa
## What would a deeper integration look like?

I have a few ideas so far. One approach involves applying operations directly on
Xarray Datasets. This approach is being pursued
Xarray Datasets. This approach is being pursued
[here](https://github.com/google/weather-tools/tree/main/xql), as `xql`.

Deeper still: I was thinking we could make a [virtual](https://fsspec.github.io/kerchunk/)
Deeper still: I was thinking we could make
a [virtual](https://fsspec.github.io/kerchunk/)
filesystem for parquet that would internally map to Zarr. Raster-backed virtual
parquet would open up integrations to numeroustools like dask, pyarrow, duckdb,
and BigQuery. More thoughts on this in [#4](https://github.com/alxmrs/qarray/issues/4).
and BigQuery. More thoughts on this
in [#4](https://github.com/alxmrs/qarray/issues/4).

## License

```
Copyright 2024 Alexander Merose
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

Some sources are re-distributed from Google LLC
via https://github.com/google/Xee (also Apache-2.0 License) with and without
modification. These files are subject to the original copyright; they include
the original license header comment as well as a note to indicate
modifications (when appropriate).
File renamed without changes.

0 comments on commit 93e8baa

Please sign in to comment.