Skip to content

Commit

Permalink
Update main README
Browse files Browse the repository at this point in the history
  • Loading branch information
kylebarron committed Jun 20, 2024
1 parent 581f936 commit 9e802d4
Show file tree
Hide file tree
Showing 3 changed files with 8 additions and 3 deletions.
8 changes: 6 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,14 @@
# STAC-geoparquet

Convert STAC items to GeoParquet.
Convert [STAC](https://stacspec.org/en) items between JSON, [GeoParquet](https://geoparquet.org/), [pgstac](https://github.com/stac-utils/pgstac), and [Delta Lake](https://delta.io/).

## Purpose

This library helps convert [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/overview.md#item-overview) to [GeoParquet](https://github.com/opengeospatial/geoparquet). While STAC Items are commonly distributed as individual JSON files on object storage or through a [STAC API](https://github.com/radiantearth/stac-api-spec), STAC GeoParquet allows users to access a large number of STAC items in bulk without making repeated HTTP requests.
The STAC spec defines a JSON-based schema. But it can be hard to manage and search through many millions of STAC items in JSON format. For one, JSON is very large on disk. And you need to parse the entire JSON data into memory to extract just a small piece of information, say the `datetime` and one `asset` of an Item.

GeoParquet can be a good complement to JSON for many bulk-access and analytic use cases. While STAC Items are commonly distributed as individual JSON files on object storage or through a [STAC API](https://github.com/radiantearth/stac-api-spec), STAC GeoParquet allows users to access a large number of STAC items in bulk without making repeated HTTP requests.

For analytic questions like "find the items in the Sentinel-2 collection in June 2024 over New York City with cloud cover of less than 20%" it can be much, much faster to find the relevant data from a GeoParquet source than from JSON, because GeoParquet needs to load only the relevant columns for that query, not the full data.

## Usage

Expand Down
2 changes: 1 addition & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ nav:
- "index.md"
- API Reference:
- api/geopandas.md
- api/viz.md
# - api/pgstac.md

watch:
- stac_geoparquet
Expand Down
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ version-file = "stac_geoparquet/_version.py"

[project.optional-dependencies]
docs = [
"black",
"griffe-inherited-docstrings",
"mike>=2",
"mkdocs-material[imaging]>=9.5",
Expand Down

0 comments on commit 9e802d4

Please sign in to comment.