cholmes/geoparquet-io

geoparquet-io

Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.

📚 Full Documentation | Quick Start Tutorial

Features

  • Fast: Built on PyArrow and DuckDB for high-performance operations
  • Comprehensive: Sort, partition, enhance, validate, and upload GeoParquet files
  • Cloud-Native: Upload to S3, GCS, and Azure with parallel transfers
  • Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and hierarchical admin divisions
  • Best Practices: Automatic optimization following GeoParquet 1.1 spec
  • Flexible: CLI and Python API for any workflow
  • Tested: Extensive test suite across Python 3.9-3.13 and all platforms
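To illustrate why a bbox column speeds up spatial queries, here is a minimal pure-Python sketch. It is not geoparquet-io's implementation: the helper names (`bbox_of`, `intersects`, `filter_by_window`) and the sample rows are hypothetical. The only detail taken from the GeoParquet 1.1 spec is the field naming of a bbox struct (`xmin`, `ymin`, `xmax`, `ymax`).

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
BBox = Dict[str, float]


def bbox_of(coords: List[Coord]) -> BBox:
    """Compute the bounding box of a geometry given as a list of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if two bounding boxes overlap or touch."""
    return (
        a["xmin"] <= b["xmax"]
        and a["xmax"] >= b["xmin"]
        and a["ymin"] <= b["ymax"]
        and a["ymax"] >= b["ymin"]
    )


def filter_by_window(rows: List[dict], window: BBox) -> List[dict]:
    """Keep rows whose precomputed bbox overlaps the query window.

    Only the four bbox numbers are compared; the geometry itself is never parsed.
    """
    return [row for row in rows if intersects(row["bbox"], window)]


# Hypothetical sample rows: each geometry is stored next to its precomputed bbox.
geometries = {
    "a": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "b": [(10.0, 10.0), (12.0, 10.0), (12.0, 13.0)],
}
rows = [{"id": k, "coords": v, "bbox": bbox_of(v)} for k, v in geometries.items()]

window = {"xmin": 9.0, "ymin": 9.0, "xmax": 11.0, "ymax": 11.0}
matches = [row["id"] for row in filter_by_window(rows, window)]
```

A query engine can apply the same comparison to the bbox column alone, and to row-group statistics on that column, before decoding any geometry bytes.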

Installation

pip install geoparquet-io

See the Installation Guide for other options (uv, from source) and requirements.

Quick Start

# Inspect file structure and metadata
gpio inspect myfile.parquet

# Check file quality and best practices
gpio check all myfile.parquet

# Add bounding box column for faster queries
gpio add bbox input.parquet output.parquet

# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet

# Partition by admin boundaries
gpio partition admin buildings.parquet output_dir/ --dataset gaul --levels continent,country

For more examples and detailed usage, see the Quick Start Tutorial and User Guide.
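As background for the `gpio sort hilbert` command above, the following sketch shows how a Hilbert curve ordering keeps nearby features close together in a file. It uses the textbook grid-to-curve conversion and is not taken from geoparquet-io's source; the function names, the grid `order`, and the sample points are assumptions made for illustration.

```python
from typing import List, Tuple


def hilbert_index(x: int, y: int, order: int) -> int:
    """Map grid cell (x, y) on a 2**order by 2**order grid to its position on the Hilbert curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next, finer level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sort_points_hilbert(
    points: List[Tuple[float, float]],
    bounds: Tuple[float, float, float, float],
    order: int = 16,
) -> List[Tuple[float, float]]:
    """Sort (x, y) points by the Hilbert index of their grid cell.

    bounds is (xmin, ymin, xmax, ymax) for the whole dataset and must have
    non-zero width and height.
    """
    xmin, ymin, xmax, ymax = bounds
    n = 1 << order

    def key(p: Tuple[float, float]) -> int:
        # Quantize each coordinate onto the integer grid before indexing.
        gx = int((p[0] - xmin) / (xmax - xmin) * (n - 1))
        gy = int((p[1] - ymin) / (ymax - ymin) * (n - 1))
        return hilbert_index(gx, gy, order)

    return sorted(points, key=key)


# Hypothetical points from two clusters, given in interleaved order.
points = [(0.0, 0.0), (100.0, 100.0), (1.0, 1.0), (99.0, 99.0)]
ordered = sort_points_hilbert(points, bounds=(0.0, 0.0, 100.0, 100.0))
```

After sorting, the two points near the origin are adjacent and the two points near the opposite corner are adjacent, so rows from the same area tend to land in the same row groups of the output file.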

Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to submit changes.

License

Apache 2.0 - See LICENSE for details.
