Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.
📚 Full Documentation | Quick Start Tutorial
- Fast: Built on PyArrow and DuckDB for high-performance operations
- Comprehensive: Sort, partition, enhance, validate, and upload GeoParquet files
- Cloud-Native: Upload to S3, GCS, and Azure with parallel transfers
- Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and hierarchical admin divisions
- Best Practices: Automatic optimization following the GeoParquet 1.1 specification
- Flexible: Available as both a CLI and a Python API
- Tested: Extensive test suite across Python 3.9-3.13 and all platforms
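A bbox column speeds up queries because a reader can reject a row by comparing four numbers instead of decoding its geometry. The plain-Python sketch below illustrates that idea. It is not gpio's code. Rows are modeled as hypothetical dicts whose `bbox` entry mirrors the `xmin`/`ymin`/`xmax`/`ymax` fields of the bbox covering column described in GeoParquet 1.1, and geometries are simplified to lists of `(x, y)` pairs. The helper names are made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence, Tuple

# Hypothetical row model for illustration only: each row is a dict whose
# "bbox" entry mirrors a GeoParquet 1.1 style bbox struct.
BBox = Dict[str, float]
Row = Dict[str, Any]


def compute_bbox(coords: Sequence[Tuple[float, float]]) -> BBox:
    """Axis-aligned bounding box of a geometry given as (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap. Boxes that only touch count as overlapping."""
    return not (
        a["xmax"] < b["xmin"]
        or a["xmin"] > b["xmax"]
        or a["ymax"] < b["ymin"]
        or a["ymin"] > b["ymax"]
    )


def filter_by_bbox(rows: Iterable[Row], query: BBox) -> Iterator[Row]:
    """Yield rows whose precomputed bbox overlaps the query box.

    Only four floats per row are compared and the geometry is never decoded.
    Surviving rows may still need an exact geometry test, because overlapping
    boxes do not guarantee overlapping geometries.
    """
    for row in rows:
        if bbox_intersects(row["bbox"], query):
            yield row


# Usage: precompute a bbox per geometry once, then filter cheaply.
shapes = [[(0, 0), (1, 1)], [(5, 5), (6, 7)], [(0.5, -1), (2, 0.2)]]
rows = [{"id": i, "bbox": compute_bbox(c)} for i, c in enumerate(shapes)]
query = {"xmin": 0.8, "ymin": 0.0, "xmax": 3.0, "ymax": 3.0}
print([r["id"] for r in filter_by_bbox(rows, query)])  # [0, 2]
```

Parquet also keeps per-row-group min/max statistics for columns, so an engine can in principle skip whole row groups using the bbox fields before any row-level test like this runs. How much that helps depends on how spatially clustered the rows are.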
```bash
pip install geoparquet-io
```

See the Installation Guide for other options (uv, from source) and requirements.
```bash
# Inspect file structure and metadata
gpio inspect myfile.parquet

# Check file quality against best practices
gpio check all myfile.parquet

# Add a bounding box column for faster spatial queries
gpio add bbox input.parquet output.parquet

# Sort using a Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet

# Partition by administrative boundaries
gpio partition admin buildings.parquet output_dir/ --dataset gaul --levels continent,country
```

For more examples and detailed usage, see the Quick Start Tutorial and the User Guide.
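The `gpio sort hilbert` command orders rows along a Hilbert curve so that features close together in space tend to end up close together in the file. The sketch below shows the general technique using the classic iterative `xy2d` mapping from a grid cell to its position along the curve: coordinates are scaled onto a `2**order` grid, each cell gets its curve index, and points are sorted by that index. It is an illustration only. The `hilbert_sort` helper and its `extent` and `order` parameters are hypothetical, and gpio's actual implementation and grid resolution may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip/transpose a quadrant so its sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Hilbert index of cell (x, y) on an n-by-n grid (n must be a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_sort(
    points: Sequence[Point],
    extent: Tuple[float, float, float, float],
    order: int = 16,
) -> List[Point]:
    """Hypothetical helper: sort points by Hilbert index within (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    n = 1 << order

    def to_cell(v: float, lo: float, hi: float) -> int:
        # Scale into [0, n - 1] and clamp so points on the edges stay on the grid.
        if hi == lo:
            return 0
        i = int((v - lo) * (n - 1) / (hi - lo))
        return max(0, min(n - 1, i))

    return sorted(
        points,
        key=lambda p: xy2d(n, to_cell(p[0], xmin, xmax), to_cell(p[1], ymin, ymax)),
    )


pts = [(3, 0), (0, 0), (1, 1), (0, 1), (1, 0)]
print(hilbert_sort(pts, extent=(0, 0, 3, 3), order=2))
# [(0, 0), (1, 0), (1, 1), (0, 1), (3, 0)]
```

Unlike sorting by x and then y, consecutive Hilbert indices always refer to neighboring cells, so any contiguous run of rows covers a compact region. That keeps per-row-group bounding boxes small.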
Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to submit changes.
- Documentation: https://cholmes.github.io/geoparquet-io/
- PyPI: https://pypi.org/project/geoparquet-io/ (coming soon)
- Issues: https://github.com/cholmes/geoparquet-io/issues
Apache 2.0 - See LICENSE for details.