Description
Description of the desired feature
Apache Arrow is an in-memory format that is starting to become a common exchange format between different libraries in Python and other programming languages. For example:
- Pandas 3.0 will use `pyarrow.string` instead of `object` dtype for strings (see PDEP-10), so we will eventually need to support PyArrow, at least for string dtypes.
- Polars DataFrames can be zero-copy converted to PyArrow via the `__dataframe__` protocol, see https://arrow.apache.org/docs/python/interchange_protocol.html
This issue is to track compatibility and support of different PyArrow data types in PyGMT:
| Dtype | Implementation PR | Status | Notes |
|---|---|---|---|
| Numerical (uint/int/float) | #2774 | ✅ | |
| String | #2933 | 🚧 | May require modifying the `put_strings` method, which currently uses `np.char.encode` |
| Date/Time | #2845 (pandas) and TODO (raw pyarrow) | 🚧 | May require modifying `array_to_datetime`, which expects Python datetime or NumPy-backed arrays; xref #242 and #3507 |
| Duration | TODO | ❌ | https://arrow.apache.org/docs/13.0/python/generated/pyarrow.duration.html; also wait for #2884 |
| Special case: `geopandas.GeoDataFrame` with PyArrow dtype columns | TODO | ❌ | See #2774 (comment) |
| GeoArrow geometry | TODO | ❌ | https://github.com/geoarrow/geoarrow-python |
The simplest way to integrate would be to just handle PyArrow-backed `pandas.DataFrame` objects as described above.
Alternatively, we could also discuss using PyArrow as the internal array representation (which would make `pyarrow` a hard dependency). This may allow better interoperability with other Python libraries that use Arrow, and might be relevant for #1318 and #2731. My thought is to do this through the `__dataframe__` protocol, see https://arrow.apache.org/docs/python/interchange_protocol.html
Further reading:
- https://voltrondata.com/codex/standards-over-silos#1-2-standardizing-on-arrow
- https://voltrondata.com/resources/dataframe-interoperability-python-pyarrow-enables-modular-workflows
- https://arrow.apache.org/docs/python/pandas.html#zero-copy-series-conversions
Are you willing to help implement and maintain this feature?
Yes, but help is welcome too!