Support PyArrow arrays and dataframes #2800

Open
@weiji14

Description of the desired feature

Apache Arrow is an in-memory columnar format that is becoming a common exchange format between libraries in Python and other programming languages.

This issue is to track compatibility and support of different PyArrow data types in PyGMT:

| Dtype | Implementation PR | Status | Notes |
|---|---|---|---|
| Numerical (uint/int/float) | #2774 | | |
| String | #2933 | 🚧 | May require modifying the put_strings method that currently uses np.char.encode |
| Date/Time | #2845 (pandas) and TODO (raw pyarrow) | 🚧 | May require modifying array_to_datetime that expects Python datetime or numpy-backed arrays, xref #242 and #3507 |
| Duration | TODO | | https://arrow.apache.org/docs/13.0/python/generated/pyarrow.duration.html, wait for #2884 also |
| Special case: geopandas.GeoDataFrame with PyArrow dtype columns | TODO | | See #2774 (comment) |
| GeoArrow geometry | TODO | | https://github.com/geoarrow/geoarrow-python |

The simplest way of integrating would be to just handle PyArrow-backed pandas.DataFrame objects as above.

Alternatively, we can also discuss using PyArrow as the internal array representation (which would make pyarrow a hard dependency), since it may allow better interoperability with other Python libraries using Arrow, and this might be relevant for #1318 and #2731. My thought is to do this through the __dataframe__ protocol, see https://arrow.apache.org/docs/python/interchange_protocol.html

Further reading:

Are you willing to help implement and maintain this feature?

Yes, but help is welcome too!

Labels

feature request (New feature wanted), help wanted (Helping hands are appreciated), longterm (Long standing issues that need to be resolved)
