Array conversion methods #9823

Status: Open · wants to merge 9 commits into main
Conversation

slevang (Contributor) commented Nov 25, 2024

This implements a convenience method to convert data between duck array types, and another to check the current type of your data. Basically identical to cupy-xarray's as_cupy()/is_cupy() methods, but exposed in a more general way.

The signature I have here takes a callable plus any kwargs, so it's quite flexible:

```python
ds.as_array_type(cp.asarray)
ds.as_array_type(jnp.from_dlpack)
ds.as_array_type(jnp.asarray, device=jax.devices("gpu")[0])
ds.as_array_type(pint.Quantity, units="m/s")
```

Then:

```python
ds.is_array_type(cp.ndarray)  # -> True
```
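For reference, a minimal sketch of how these methods could be written against existing public APIs (hypothetical; the actual implementation in this PR may differ):

```python
import xarray as xr

def as_array_type(ds: xr.Dataset, asarray, **kwargs) -> xr.Dataset:
    # Rewrap each data variable's underlying duck array with the converter.
    # Note: Dataset.map only touches data variables, so coordinates are
    # left unconverted in this sketch.
    return ds.map(lambda da: da.copy(data=asarray(da.data, **kwargs)))

def is_array_type(ds: xr.Dataset, array_type: type) -> bool:
    # True only if every variable (coordinates included) is backed by
    # the given duck array type.
    return all(isinstance(var.data, array_type) for var in ds.variables.values())
```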

I'm not sure about the naming. There are also other ways we could go about this, e.g. string matching for supported arrays.

Follow-up to #9798.

TomNicholas added the labels API design, topic-arrays (related to flexible array support), and array API standard (support for the Python array API standard) on Nov 25, 2024.
slevang (Contributor, Author) commented Dec 3, 2024

@shoyer @dcherian any further thoughts on this API?

TomNicholas (Member) commented
I just want to note that conceptually this problem

```python
ds.is_array_type(cp.ndarray)  # -> True
```

is effectively a static typing check. Whilst in general an xarray Dataset arguably is a generic with type `Dataset[Hashable, NDArrayLike]`, here you want to know whether the Dataset you actually have is the more specific type `Dataset[Hashable, cp.ndarray]`.

That's a similar problem to the one that comes up in VirtualiZarr, where users are confused that there is no top-level "VirtualDataset" type; instead, a "virtual dataset" is any Dataset of type `Dataset[Hashable, ManifestArray | np.ndarray]`.

shoyer (Member) commented Feb 12, 2025

We talked about this at the Xarray developers meeting today.

One concern is that `ds.as_array_type(cp.asarray)` is a special case of a more general operation, which maps over all bare ndarray objects. The way to spell this currently is with `apply_ufunc`, e.g., `xarray.apply_ufunc(cp.asarray, ds)`, which works on arbitrary xarray data structures.
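For concreteness, a minimal demonstration of the `apply_ufunc` spelling (assuming cupy is installed and a GPU is available):

```python
import cupy as cp
import numpy as np
import xarray as xr

ds = xr.Dataset({"t": ("x", np.arange(4.0))}, coords={"x": np.arange(4)})

# apply_ufunc hands each data variable's underlying array to cp.asarray
# and rewraps the result; coordinates are passed through unchanged.
ds_gpu = xr.apply_ufunc(cp.asarray, ds)
print(type(ds_gpu["t"].data))  # <class 'cupy.ndarray'>
```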

I'm less sure that we need the capability to check all arrays to see if they match a given array type. We don't have that for dtypes, which suggests it may not be necessary. That said, I do think it would be nice to be able to generically flatten the data from an arbitrary xarray object into a list, to support something like `all(isinstance(x, cp.ndarray) for x in xarray.flatten_data(ds))`.
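There is no `flatten_data` helper in xarray today (the name above is hypothetical), but an equivalent check can be spelled with existing public APIs, e.g. via a small generator:

```python
from typing import Any, Iterator

import cupy as cp
import xarray as xr

def iter_data(ds: xr.Dataset) -> Iterator[Any]:
    # Yield the bare duck array backing each variable, coordinates included.
    for var in ds.variables.values():
        yield var.data

all(isinstance(x, cp.ndarray) for x in iter_data(ds_gpu))  # -> True
```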

I agree that these are not the most obvious APIs for working with duck array data, in particular because duck array conversion can only very loosely be interpreted as a numpy universal function. We could consider adding new methods for this, but I would start with documenting what we already have, e.g., in the documentation section on working with numpy-like arrays.

slevang (Contributor, Author) commented Feb 12, 2025

Yeah, that all makes sense. `apply_ufunc` is an equally elegant, if less obvious, way to achieve this, and it should be documented. Agreed that the `is` method is quite a bit less important. FWIW, cupy-xarray was the inspiration for putting this together, and I copied its API almost verbatim.

Should we close in favor of a simple documentation PR?
