Update and document GPU buffer handling #2751

TomAugspurger · 2025-01-22T22:58:27Z

This updates how we handle GPU buffers. See the new docs page for a simple example.

The basic idea, as discussed in #2574, is to use host buffers for all metadata objects and device buffers for data.

Zarr has two types of buffers: plain buffers (used for a stream of bytes) and ndbuffers (used for bytes that represent ndarrays). To make it easier for users, I've added a new config option zarr.config.enable_gpu() that can be used to update both at once. If we need additional customizations in the future (like GPU-accelerated codecs), we can add them here.
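As a rough illustration of what updating both settings in one call means, here is a minimal, self-contained sketch. It does not import Zarr: `ToyConfig`, its keys, and the class-path strings are stand-ins invented for this example (assumptions, not a copy of the actual implementation). It only shows the idea of a single call switching the plain-buffer and ndbuffer settings together.

```python
from dataclasses import dataclass, field

# Hypothetical class paths, used only for this illustration.
CPU = {"buffer": "cpu.Buffer", "ndbuffer": "cpu.NDBuffer"}
GPU = {"buffer": "gpu.Buffer", "ndbuffer": "gpu.NDBuffer"}


@dataclass
class ToyConfig:
    """Toy stand-in for a config object holding the two buffer settings."""

    values: dict = field(default_factory=lambda: dict(CPU))

    def set(self, updates: dict) -> None:
        # Merge the given settings into the current configuration.
        self.values.update(updates)

    def enable_gpu(self) -> None:
        # One call updates both kinds of buffer, so they cannot get out of sync.
        self.set(GPU)


config = ToyConfig()
config.enable_gpu()
print(config.values)
```

The point of the convenience method is that users never have to remember that there are two separate settings to change.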

I've opened this as a draft for now. I want to look a bit more at exactly when data is copied between the host and device. Right now, it looks like the default Zarr v3 codec pipeline will automatically transfer bytes to the host in ZstdCodec._encode_single:

zarr-python/src/zarr/codecs/zstd.py, line 88 in 0c154c3:

    as_numpy_array_wrapper, self._zstd_codec.encode, chunk_bytes, chunk_spec.prototype

This PR ensures that you end up with GPU bytes when reading data, but all data I/O and encoding / decoding still happen on the CPU. I think in the future we'll be able to do more work on the GPU while avoiding copies back to the host.
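To make the hidden transfer concrete, here is a small sketch of the pattern: a wrapper pulls the bytes to the host, runs a CPU-only codec function, and wraps the result again. Everything here is hypothetical and GPU-free: `ToyDeviceBuffer` and `host_codec_wrapper` are names made up for this example, the "transfer" is simulated with a counter, and `zlib` from the standard library stands in for Zstd.

```python
import zlib
from typing import Callable


class ToyDeviceBuffer:
    """Hypothetical stand-in for a buffer whose bytes live on the GPU."""

    host_copies = 0  # counts simulated device-to-host transfers

    def __init__(self, data: bytes) -> None:
        self._data = data

    def as_host_bytes(self) -> bytes:
        # In a real GPU buffer, this is where the device-to-host copy would happen.
        ToyDeviceBuffer.host_copies += 1
        return self._data

    @classmethod
    def from_bytes(cls, data: bytes) -> "ToyDeviceBuffer":
        return cls(data)


def host_codec_wrapper(
    func: Callable[[bytes], bytes], buf: ToyDeviceBuffer
) -> ToyDeviceBuffer:
    """Run a CPU-only codec function on a buffer, then wrap the result again."""
    return ToyDeviceBuffer.from_bytes(func(buf.as_host_bytes()))


chunk = ToyDeviceBuffer(b"\x00" * 1024)
encoded = host_codec_wrapper(zlib.compress, chunk)      # zlib stands in for Zstd
decoded = host_codec_wrapper(zlib.decompress, encoded)
print(ToyDeviceBuffer.host_copies)
```

Each codec call passes through one simulated transfer, and nothing at the call site hints that a copy took place, which is why it is easy to miss.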

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.rst
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

Closes #2574

TomAugspurger · 2025-01-24T16:26:13Z

Still looking at this, but there is currently a copy to the host for running Zstd (the zarr.codecs.zstd.ZstdCodec codec explicitly asks for a NumPy array, which (silently) copies from device to host if necessary). I'll probably leave codecs for a separate issue / PR, but I want to at least get an example working where everything stays on the GPU before moving this out of draft.

cc @madsbk.

TomAugspurger · 2025-01-30T21:47:15Z

This should be ready for review now.

This fixes the issue with storing metadata in host memory, rather than device memory.
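The split between metadata and chunk data can be pictured with a toy sketch like the one below. It is not Zarr's code: `ToyBuffer`, `STORE`, `read_metadata`, and `read_chunk` are hypothetical names, and "location" is just a string label rather than real host or device memory. The only idea it demonstrates is that metadata reads always use a host buffer, while chunk reads use whichever buffer type was configured.

```python
import json
from dataclasses import dataclass


@dataclass
class ToyBuffer:
    data: bytes
    location: str  # "host" or "device" (a label only; no real GPU memory)


# Toy store: keys map to raw bytes.
STORE = {
    "zarr.json": json.dumps({"shape": [4], "data_type": "uint8"}).encode(),
    "c/0": bytes([1, 2, 3, 4]),
}


def read(key: str, location: str) -> ToyBuffer:
    return ToyBuffer(STORE[key], location)


def read_metadata(key: str = "zarr.json") -> dict:
    # Metadata always goes through a host buffer: it is small and is
    # parsed as JSON on the CPU.
    buf = read(key, location="host")
    return json.loads(buf.data)


def read_chunk(key: str, data_location: str) -> ToyBuffer:
    # Chunk data uses whichever buffer type the user configured.
    return read(key, location=data_location)


meta = read_metadata()
chunk = read_chunk("c/0", data_location="device")
print(meta["shape"], chunk.location)
```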

It adds a convenience function for setting up everything to use GPUs. Currently, that just sets the buffer prototype. In the future, I think it could also update the default codecs to use GPU codecs. https://gist.github.com/TomAugspurger/ba13bc29b27f587ae4709ac7f30d89c8 has a snippet hacking together an example of what that future might look like, but IIUC @akshaysubr is working on new APIs for compression / codecs, so IMO it makes sense to address that separately. For now, I've left a note that codecs will run on the CPU by default.

madsbk

Looks good, I only have a minor suggestion

tests/test_api.py

akshaysubr

This looks good to me, approving!

TomAugspurger · 2025-02-05T12:34:34Z

@dstansby would you be able to confirm that I did the changelog correctly? Happy to add a note to the contributing guide like:

Pull requests should usually include a note for the changelog as a towncrier news fragment. Use the GitHub issue number along with the type of your change (feature, bugfix, doc, removal, misc):

    towncrier create

You'll be prompted to input the issue number, fix type, and text of the changelog. See the towncrier docs for more.

dstansby · 2025-02-05T12:36:49Z

Release notes look good to me 👍 Adding that text in our contributor guide would be awesome (and TIL that you can use towncrier create, I've been doing them by hand until now!)

TomAugspurger · 2025-02-05T17:36:11Z

Should be good to go.

TomAugspurger

@jhamman or @d-v-b, would you be able to take a look at this sometime? Thanks!

TomAugspurger · 2025-02-13T12:41:59Z

src/zarr/core/config.py

+        """
+        Configure Zarr to use GPUs where possible.
+        """
+        return self.set(


I believe this shows it's covered: https://app.codecov.io/github/zarr-developers/zarr-python/blob/TomAugspurger%2Fzarr-python%3Atom%2Ffix%2Fgpu/src%2Fzarr%2Fcore%2Fconfig.py#L66

Maybe the warning comes from the CPU-only coverage run?

TomAugspurger changed the title ~~Update GPU handling~~ Update and document GPU buffer handling Jan 22, 2025

fixed doc

9884f36

TomAugspurger added 2 commits January 30, 2025 13:18

Merge remote-tracking branch 'upstream/main' into tom/fix/gpu

7bc90b9

Fixup

8371513

github-actions bot added the "needs release notes" label Jan 30, 2025

changelog

56cc00c

TomAugspurger marked this pull request as ready for review January 30, 2025 21:44

TomAugspurger added 2 commits January 30, 2025 14:07

doctest, skip

60ba16a

removed not gpu

cb67094

madsbk approved these changes Jan 31, 2025

akshaysubr approved these changes Jan 31, 2025

TomAugspurger added 2 commits January 31, 2025 04:09

assert that the type matches

2b70d0d

Merge remote-tracking branch 'upstream/main' into tom/fix/gpu

1e94ce2

TomAugspurger added 2 commits February 5, 2025 05:59

Added changelog notes

7c31bc2

Merge remote-tracking branch 'upstream/main' into tom/fix/gpu

424329f

github-actions bot removed the "needs release notes" label Feb 5, 2025

Merge branch 'main' into tom/fix/gpu

d5ab1d7

TomAugspurger commented Feb 13, 2025

Merge branch 'main' into tom/fix/gpu

c0b118f

d-v-b approved these changes Feb 14, 2025

dcherian approved these changes Feb 14, 2025

dcherian enabled auto-merge (squash) February 14, 2025 15:39

dcherian merged commit 24ef221 into zarr-developers:main Feb 14, 2025
29 of 30 checks passed

weiji14 added a commit to xarray-contrib/cupy-xarray that referenced this pull request Mar 11, 2025

Install nightly version of kvikio=25.04.00a and zarr>=3.0.5

7dd78e9

Need patches from rapidsai/kvikio#646 and zarr-developers/zarr-python#2751.

weiji14 mentioned this pull request Mar 11, 2025

Kvikio backend entrypoint with Zarr v3 xarray-contrib/cupy-xarray#70
