Restore optimizations for NDBuffer.all_equal #2730

Draft · wants to merge 1 commit into main
Conversation

@y4n9squared commented Jan 18, 2025

Zarr 3.x has some performance regressions for certain write workloads (writing large chunks with a floating-point dtype).

This change modifies the implementation of `NDBuffer.all_equal` to use the same logic as Zarr 2.x's `zarr.util.all_equals`, which contains a number of important optimizations. A few mechanical changes were made to accommodate the fact that the subroutine is now a method of `NDBuffer` rather than a free function.
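
For context, here is a minimal sketch of the Zarr 2.x-style branching being restored, written as a free function in the shape of `zarr.util.all_equals` (the method in this PR wraps the same logic around `self._data`, and details may differ):

```python
from typing import Any

import numpy as np


def all_equal(value: Any, array: np.ndarray) -> bool:
    """Sketch: test whether every element of `array` equals `value`."""
    if value is None:
        return False
    if not value:
        # Falsy fill value (e.g. 0): a single truthy element is enough
        # to answer False, and np.any avoids any broadcast allocation.
        return not np.any(array)
    if np.issubdtype(array.dtype, np.floating) and np.isnan(value):
        # NaN fast path: scan the chunk with np.isnan instead of
        # broadcasting `value` to the full chunk shape.
        return bool(np.all(np.isnan(array)))
    # Generic scalar comparison; broadcasting a scalar allocates nothing
    # beyond the boolean result.
    return bool(np.all(value == array))
```

Compared with the broadcast-based approach, each branch touches the chunk exactly once and allocates at most a boolean result.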

This change is most impactful when writing large floating-point chunks, as

```python
np.all(np.isnan(self._data))
```

is significantly more efficient than calling

```python
_data, other = np.broadcast_arrays(self._data, np.nan)
np.array_equal(_data, other, equal_nan=True)
```

since `np.broadcast_arrays` potentially requires a large allocation -- the size of `self._data` -- and `np.array_equal` then needs to fetch double the number of cache lines.

On EC2 r7i.2xlarge:

```
In [20]: data = np.random.rand(512, 512, 8)

In [21]: %timeit np.all(np.isnan(data))
596 μs ± 179 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [22]: %%timeit
    ...: data_, other = np.broadcast_arrays(data, np.nan)
    ...: np.array_equal(data_, other, equal_nan=True)
    ...:
    ...:
2.66 ms ± 953 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

(Both numbers are faster on an M3 Max, but the relative slowdown is similar.)

With low-latency stores (e.g. local SSD), this results in double-digit percentage speed-ups for the workload referenced in the Zarr V3 blog post:

```python
import numpy as np
import zarr

za = zarr.create_array(
    "/tmp/foo.zarr",
    shape=(512, 512, 512),
    chunks=(512, 512, 8),
    dtype=np.float64,
    overwrite=True,
)

arr = np.random.rand(512, 512, 512)

za[:] = arr
```

For higher-latency stores, the improvement is still dramatic (10%+) when chunks have high compression ratios (e.g. `np.ones`), as in the variant below.
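
For reference, the high-compression-ratio case is just the snippet above with constant data (illustrative only):

```python
# Constant chunks compress extremely well, so fixed per-chunk costs such
# as the fill-value equality check make up a larger share of write time.
za[:] = np.ones((512, 512, 512), dtype=np.float64)
```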

For arrays larger than 1 GB, the improvement is even more pronounced.

Towards #2710

@d-v-b (Contributor) commented Jan 19, 2025

can we get a test for each conditional branch?
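
For illustration, a parametrized test along these lines would exercise each branch of the `all_equal` sketch shown earlier (hypothetical test with the sketch's `all_equal` in scope, assuming pytest):

```python
import numpy as np
import pytest


@pytest.mark.parametrize(
    ("value", "array", "expected"),
    [
        (None, np.zeros(4), False),               # `value is None` branch
        (0, np.zeros(4), True),                    # falsy-value / np.any branch
        (0, np.array([0.0, 1.0]), False),
        (np.nan, np.full(4, np.nan), True),        # NaN fast path
        (np.nan, np.array([np.nan, 1.0]), False),
        (3.5, np.full(4, 3.5), True),              # generic scalar comparison
    ],
)
def test_all_equal_branches(value, array, expected):
    assert all_equal(value, array) is expected
```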
