-
-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
behavior of is_total_slice
for boundary chunks
#757
Comments
Yeah that seems reasonable |
I made some tests and I think there is actually no way to reach the desired behavior of One possible solution would be to add a new (maybe optional) parameter to In this case I would like to work on the problem. |
@fAndreuzzi you might want to pick up from my efforts here: https://github.com/d-v-b/zarr-python/tree/boundary_chunk_optimization The strategy I'm taking is to intercept the arguments to |
@d-v-b Thank you very much for the hints! |
Just a question to check whether I understood the key idea of If we consider the example provided by @d-v-b, I would expect the result of |
This also affects reads. For e.g. when dask chunks exactly overlap with Zarr chunks, it should be possible to decode directly to a memory buffer. Instead because It should be easy to tell An example profile in which 18 chunks take the fast path, and 9 chunks are uselessly copied in the last line.
|
I filed a related issue recently as well |
Do folks think addressing this is a bugfix, or an enhancement? |
i think it's a bug if we are still reading boundary chunk unnecessarily, but i'm not sure if this is happening in v3 |
Uses `slice(..., None)` to indicate that a `chunk_selection` ends at the boundary of the current chunk. Also does so for a last chunk that is shorter than the chunk size. `is_total_slice` now understands this convention, and correctly detects boundary chunks as total slices. Closes zarr-developers#757
Uses `slice(..., None)` to indicate that a `chunk_selection` ends at the boundary of the current chunk. Also does so for a last chunk that is shorter than the chunk size. `is_total_slice` now understands this convention, and correctly detects boundary chunks as total slices. Closes zarr-developers#757
Uses `slice(..., None)` to indicate that a `chunk_selection` ends at the boundary of the current chunk. Also does so for a last chunk that is shorter than the chunk size. `is_total_slice` now understands this convention, and correctly detects boundary chunks as total slices. Closes zarr-developers#757
* Skip reads when completely overwriting boundary chunks Uses `slice(..., None)` to indicate that a `chunk_selection` ends at the boundary of the current chunk. Also does so for a last chunk that is shorter than the chunk size. `is_total_slice` now understands this convention, and correctly detects boundary chunks as total slices. Closes #757 * normalize in codec_pipeline * Revert "normalize in codec_pipeline" This reverts commit 234431cd6efb661c53e2a832a0e4ea4dca772c1b. * Partially Revert "Skip reads when completely overwriting boundary chunks" This reverts commit edbba37. * Different approach * fix bug * add oindex property test * more complex oindex test * cleanup * more oindex * Add changelog entry * [revert] note * fix for numpy 1.25 --------- Co-authored-by: Davis Bennett <[email protected]>
Re-opening because I haven't looked at whether the read path is still affected. |
The
is_total_slice
function is used to determine whether a selection corresponds to an entire chunk (is_total_slice
->True
) or not (is_total_slice
->False
).I noticed that
is_total_slice
is alwaysFalse
for "partial chunks", i.e. chunks that are not filled by the array:Which prints this:
Although the last selection is not the size of a full chunk, it is "total" with respect to the output of that selection in the array.
A direct consequence of this behavior is unnecessary chunk loading when performing array assignments -- zarr uses the result of
is_total_slice
to decide whether to load existing chunk data or not. Becauseis_total_slice
is alwaysFalse
for partial chunks, zarr always loads boundary chunks during assignment.A solution to this would be to augment the
is_total_slice
function to account for partial chunks. I'm not sure at the moment how to do this exactly, but it shouldn't be hard. Happy to bring forth a PR if people agree that this is an issue worth addressing.The text was updated successfully, but these errors were encountered: