automatically specifying shards and chunks #2572

Open
d-v-b opened this issue Dec 18, 2024 · 1 comment
Labels
enhancement New features or improvements

d-v-b commented Dec 18, 2024

I think it would be great if zarr-python could automatically pick a smart shard shape and chunk shape for users, based on an array shape and a dtype (i.e., the stuff that we will know if a user is coming in with a numpy array). Good defaults would make a lot of users happy.

Off the top of my head, the following constraints should factor into the automatic shard shape / chunk shape:

  • min / max size (in bytes)
  • min / max count
  • shape constraints. some examples:
    • chunks must tile the shard perfectly (non-configurable)
    • chunks should have 1 axis length that is fixed to a constant, other lengths can vary to satisfy other constraints

It might be useful to combine a size constraint on shards with a mixed size / shape constraint on chunks, e.g. "chunks should be ~isotropic, divisible by a power of 2 on each side, inside a shard that is at most 100 MB".

These constraints should probably be configurable, either via the global config or via keyword arguments to array creation.
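
To make the example constraint concrete, here is a minimal sketch of one possible heuristic. It is not an existing zarr-python API: the function name `suggest_chunks_and_shards`, its parameters, and the 1 MiB chunk target are all assumptions made for illustration. It picks a roughly isotropic, power-of-2 chunk shape, then grows the shard by doubling the number of chunks along one axis at a time, so chunks always tile the shard perfectly and the shard stays under the byte limit.

```python
import math


def suggest_chunks_and_shards(
    shape,
    itemsize,
    target_chunk_bytes=1 << 20,     # assumed default: ~1 MiB per chunk
    max_shard_bytes=100_000_000,    # "at most 100 MB" from the example above
):
    """Hypothetical helper (not part of zarr-python).

    Returns (chunk_shape, shard_shape) where every chunk side is a power
    of 2 and every shard side is an integer multiple of the chunk side.
    """
    ndim = len(shape)
    if ndim == 0:
        return (), ()

    # Largest power-of-2 side such that an isotropic chunk fits the target.
    target_elems = max(1, target_chunk_bytes // itemsize)
    side = 1
    while (side * 2) ** ndim <= target_elems:
        side *= 2

    # Do not make a chunk longer than the next power of 2 covering the axis.
    chunks = tuple(
        min(side, 1 << (max(n, 1) - 1).bit_length()) for n in shape
    )

    # Grow the shard by doubling the chunk count on the shortest axis that
    # does not yet span the array, while staying under the byte limit.
    counts = [1] * ndim
    shard_bytes = math.prod(chunks) * itemsize
    while shard_bytes * 2 <= max_shard_bytes:
        candidates = [i for i in range(ndim) if chunks[i] * counts[i] < shape[i]]
        if not candidates:
            break
        axis = min(candidates, key=lambda i: chunks[i] * counts[i])
        counts[axis] *= 2
        shard_bytes *= 2

    shards = tuple(c * k for c, k in zip(chunks, counts))
    return chunks, shards


chunks, shards = suggest_chunks_and_shards((1000, 1000, 1000), itemsize=4)
print(chunks, shards)
```

The min / max count constraints and the "one axis fixed to a constant" constraint from the list above are not handled here; they would need extra parameters.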

Any thoughts? @jbms if you have any tensorstore stories to share about this I would be very interested.

jbms commented Dec 18, 2024

TensorStore has ChunkLayout that allows some of these constraints to be specified: https://google.github.io/tensorstore/schema.html#chunk-layout

Some useful things that are missing include requiring that a given chunk dimension is a multiple of, or evenly divides, some number.

Also it would need extending to support variable chunking if that is desired.
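
As a rough illustration of the two divisibility constraints mentioned above (a chunk dimension that is a multiple of some number, or that evenly divides some number), the following sketch snaps a desired chunk length to the nearest value that satisfies each one. The helper names `round_to_multiple` and `nearest_divisor` are hypothetical and do not come from TensorStore or zarr-python.

```python
def round_to_multiple(target, base):
    """Nearest positive multiple of `base` to `target` (hypothetical helper)."""
    # Integer arithmetic avoids float rounding surprises; never return 0.
    return max(base, (target + base // 2) // base * base)


def nearest_divisor(n, target):
    """Divisor of `n` closest to `target`; ties go to the smaller divisor."""
    # Brute-force enumeration: fine for a sketch, slow for very large n.
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return min(divisors, key=lambda d: (abs(d - target), d))


# A chunk length near 70 that is a multiple of 32.
print(round_to_multiple(70, 32))
# A chunk length near 64 that evenly divides an axis of length 1000.
print(nearest_divisor(1000, 64))
```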

@dstansby dstansby added the enhancement New features or improvements label Dec 19, 2024
3 participants