Does changing the chunks of a zarr Array affect compression ratio? #1627
Answered by rabernat
paulshuker asked this question in Q&A
Thank you for any help on this.
Answered by rabernat on Jan 9, 2024
Replies: 1 comment 1 reply
In general, yes. Lossless compression fundamentally relies on redundancy in the data, such as repeated values or patterns. Because zarr compresses each chunk independently, the compressor can only exploit redundancy that falls inside a single chunk. In the limit of chunks with only one element, there can be no repeated values, and thus no compression. So, generally speaking, the larger you make your chunks, the more opportunities for compression you will have. The chunk shape can also make a big difference, because it determines which neighboring values end up in the same chunk.

To go beyond this very general statement, you would need to experiment with your actual dataset to see how the compression ratio depends on chunk size and shape.
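A minimal sketch of such an experiment, using only the Python standard library. It mimics per-chunk compression by splitting a 2-D grid into chunks and compressing each one separately with `zlib`; this is an assumption standing in for zarr's configured compressor (for example Blosc), so absolute ratios will differ from what zarr reports. The synthetic data and the helper names `make_data` and `chunked_compression_ratio` are hypothetical. Each row of the test grid holds a single repeated value, so chunks that run along a row compress well, while chunks that run down a column mostly see distinct values.

```python
import zlib


def make_data(n_rows, n_cols):
    """Hypothetical test data: each row repeats one value; neighboring rows differ."""
    return [[(i * 7919) % 100_003] * n_cols for i in range(n_rows)]


def chunked_compression_ratio(grid, chunk_rows, chunk_cols, level=6):
    """Compress each (chunk_rows x chunk_cols) chunk on its own with zlib,
    mimicking a chunked store that compresses chunks independently.
    Returns uncompressed bytes / compressed bytes (higher is better).
    Edge chunks are simply smaller in this sketch."""
    n_rows, n_cols = len(grid), len(grid[0])
    raw_bytes = 0
    stored_bytes = 0
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            # Serialize this chunk's values in row-major order as 4-byte ints.
            buf = b"".join(
                grid[r][c].to_bytes(4, "little")
                for r in range(r0, min(r0 + chunk_rows, n_rows))
                for c in range(c0, min(c0 + chunk_cols, n_cols))
            )
            raw_bytes += len(buf)
            # Each chunk is compressed independently, paying its own overhead.
            stored_bytes += len(zlib.compress(buf, level))
    return raw_bytes / stored_bytes


if __name__ == "__main__":
    grid = make_data(256, 256)
    # Chunk size: square chunks of increasing size.
    for shape in [(1, 1), (4, 4), (16, 16), (64, 64), (256, 256)]:
        print("size ", shape, round(chunked_compression_ratio(grid, *shape), 2))
    # Chunk shape: the same number of elements (256) laid out differently.
    for shape in [(1, 256), (256, 1), (16, 16)]:
        print("shape", shape, round(chunked_compression_ratio(grid, *shape), 2))
```

With one-element chunks the fixed per-chunk overhead typically makes the stored data larger than the raw data (a ratio below 1). On a real dataset, the same comparison can be run in zarr itself by writing the array several times with different `chunks=` arguments and comparing the stored size with the uncompressed size.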
Answer selected by paulshuker