compression increases size? #6
Oops, I think the "in bits" should be "in bytes", so yeah, I'm curious why it is expanding the size so much. This will depend on the distribution of values. The main use case is when there are fewer than ~100 or ~1000 distinct values in the dataset (I should clarify that in the docs). Would you be able to check how many distinct values are in this data?
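For anyone wanting to run the same check: NumPy's `np.unique` gives both the distinct-value count and, with `return_counts=True`, enough to estimate the order-0 entropy that bounds what any ANS-style coder can achieve. A minimal sketch (the `data` array here is a synthetic stand-in, not the dataset from this issue):

```python
import numpy as np

def describe_distribution(x: np.ndarray) -> None:
    """Print the distinct-value count and order-0 Shannon entropy in bits/symbol."""
    values, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    entropy = -(p * np.log2(p)).sum()
    print(f"distinct values: {len(values)} of {x.size} samples")
    print(f"order-0 entropy: {entropy:.2f} bits/symbol (raw int32 is 32)")

# Synthetic stand-in with many distinct values
rng = np.random.default_rng(0)
data = rng.integers(0, 2**24, size=1_000_000).astype(np.int32)
describe_distribution(data)
```

An order-0 coder cannot beat that entropy floor, and on top of it the encoder has to store a symbol table; with millions of distinct values the table alone can dwarf the payload, which would explain the expansion.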
Oh, in that case it definitely will not perform well on this data! There are 7594235 unique values, which is 75% of the inputs. There are definitely patterns in the input, though: the data are quantized positions and velocities from a 3D N-body simulation. blosc (i.e. byte transpose) + …
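For context on the byte-transpose approach mentioned above: blosc's shuffle filter groups the corresponding bytes of each int32 together before handing the stream to a backing codec, which often exposes redundancy in smooth, quantized simulation data even when almost every value is distinct. A rough sketch using the python-blosc package (the synthetic signal, codec, and level are placeholder choices, not from this thread):

```python
import numpy as np
import blosc  # pip install blosc

# Placeholder: a smooth quantized signal, loosely mimicking quantized positions
t = np.linspace(0, 8 * np.pi, 1_000_000)
data = np.round(1e6 * np.sin(t)).astype(np.int32)

# SHUFFLE performs the byte transpose before the backing codec (zstd here)
compressed = blosc.compress(data.tobytes(), typesize=4,
                            cname='zstd', clevel=5, shuffle=blosc.SHUFFLE)

print(f"original:   {data.nbytes} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {data.nbytes / len(compressed):.2f}x")

# Round-trip check
restored = np.frombuffer(blosc.decompress(compressed), dtype=np.int32)
assert np.array_equal(restored, data)
```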
I have added the following to the README:
Let me know if you come across any real datasets that are suitable. I'm also working on this related project:
Thanks!
I have an int32 dataset where simple_ans compression seems to increase the size, rather than decrease. Is this expected for some datasets, or am I "holding it wrong"?
Here's a minimal reproducer (data is on rusty):
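The reproducer code itself did not survive the copy into this page. A hedged reconstruction of what it plausibly looked like, assuming the package's `ans_encode` entry point and the `size()` method discussed below (the file path is a placeholder for the data on rusty):

```python
import numpy as np
from simple_ans import ans_encode  # entry point assumed from the package README

# Placeholder path; the real data lives on the rusty cluster
data = np.fromfile('nbody_quantized.i32', dtype=np.int32)

encoded = ans_encode(data)

# size() is discussed below; its source suggests it reports bytes, not bits
print(f"original:   {data.nbytes} bytes")
print(f"compressed: {encoded.size()} bytes")
print(f"ratio:      {data.nbytes / encoded.size():.3f}x")
```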
I did notice that the README has this code:
The comment suggests that the size is measured in bits. However, the rest of that code (and the source of the size() function) suggests it's in bytes. If it really is in bits, though, this certainly could be my issue.