Content Addressable Storage #610

Open
sjperkins opened this issue Jan 23, 2025 · 1 comment
Comments

@sjperkins

I've done some local tests committing the same dataset to a local icechunk repo in multiple commits.

The repo size seems to grow linearly: roughly the number of commits × the dataset size.

I guess this is because Content Addressable Storage (https://docs.earthmover.io/concepts/version-control-system#content-addressable-chunk-storage) isn't implemented.
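To make the difference concrete, here is a minimal toy sketch (not Icechunk's or Arraylake's actual storage code). It assumes one store that gives every written chunk a fresh key and one that keys chunks by their SHA-256 hash. The class and function names are hypothetical and exist only for illustration.

```python
import hashlib
import uuid


class IdKeyedStore:
    """Toy store: every write gets a fresh key, so identical chunks are stored again."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex
        self.objects[key] = data
        return key


class ContentAddressedStore:
    """Toy store: the key is the SHA-256 of the bytes, so identical chunks share one object."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)  # no-op if the chunk already exists
        return key


def stored_bytes(store) -> int:
    """Total size of all objects held by a store."""
    return sum(len(v) for v in store.objects.values())


def commit_dataset(store, chunks):
    """Write every chunk and return the manifest (list of keys) for this commit."""
    return [store.put(c) for c in chunks]


chunks = [bytes([i]) * 1024 for i in range(4)]  # 4 distinct 1 KiB chunks
id_store, cas_store = IdKeyedStore(), ContentAddressedStore()
for _ in range(3):  # commit the same dataset three times
    commit_dataset(id_store, chunks)
    commit_dataset(cas_store, chunks)

id_total = stored_bytes(id_store)    # grows with the number of commits
cas_total = stored_bytes(cas_store)  # stays at one copy of the dataset
```

In this toy model the ID-keyed store holds three copies of the dataset after three commits, while the content-addressed store holds one.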

Will icechunk implement CAS in future?

@rabernat
Contributor

rabernat commented Jan 23, 2025

Thanks for your question @sjperkins! You're correct that this is the current expected behavior of Icechunk.

The most immediate way we can address the growth in the size of the repo is via expiration of old versions and subsequent garbage collection of expired chunks. This will be implemented soon.
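As a rough illustration of what expiration followed by garbage collection could look like, here is a toy mark-and-sweep sketch. It is not the Icechunk API: `expire_and_collect`, the `keep_last` policy, and the in-memory data layout are all assumptions made for this example (a real policy might well be time-based).

```python
def expire_and_collect(snapshots, chunk_store, keep_last):
    """Expire all but the newest `keep_last` snapshots, then delete unreferenced chunks.

    snapshots:   list of (snapshot_id, set_of_chunk_keys), oldest first (hypothetical layout)
    chunk_store: dict mapping chunk key -> bytes
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")

    # Expiration: keep only the newest snapshots.
    kept = snapshots[-keep_last:]

    # Mark: every chunk referenced by a surviving snapshot is live.
    live = set().union(*(keys for _, keys in kept))

    # Sweep: delete chunks that no surviving snapshot references.
    for key in list(chunk_store):
        if key not in live:
            del chunk_store[key]
    return kept


# Three commits of the same dataset, each writing its own two chunks.
chunk_store = {f"c{i}": b"x" * 1024 for i in range(6)}
snapshots = [
    ("s1", {"c0", "c1"}),
    ("s2", {"c2", "c3"}),
    ("s3", {"c4", "c5"}),
]
kept = expire_and_collect(snapshots, chunk_store, keep_last=1)
remaining = sorted(chunk_store)
```

After the call, only the chunks referenced by the newest snapshot remain, so the store shrinks back to a single copy of the dataset.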

In developing Arraylake, we realized that there were some tricky challenges around CAS and garbage collection. Basically, it's hard to know if a CAS chunk is ever safe to delete, because a chunk with the same content could be written again at any moment. Moreover, we looked at over 1 PB of existing customer data and determined that CAS was saving only 1% of storage. (So your example is very artificial in terms of real-world usage.)
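One way to picture the deletion problem is the following deterministic interleaving of a garbage collector and a writer. This is a simplified sketch under assumed mark-then-sweep semantics, not a description of how Arraylake actually worked; all names are hypothetical.

```python
chunk_store = {"h1": b"data"}  # CAS objects keyed by content hash
references = set()             # hashes referenced by live snapshots

# GC, step 1 (mark): "h1" has no references, so it is scheduled for deletion.
garbage = [key for key in chunk_store if key not in references]

# Writer: wants to store a chunk whose hash is also "h1". The object already
# exists, so the upload is skipped and only a new reference is committed.
if "h1" not in chunk_store:
    chunk_store["h1"] = b"data"
references.add("h1")

# GC, step 2 (sweep): deletes what it marked earlier, unaware of the new reference.
for key in garbage:
    chunk_store.pop(key, None)

# The new snapshot now points at a chunk that no longer exists.
dangling = references - set(chunk_store)
```

With per-write keys this interleaving cannot occur, because a new write never reuses an existing object.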

So it is conceivable that we may find a way to bring CAS back at some point. But this is not on the near-term roadmap.
