
Feature Request: Block-based Compression Early Abort for Incompressible Data in gensquashfs #114

Open
wychen opened this issue Apr 4, 2023 · 0 comments


gensquashfs currently retains the original data if the compressed output is larger than the source. However, performing heavy-duty compression on incompressible data and then discarding it may be wasteful. I propose adding a command line option to gensquashfs that enables a quick entropy measurement before performing compression. If a block is deemed incompressible, we can simply keep the original data without wasting computational resources on compression.
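To make the idea concrete, here is a minimal sketch of the per-block decision in Python. It is not gensquashfs code: `zlib` level 1 stands in for the zstd level 1 probe only because it ships with the Python standard library, and the names `looks_incompressible`, `pack_block`, `BLOCK_SIZE`, and `PROBE_THRESHOLD` (with its 0.95 cutoff) are hypothetical choices for illustration.

```python
import lzma
import os
import zlib

BLOCK_SIZE = 128 * 1024   # assumed block size for this sketch
PROBE_THRESHOLD = 0.95    # hypothetical cutoff: probe must save at least 5%


def looks_incompressible(block, threshold=PROBE_THRESHOLD):
    """Cheap probe: zlib level 1 is a stand-in for zstd level 1."""
    if not block:
        return True
    probe = zlib.compress(block, 1)
    return len(probe) >= threshold * len(block)


def pack_block(block):
    """Return (payload, is_compressed) for one block."""
    if looks_incompressible(block):
        # Early abort: skip the expensive compressor entirely.
        return block, False
    packed = lzma.compress(block, preset=6)  # xz at preset 6
    if len(packed) >= len(block):
        # Existing behaviour: keep the original if compression did not help.
        return block, False
    return packed, True


if __name__ == "__main__":
    for name, data in (("zeros", bytes(BLOCK_SIZE)),
                       ("random", os.urandom(BLOCK_SIZE))):
        payload, compressed = pack_block(data)
        print(name, len(payload), compressed)
```

The fallback after the xz step is kept, so a block that slips past the probe but still fails to shrink is stored as-is, as it is today.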

We could use a fast compression method, such as zstd level 1, as a probe to estimate the compressibility of each block. With the default xz level 6 as the main compressor, a zstd level 1 probe adds less than 2% computational overhead. The approach is therefore a net gain whenever at least about 2% of the blocks in the source files are incompressible, which is not an unreasonable scenario.
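The break-even point can be checked with a little arithmetic. The sketch below assumes a simplified cost model that is not measured here: every block costs the same to compress with xz, the probe costs a fixed fraction of that, and the probe identifies incompressible blocks perfectly. The function name `relative_cost` is hypothetical.

```python
def relative_cost(incompressible_fraction, probe_overhead=0.02):
    """Total CPU cost with the probe, relative to always running xz (= 1.0).

    Simplified model: every block pays the probe, and only the blocks
    judged compressible go on to pay the full xz cost.
    """
    if not 0.0 <= incompressible_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return probe_overhead + (1.0 - incompressible_fraction)


if __name__ == "__main__":
    for fraction in (0.0, 0.02, 0.10, 0.50):
        print(f"{fraction:.2f} incompressible -> cost {relative_cost(fraction):.2f}")
```

Under this model the probe pays for itself once the incompressible fraction exceeds the probe overhead. With no incompressible blocks at all, the worst case is the overhead itself, about 2% extra work.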

Alternative methods, such as file-based skipping with filename matching or file type detection, may be less accurate. In particular, files that mix compressible and incompressible content, such as PDFs with both text (compressible) and JPEG images (not compressible), or uncompressed tar files or VM images containing various file types, could benefit from a more granular block-based approach.
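A small illustration of why block granularity matters, again as a hedged Python sketch: the "file" below is synthetic (one block of repetitive text followed by one block of random bytes), `zlib` level 1 stands in for the zstd probe, and `compressible_blocks` along with its default block size and threshold are hypothetical.

```python
import os
import zlib


def compressible_blocks(data, block_size=128 * 1024, threshold=0.95):
    """Return one True/False decision per block: is it worth compressing?"""
    decisions = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        probe = zlib.compress(block, 1)  # stand-in for a zstd level 1 probe
        decisions.append(len(probe) < threshold * len(block))
    return decisions


if __name__ == "__main__":
    size = 128 * 1024
    text_part = (b"plain text inside a container file\n" * 10000)[:size]
    image_part = os.urandom(size)  # behaves like already-compressed JPEG data
    print(compressible_blocks(text_part + image_part, size))
```

A filename or file-type rule would have to make one decision for the whole file, whereas the per-block probe treats the two halves differently.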

This idea is inspired by the ZFS LZ4 early abort mechanism, although the requirements and trade-offs in our context may be different. For reference, I have filed a similar issue on the squashfs-tools repository at plougher/squashfs-tools#240.

I'm happy to refine my local prototype and send a PR, but I'd like to ensure that this feature aligns with the project's direction first. Thank you for your time and consideration. I'm looking forward to hearing your thoughts on this proposal and the potential advantages it could bring to squashfs-tools-ng and the community.
