Replace bzip2 #143

Open
Artoria2e5 opened this issue Sep 1, 2023 · 3 comments
Labels
enhancement New feature or request


@Artoria2e5

Artoria2e5 commented Sep 1, 2023

bzip2 is ancient. It is slow to decompress and does not provide the best compression ratio. Most projects have switched to something else; Fedora's comparison may be useful here. Since this is a new version of FAH, it might finally be time to also change the compression on the tarball.

Here is some timing and size data for fahcore-22-windows-64bit-release-0.0.20. The -T0 flag enables multithreading in xz; both single-threaded and multithreaded decompression are tested.

$ time bzip2 -dk fahcore-22-windows-64bit-release-0.0.20.tar.bz2
real    0m13.048s
user    0m0.000s
sys     0m0.000s

$ xz -T0 -k -v --x86 --lzma2=preset=6 fahcore-22-windows-64bit-release-0.0.20.tar
fahcore-22-windows-64bit-release-0.0.20.tar (1/1)
  100 %       116.5 MiB / 236.3 MiB = 0.493    15 MiB/s       0:15

$ ls -l fah*
-rw-r--r-- 1 arthu arthu 247808000 Sep  1 15:07 fahcore-22-windows-64bit-release-0.0.20.tar
-rw-r--r-- 1 arthu arthu 156444974 Sep  1 15:08 fahcore-22-windows-64bit-release-0.0.20.tar.bz2
-rw-r--r-- 1 arthu arthu 122121176 Sep  1 15:07 fahcore-22-windows-64bit-release-0.0.20.tar.xz

$ rm  fahcore-22-windows-64bit-release-0.0.20.tar

$ time xz -dk fahcore-22-windows-64bit-release-0.0.20.tar.xz
real    0m7.548s
user    0m0.000s
sys     0m0.000s

$ rm  fahcore-22-windows-64bit-release-0.0.20.tar

$ time xz -T0 -dk fahcore-22-windows-64bit-release-0.0.20.tar.xz
real    0m1.418s
user    0m0.000s
sys     0m0.000s

$ zstd fahcore-22-windows-64bit-release-0.0.20.tar
fahcore-22-windows-64bit-release-0.0.20.tar : 59.84%   (   236 MiB =>    141 MiB, fahcore-22-windows-64bit-release-0.0.20.tar.zst)

$ ls -l fahcore-22-windows-64bit-release-0.0.20.tar.zst
-rw-r--r-- 1 arthu arthu 148296926 Sep  1 15:07 fahcore-22-windows-64bit-release-0.0.20.tar.zst

$ rm  fahcore-22-windows-64bit-release-0.0.20.tar

$ time zstd -d fahcore-22-windows-64bit-release-0.0.20.tar.zst
fahcore-22-windows-64bit-release-0.0.20.tar.zst: 247808000 bytes

real    0m0.384s
user    0m0.156s
sys     0m0.125s
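For anyone who wants to rerun a similar comparison from Python, here is a minimal sketch using only the standard library's `bz2` and `lzma` modules. It assumes the uncompressed .tar is passed on the command line and mirrors the `--x86 --lzma2=preset=6` chain used above. zstd is left out because it needs the third-party zstandard package, and Python's `lzma` module does not expose a multithreaded mode, so its timings are only comparable to the plain `xz -dk` run, not to `-T0`.

```python
import bz2
import lzma
import sys
import time

# Same chain as `xz --x86 --lzma2=preset=6`: x86 BCJ filter, then LZMA2 at preset 6.
XZ_X86_FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def measure(data: bytes) -> dict:
    """Compress `data` with bzip2 and xz, then time a single-threaded decompression of each."""
    candidates = {
        # bzip2's command-line default is -9, which is also bz2.compress's default level.
        "bz2": (bz2.compress(data, compresslevel=9), bz2.decompress),
        "xz": (lzma.compress(data, format=lzma.FORMAT_XZ, filters=XZ_X86_FILTERS), lzma.decompress),
    }
    results = {}
    for name, (blob, decompress) in candidates.items():
        start = time.perf_counter()
        restored = decompress(blob)
        elapsed = time.perf_counter() - start
        if restored != data:
            raise ValueError(f"{name}: round trip did not reproduce the input")
        # ratio = compressed / uncompressed, the same convention xz -v prints (0.493 above)
        results[name] = {"ratio": len(blob) / len(data), "decompress_s": elapsed}
    return results


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:  # e.g. fahcore-22-windows-64bit-release-0.0.20.tar
        payload = f.read()
    for name, r in measure(payload).items():
        print(f"{name}: ratio {r['ratio']:.3f}, decompress {r['decompress_s']:.3f} s")
```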

Both zstd and xz with the x86 BCJ filter compress better than bzip2 and decompress faster. Zstd (at its default level) is only slightly smaller than bzip2 but decompresses very fast (about 34×). Xz is significantly smaller but only ~70% faster single-threaded (about 9× faster with -T0).
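The multipliers in the summary follow directly from the transcript. A small sketch that recomputes them from the `real` times and the `ls -l` byte counts (no new measurements are taken):

```python
# Figures copied from the transcript: wall-clock "real" seconds and ls -l sizes in bytes.
DECOMPRESS_SECONDS = {
    "bzip2": 13.048,
    "xz": 7.548,
    "xz -T0": 1.418,
    "zstd": 0.384,
}
SIZES_BYTES = {
    "tar": 247_808_000,
    "bz2": 156_444_974,
    "xz": 122_121_176,
    "zst": 148_296_926,
}


def speedup_vs_bzip2(tool: str) -> float:
    """How many times faster `tool` decompressed the archive than bzip2 did."""
    return DECOMPRESS_SECONDS["bzip2"] / DECOMPRESS_SECONDS[tool]


def size_vs_bzip2(fmt: str) -> float:
    """Compressed size as a fraction of the .tar.bz2 size."""
    return SIZES_BYTES[fmt] / SIZES_BYTES["bz2"]


if __name__ == "__main__":
    for tool in ("xz", "xz -T0", "zstd"):
        print(f"{tool:7s} decompresses {speedup_vs_bzip2(tool):5.1f}x faster than bzip2")
    for fmt in ("xz", "zst"):
        print(f"{fmt:3s} is {1 - size_vs_bzip2(fmt):.1%} smaller than .bz2")
```

This works out to roughly 1.7× (xz), 9.2× (xz -T0) and 34× (zstd) faster decompression than bzip2, with the .tar.xz about 22% smaller and the .tar.zst about 5% smaller than the .tar.bz2.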


A slight issue with zstd is that Python support requires the third-party python-zstandard module from PyPI, whereas xz is covered by the builtin lzma module. Cbang, which currently handles the tar and bz2 side, does not support zstd or xz either; see https://github.com/CauldronDevelopmentLLC/cbang/blob/master/src/cbang/iostream/CompressionFilter.h.
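Because the builtin lzma module can reproduce the same filter chain, the xz route needs no new Python dependency. A minimal sketch, assuming a plain directory as input; `pack_tar_xz` and `unpack_tar_xz` are hypothetical helper names, not anything in the FAH client:

```python
import lzma
import os
import tarfile

# Same chain as `xz --x86 --lzma2=preset=6`: x86 BCJ filter first, then LZMA2.
XZ_X86_FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def pack_tar_xz(src_dir: str, out_path: str) -> None:
    """Write src_dir to out_path as .tar.xz, streaming the tar through the xz encoder."""
    arcname = os.path.basename(os.path.normpath(src_dir))
    with lzma.open(out_path, "wb", format=lzma.FORMAT_XZ, filters=XZ_X86_FILTERS) as xz_file:
        # "w|" writes an uncompressed tar stream; xz_file does the compression.
        with tarfile.open(fileobj=xz_file, mode="w|") as tar:
            tar.add(src_dir, arcname=arcname)


def unpack_tar_xz(archive: str, dest: str) -> list:
    """Extract a .tar.xz into dest and return the member names.

    No filter arguments are needed here: the BCJ + LZMA2 chain is recorded in the .xz headers.
    """
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):  # extraction filters (Python 3.12+ and some backports)
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```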

jcoffland added the enhancement (New feature or request) label on Sep 1, 2023
@marcosfrm
Contributor

cbang already links three compression libraries (zlib, libbz2, liblz4). If another library is to be added, it should be libzstd: in a way it makes the three currently supported algorithms (plus LZMA) obsolete, and it can also be used for HTTP content-encoding.
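To make the content-encoding point concrete: a server that links libzstd could offer `zstd` whenever the client advertises it in `Accept-Encoding` (RFC 8878 registers `zstd` as an HTTP content coding). A simplified negotiation sketch; the function names and the server preference order are illustrative assumptions, not cbang's API:

```python
def parse_accept_encoding(header: str) -> dict:
    """Map each content coding in an Accept-Encoding header to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        coding, *params = [part.strip() for part in item.split(";")]
        if not coding:
            continue  # skip empty entries such as a trailing comma
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[coding.lower()] = q
    return prefs


def choose_content_encoding(header: str, server_order=("zstd", "gzip")) -> str:
    """Pick the client's highest-weighted coding the server supports; ties follow server_order.

    Simplified: falls back to "identity" and does not honour an explicit identity;q=0.
    """
    prefs = parse_accept_encoding(header)
    wildcard = prefs.get("*", 0.0)  # "*" covers codings the client did not list explicitly
    best, best_q = "identity", 0.0
    for coding in server_order:
        q = prefs.get(coding, wildcard)
        if q > best_q:
            best, best_q = coding, q
    return best
```

For example, a browser sending `gzip, deflate, br, zstd` would get zstd, while `gzip;q=1.0, zstd;q=0.5` would still get gzip.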

@Artoria2e5
Author

Artoria2e5 commented Aug 27, 2024

I do not believe zstd renders LZMA obsolete, because it does not cover the "compress once, but do it really well" niche well enough. That niche used to be bzip2's territory, but LZMA now does it both faster and better. Zstd is about being fast in both directions while also being better than what came before. And yes, HTTP is about being fast in both directions too.

https://quixdb.github.io/squash-benchmark/#ratio-vs-decompression (I do believe the zstd times are a little broken here!)

[that said, zstd is smaller than bzip2 here too]
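The "compress once, decompress many times" trade-off is easy to see with the builtin lzma module: higher presets spend more time compressing, while decompression time typically stays comparatively flat. A rough sketch; the sample payload and preset list are arbitrary choices, real numbers depend heavily on the input, and presets 7 to 9 need several hundred MiB of RAM to compress:

```python
import lzma
import time

# (label, preset) pairs; PRESET_EXTREME corresponds to xz's -e / --extreme flag.
PRESETS = [("0", 0), ("3", 3), ("6", 6), ("9", 9), ("9e", 9 | lzma.PRESET_EXTREME)]


def preset_sweep(data: bytes, presets=PRESETS) -> list:
    """Return (label, compressed_size, compress_s, decompress_s) for each xz preset."""
    rows = []
    for label, preset in presets:
        t0 = time.perf_counter()
        blob = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
        t1 = time.perf_counter()
        if lzma.decompress(blob) != data:
            raise ValueError(f"preset {label}: round trip failed")
        t2 = time.perf_counter()
        rows.append((label, len(blob), t1 - t0, t2 - t1))
    return rows


if __name__ == "__main__":
    # Arbitrary, mildly repetitive text payload standing in for real data.
    sample = b"".join(b"frame %d: x=%d y=%d\n" % (i, i * 7 % 101, i * 13 % 97) for i in range(50_000))
    for label, size, c_s, d_s in preset_sweep(sample):
        print(f"-{label:2s} {size:8d} bytes  compress {c_s:.3f} s  decompress {d_s:.3f} s")
```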

@marcosfrm
Contributor

zstd is good overall, delivering compression similar to lz4 at low levels, equivalent to gzip and bzip2 at intermediate levels, and close to lzma at high levels. Additionally, it has a well-maintained codebase, something to consider after the xz backdoor fiasco.

By the way, zstd also has a -T0 option.
