
interesting compression algorithms + policy #1633

Open
ThomasWaldmann opened this issue Sep 23, 2016 · 35 comments

Comments

@ThomasWaldmann
Member

ThomasWaldmann commented Sep 23, 2016

There are often requests to add some new and great compression algorithm X.

This ticket is to collect such ideas and also to specify the adoption policy for X:

  • if X adds an external requirement, X must be portable and widespread (Linux, BSD, OS X, cygwin, Windows, 32/64 bit, x86/arm/..., LE/BE) and available in stable distributions. For brand-new stuff this is often not the case, and we would create a major problem for packagers if we required it.
  • we are not keen on "bundling" 3rd-party (source) code into borg either, and some distributions have policies that require removing such bundled code anyway
  • X must add tangible value (if X is just as good as, or not much better than, zlib or the other integrated methods, why should we add X?)
  • X itself must be sufficiently stable (format) and bug-free (code); breaking changes break your backups
  • long-term maintenance must be rather likely
  • X must be secure
  • X must exist as a library, so we can directly interface to it without forking
  • X must be lossless, of course
  • X must be FOSS
  • X should be patent-free or must come with a sufficiently broad patent license.
  • X must be general-purpose, i.e. if X only works for a specific file format it won't work for Borg (chunk-, not file-, oriented architecture). In other words, X must be a bit-perfect lossless compressor.

To keep this ticket from becoming unmanageable over time, developers will actively edit and delete comments here after processing them.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Sep 23, 2016

Lepton JPEG compression

@ThomasWaldmann
Member Author

ThomasWaldmann commented Sep 23, 2016

zstd compression

@ThomasWaldmann
Member Author

brotli compression

@jungle-boogie
Contributor

I think zstd could be very compelling for something like borg. Admittedly, I don't fully understand the patent issue around it, though. There are already a couple of Python packages bundling zstd. Can't those be used?

@ThomasWaldmann
Member Author

@jungle-boogie for distribution (in the debian / fedora / ... sense) it matters whether zstd is available as a package there (we can't just pull it via pip and compile it). I agree that zstd looks interesting technically.

@enkore
Contributor

enkore commented Oct 14, 2016

zstd indeed looks nice, but FB doesn't seem to want to move at all regarding the patents issue. So far they refuse to even say whether they actually hold any patents (this has been an issue with other projects of theirs as well, where the same LICENSE+PATENTS combination is used; however, in some cases (iirc react) it has been seen that they have patent applications).

To me it looks a lot like their intention is to create a "mutually assured destruction" scenario for patents around their software.

However, I think it is quite clear that there won't be any widespread adoption (network protocols, file formats) of zstd unless this issue is clarified. Given that another part of FB is trying to push zstd as the standard compression of the future (tm), we'll just have to wait and see which part of FB prevails here.

So in summary: Inclusion in Borg doesn't really depend on us, it depends on what other players in the ecosystem and FB will do with it.

@FabioPedretti
Contributor

I respectfully disagree with the previous comment:

  • zstd has a patent grant which is not perfect for everyone, sure. But it is clear what it covers and what the user can and cannot do. (And BTW, there are currently no known patents covering it.) A similar patent grant is also used by Google's libvpx, which is more or less the de facto standard open source / patent-friendly video codec, used by a lot of open source software. See here for some background: http://blog.webmproject.org/2010/06/changes-to-webm-open-source-license.html
  • borg itself (just an example), OTOH, has no patent grant or a suitable licence (e.g. GPLv3), so (past, current and future) borg developers could sue or threaten all borg users at any time and without any reason (e.g. they could revoke users' permission to use the software by threatening with potential patents that may or may not exist or be valid, and that only a judge could clear up). (Still, I am sure this won't happen, we know the borg developers are fine, but there is no provision for this.)

So zstd is not perfect but very clear about it: every user may decide whether that grant is OK for them and use the software or not. Borg (like most open source software not using a patent-covering licence such as GPLv3) just ignores any potential issue, and its users just have to hope for the best.

@inikep

inikep commented Dec 23, 2016

Zstd is already included in most important Linux/*BSD distributions:
facebook/zstd#320

Yann Collet's (zstd author) answer regarding patents:

  • License and patents are orthogonal
  • The license is and remains BSD, no matter what, irrespective of any patent issue.
  • A BSD license doesn't cover patents.
  • The patent grant is an additional set of rights, on top of the BSD license. It protects more by being there than by not being present.
  • The PATENT file is generic and present within all Facebook Open-Source projects.
  • For more details, your concerns have also been forwarded to the legal team ([email protected]).

@ThomasWaldmann
Member Author

moved from suggestion by @infectormp in #45:

alternative to zlib from Apple
LZFSE - https://www.infoq.com/news/2016/07/apple-lzfse-lossless-opensource

@sjuxax

sjuxax commented Apr 1, 2017

I haven't read through all of the tickets yet, and I'm sure this has already been answered somewhere, but it sounds like there are various challenges posed by the approach taken to compression in borg. Why is it not better to have borg do all of its work without compression and then call out to an external compressor?

This would be similar to tar's --use-compress-program option. That option has served tar well for many years: people have been able to easily swap in multithreaded and/or experimental compressors, and tar's compatibility with such tools is a large part of its longevity as community standards have transitioned from uncompressed -> gz -> bz2 -> lzma (-> zstd/brotli?).

@ThomasWaldmann
Member Author

ThomasWaldmann commented Apr 1, 2017

@sjuxax forking an external program is expensive, esp. if you have small data sizes.

borg compresses chunks (~2 MB for big files, or as small as the file is for smaller files).
tar compresses the whole stream, so it has lots of data and only forks once.
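
To make the fork overhead concrete, here is a minimal timing sketch (illustration only, not borg code; it assumes a gzip binary on PATH) that compresses the same ~2 MB chunk in-process via Python's zlib library versus by forking an external gzip process for every chunk:

```python
import subprocess
import time
import zlib

chunk = b"fairly repetitive example data " * 65536  # roughly a 2 MB borg-sized chunk

def compress_in_process(data: bytes) -> bytes:
    # library call: no process creation, data never leaves this process
    return zlib.compress(data, 6)

def compress_via_fork(data: bytes) -> bytes:
    # one fork/exec plus two pipe copies -- paid again for every single chunk
    return subprocess.run(["gzip", "-6", "-c"], input=data,
                          stdout=subprocess.PIPE, check=True).stdout

for name, fn in [("zlib library call", compress_in_process),
                 ("gzip subprocess", compress_via_fork)]:
    start = time.perf_counter()
    for _ in range(20):
        fn(chunk)
    print(f"{name}: {time.perf_counter() - start:.2f}s for 20 chunks")
```

The per-chunk process startup and pipe copies hurt most when chunks are small (many small files), which is exactly the workload borg has to handle.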

@enkore
Contributor

enkore commented Apr 1, 2017

Added

  • X must be general-purpose, i.e. if X only works for a specific file format it won't work for Borg (chunk-, not file-, oriented architecture)

Which is a reason why e.g. things like dropbox/lepton do not work.

@enkore
Contributor

enkore commented Jun 10, 2017

zstd is coming along nicely; schedule it for inclusion in 1.2?

How do we add a compression algorithm? Mandatory feature flag when it's used?

@ThomasWaldmann
Member Author

@enkore I'd put it into 1.3. We can do it earlier in case we get bored.

Yes, I guess that means a mandatory feature flag for reading. And the manifest had better not get compressed (either not at all, or at least not with a new algorithm).
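
For illustration only (hypothetical IDs and names, not borg's actual on-disk format): the general idea is that every stored chunk carries a compressor ID and the repository/manifest carries a mandatory feature flag once the new ID is in use, so an older client fails cleanly instead of misreading data. A minimal sketch:

```python
import zlib

# Hypothetical ID-to-decompressor registry; borg's real compressor IDs and API differ.
DECOMPRESSORS = {
    0x00: lambda payload: payload,   # "none"
    0x01: zlib.decompress,           # "zlib"
    # 0x02 would be zstd, registered only if the zstd library is available
}

def decompress_chunk(blob: bytes) -> bytes:
    algo_id, payload = blob[0], blob[1:]
    if algo_id not in DECOMPRESSORS:
        # mandatory-feature behaviour: fail loudly, never guess
        raise ValueError(f"unknown compression id {algo_id:#04x}; please upgrade borg")
    return DECOMPRESSORS[algo_id](payload)

# usage sketch
stored = bytes([0x01]) + zlib.compress(b"chunk data")
assert decompress_chunk(stored) == b"chunk data"
```

Keeping the manifest out of the new algorithm means even a client without zstd support can still read the manifest and report a clear error for the chunks it cannot decompress.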

@enkore
Contributor

enkore commented Aug 20, 2017

By way of facebook/zstd#801, zstd is now licensed under a standard BSD license.

@ThomasWaldmann ThomasWaldmann added this to the 1.3 milestone Aug 21, 2017
@ThomasWaldmann
Member Author

Put it into 1.3, so we have it on the radar. Maybe we can also do it earlier, let's see.

@rugk
Contributor

rugk commented Sep 17, 2017

BTW, FYI: zstd has now been added to Linux kernel 4.14 for btrfs.

@ThomasWaldmann
Member Author

ThomasWaldmann commented Dec 2, 2017

About zstd: I am working on an updated PR #3411 based on @willyvmm's work in PR #3116, addressing some of my own feedback from back then.

@ThomasWaldmann
Member Author

zstd support is now in master branch (#3411) and a 1.1 backport is pending (#3442).

@ThomasWaldmann
Member Author

zstd support in borg was released with 1.1.4.

@henfri
Contributor

henfri commented Dec 31, 2017

Thanks!
Borg is also already referenced on their homepage:
http://facebook.github.io/zstd/#other-languages

@ThomasWaldmann
Member Author

@henfri yeah, I also noticed that - it was there even before we actually implemented it.

@u1735067

I hope this is the right place to post this: https://quixdb.github.io/squash-benchmark/ seems to give some useful indications, e.g. for Density or Pithy.

@ThomasWaldmann
Member Author

Why do you think so?

lz4 is quite good for high speed, while zstd covers a wide range of compression ratios with good speed.

We don't need to add anything that is only marginally better.

@ThomasWaldmann ThomasWaldmann removed this from the lithium milestone Jun 12, 2018
@wbolster

wbolster commented Feb 7, 2020

zstd also has an adaptive mode which varies the compression level (within a min/max boundary) based on I/O throughput; this may be an interesting feature to expose in borg, e.g. --compression zstd,min=3,max=16. This syntax is borrowed from the zstd command line tool, which exposes the feature via the --adapt option. From the man page:

       --adapt[=min=#,max=#]
              zstd will dynamically adapt compression level to perceived I/O
              conditions. Compression level adaptation can be observed live by
              using command -v. Adaptation can be constrained between supplied
              min and max levels. The feature works when combined with
              multi-threading and --long mode. It does not work with
              --single-thread. It sets window size to 8 MB by default (can be
              changed manually, see wlog). Due to the chaotic nature of
              dynamic adaptation, compressed result is not reproducible. note
              : at the time of this writing, --adapt can remain stuck at low
              speed when combined with multiple worker threads (>=2).

@ThomasWaldmann
Member Author

Guess that is from zstd, the CLI compression tool, not zstd, the library.

borg uses it as a library, so the called library function does not know anything about I/O conditions.
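
For comparison, a minimal sketch of the library-level view, using the third-party zstandard Python bindings purely for illustration (borg ships its own binding): the level is fixed when the compressor object is created, and the library only ever sees the buffer it is handed, so there is nothing I/O-related for it to adapt to.

```python
import zstandard  # third-party bindings, used here only for illustration

chunk = b"example chunk data " * 100000

# The level is chosen up front; the library has no notion of I/O throughput.
cctx = zstandard.ZstdCompressor(level=3)
compressed = cctx.compress(chunk)

dctx = zstandard.ZstdDecompressor()
assert dctx.decompress(compressed) == chunk

print(f"{len(chunk)} -> {len(compressed)} bytes at fixed level 3")
```

Anything adaptive would therefore have to live in borg itself (e.g. measuring its own throughput and switching levels between chunks), not in the compression library.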

@Gu1nness
Contributor

Gu1nness commented Nov 4, 2020

What exactly is expected from this issue in terms of documentation?
Documenting the process for adding a new compression algorithm?
Documenting the already existing algorithms?
Both?

I'm ready to do this one.

@ThomasWaldmann
Member Author

Guess it is enough to have this here on the issue tracker.

@Gu1nness
Contributor

Gu1nness commented Nov 7, 2020

@ThomasWaldmann I guess it could go into the development.rst file, for people who wouldn't look for it in the issue tracker. But that's just my point of view.

@ThomasWaldmann
Member Author

If you like, add a pointer to this issue to the docs, but do not duplicate all the stuff from the top post into the docs.

As this might need per-algorithm discussion, that is better done here; it is not really possible in the docs.

ThomasWaldmann added a commit that referenced this issue Dec 2, 2020
Add some documentation for new compression algorithm, see #1633 [1.1]
@alexandervlpl

I think JPEG XL meets all those requirements perfectly; it's the next official JPEG standard. In lossless transcode mode it functions as a bit-perfect lossless compressor for ordinary JPEGs, reducing their size by ~20%.

Seems like a much better option than Lepton (#1632).

@Daasin

Daasin commented Feb 11, 2024

I think JPEG XL meets all those requirements perfectly; it's the next official JPEG standard. In lossless transcode mode it functions as a bit-perfect lossless compressor for ordinary JPEGs, reducing their size by ~20%.

Seems like a much better option than Lepton (#1632).

Agree on JPEG-XL being an interesting consideration

@ThomasWaldmann
Member Author

Cython binding: https://github.com/olokelo/jxlpy/

I had a quick look, but there doesn't seem to be a simple compress/decompress API for the content of JPEG files. Is that even possible?

What we want for borg is bit-identical reconstruction of file content.

@alexandervlpl

alexandervlpl commented Feb 12, 2024

@ThomasWaldmann it's absolutely possible: they call it "lossless transcode", and it decodes back to the original file byte for byte (identical checksums). According to the jxlpy feature list, there's no support for it in the Python bindings yet. A chance here to contribute to two projects at once?

For my backups I run cjxl --lossless_jpeg=1 before borg; it has saved me ~50 GB of bandwidth and storage space so far. JXL is fully mature and has been ISO-standardized since 2022, with decoder conformance (meaning any future version must be able to decode my backups years from now). It doesn't get better than this for archiving.

Shall we open a separate issue for JXL? I'd be happy to at least test it on my side.
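
A minimal sketch of that round trip for anyone who wants to verify it locally (assumes the cjxl/djxl tools from libjxl are installed; file names are hypothetical):

```python
import filecmp
import subprocess

src = "photo.jpg"  # hypothetical input file

# Losslessly transcode the JPEG into a JXL container, keeping JPEG reconstruction data.
subprocess.run(["cjxl", src, "photo.jxl", "--lossless_jpeg=1"], check=True)

# Decode back; with reconstruction data present, djxl should re-emit the original JPEG.
subprocess.run(["djxl", "photo.jxl", "roundtrip.jpg"], check=True)

print("bit-identical:", filecmp.cmp(src, "roundtrip.jpg", shallow=False))
```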

@ThomasWaldmann
Member Author

@alexandervlpl Let's discuss in #8092 first.
