Would it be in the scope for this project to add stream deduplication to some formats?
It could drastically reduce archive sizes when the input contains repeated data. Basically, identical blocks would be marked as duplicates, and only a reference to the first occurrence would be stored in the header. Not all formats may be compatible, though.
This would be a step before compression.
So it could be implemented as a format option, e.g. "tar+dedup" followed by the compression format ("tar+dedup.<compressionFormat>").
More about the technique can be found at https://github.com/klauspost/dedup, a stream-deduplication library for Go.
There's an article explaining everything in great detail here too: https://blog.klauspost.com/fast-stream-deduplication-in-go/