feat(handler): add geom_uzip handler #1143
tests/integration/compression/uzip/__output__/myfs.img.uzip_extract/myfs.img
ea64c85
to
b095981
@vlaci what would be the easiest way to add pyzstd to the unblob dependencies in Nix here? It's not yet upstream at https://github.com/NixOS/nixpkgs/blob/0fa90d642277de2c67e93204cc5870aba8af5878/pkgs/by-name/un/unblob/package.nix#L59, so we need a way to define it in this branch in the meantime.
@rxpha3l I'm using this fix locally, but I'm not sure it's idiomatic Nix:
diff --git a/overlay.nix b/overlay.nix
index 9c5051e..265cd79 100644
--- a/overlay.nix
+++ b/overlay.nix
@@ -29,6 +29,8 @@ final: prev:
 ];
 };
+ dependencies = (super.dependencies or []) ++ [ prev.python3.pkgs.pyzstd ];
+
 # remove this when packaging changes are upstreamed
 cargoDeps = final.rustPlatform.importCargoLock {
 lockFile = ./Cargo.lock;
519c7e2
to
34c6696
geom_uzip is a FreeBSD feature for creating compressed disk images (usually containing UFS). The compression is done in blocks, and the resulting .uzip file can be mounted via the GEOM framework on FreeBSD.

The mkuzip header includes a table with block counts and sizes. The header declares the block size (the size of a decompressed block) and the total number of blocks. The block size must be a multiple of 512 and defaults to 16384 in mkuzip.

It has the following structure:
> Magic, which is a shebang and compression identifier stored in 16 bytes.
> Format, which is a shell command that provides some general information.
> Block size, stored in 4 bytes.
> Block count, stored in 4 bytes.
> Table of contents (TOC), whose size depends on the file length.

The TOC is a list of uint64_t offsets into the file, one for each block. To determine the length of a given block, read the next TOC entry and subtract the current offset from it (this is why there is an extra TOC entry at the end).

Each block is compressed independently. For zlib-compressed images, a standard zlib decompressor decodes each block to a block of size block_size.

Unblob parses the TOC to determine the start and end offsets of the compressed file. It detects the compression method (zlib, lzma or zstd). Finally, the chunks are decompressed to recover the initial file. Empty chunks are ignored, which is why the file decompressed with unblob can be slightly smaller than the original one.

[Sources]
https://github.com/mikeryan/unuzip
https://www.baeldung.com/linux/filesystem-in-a-file
https://docs.python.org/3/library/zlib.html
https://github.com/freebsd/freebsd-src/blob/master/sys/geom/uzip/g_uzip.c
https://parchive.sourceforge.net/docs/specifications/parity-volume-spec/article-spec.html
https://www.mail-archive.com/[email protected]/msg34955.html
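For illustration, the TOC walk described above could look like the minimal sketch below. It is not the handler's code: `parse_uzip_toc` and `HEADER_TEXT_LEN` are hypothetical names, and the sketch assumes that the magic and format text together fill a 128-byte area before the two 32-bit fields and that all integers are big-endian. The image it parses is synthetic.

```python
import struct

# Assumption: a 128-byte text area (shebang magic plus shell command) precedes
# the two 32-bit fields, and every integer is stored big-endian.
HEADER_TEXT_LEN = 128


def parse_uzip_toc(data: bytes) -> tuple[int, list[tuple[int, int]]]:
    """Return the block size and the (start, end) file offsets of each block."""
    block_size, block_count = struct.unpack_from(">II", data, HEADER_TEXT_LEN)
    if block_size == 0 or block_size % 512 != 0:
        raise ValueError(f"block size {block_size} is not a positive multiple of 512")
    # The TOC holds block_count + 1 offsets; the extra entry marks the end
    # of the last block.
    toc = struct.unpack_from(f">{block_count + 1}Q", data, HEADER_TEXT_LEN + 8)
    # Pair each offset with the next one to get per-block spans.
    spans = list(zip(toc, toc[1:]))
    return block_size, spans


# Synthetic image: two "compressed" blocks of 3 and 5 bytes after the TOC.
toc_start = HEADER_TEXT_LEN + 8
first = toc_start + 3 * 8
image = (
    b"#!/bin/sh\n".ljust(HEADER_TEXT_LEN, b"\0")
    + struct.pack(">II", 16384, 2)
    + struct.pack(">3Q", first, first + 3, first + 8)
    + b"abc"
    + b"defgh"
)
block_size, spans = parse_uzip_toc(image)
lengths = [end - start for start, end in spans]
```

Subtracting consecutive offsets is what makes the trailing TOC entry necessary: without it, the length of the last block could not be computed.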
check the comments and rebase so there is no fixup commit
import re
import zlib
from lzma import LZMADecompressor
why this change ?
I talked with Krisztian, and maybe it is better to import just the decompression object instead of the entire library. Its name even starts with LZMA, so it is easy to see which library it comes from. Should I change it?
@@ -41,8 +41,8 @@

DECOMPRESS_METHOD = {
    ZLIB_COMPRESSION.encode(): zlib.decompressobj,
    LZMA_COMPRESSION.encode(): lzma.LZMADecompressor,
why this change ?
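For context on the mapping under discussion, here is a minimal sketch of how a table of decompressor factories can be used per block. The keys `b"zlib"` and `b"lzma"` and the names `DECOMPRESSOR_FACTORIES` and `decompress_block` are hypothetical; the handler derives its keys from `ZLIB_COMPRESSION` and `LZMA_COMPRESSION`, whose values are not shown in this hunk. zstd is left out because it needs the third-party pyzstd package.

```python
import lzma
import zlib

# Hypothetical keys; the real handler uses ZLIB_COMPRESSION.encode() and
# LZMA_COMPRESSION.encode(), whose values are not visible in the diff.
DECOMPRESSOR_FACTORIES = {
    b"zlib": zlib.decompressobj,
    b"lzma": lzma.LZMADecompressor,
}


def decompress_block(method: bytes, block: bytes) -> bytes:
    """Decompress one block with a fresh decompressor for the given method."""
    try:
        factory = DECOMPRESSOR_FACTORIES[method]
    except KeyError:
        raise ValueError(f"unsupported compression method: {method!r}") from None
    # Both objects expose decompress(); a new one is created per block because
    # each block is an independent compressed stream.
    return factory().decompress(block)


payload = b"uzip block " * 100
zlib_out = decompress_block(b"zlib", zlib.compress(payload))
lzma_out = decompress_block(b"lzma", lzma.compress(payload))
```

Either import style works with such a table: `lzma.LZMADecompressor` and the name bound by `from lzma import LZMADecompressor` refer to the same class, so the choice is purely about readability.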
geom_uzip is a FreeBSD feature for creating compressed disk images (usually containing UFS). The compression is done in blocks, and the resulting .uzip file can be mounted via the GEOM framework on FreeBSD.
The mkuzip header includes a table with block counts and sizes. The header declares the block size (size of decompressed blocks) and total number of blocks. Block size must be a multiple of 512 and defaults to 16384 in mkuzip.
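It has the following structure:
> Magic, which is a shebang and compression identifier stored in 16 bytes.
> Format, which is a shell command that provides some general information.
> Block size, stored in 4 bytes.
> Block count, stored in 4 bytes.
> Table of contents (TOC), whose size depends on the file length.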
Unblob parses the TOC to determine the start and end offsets of the uzip file. It finds the compressed blocks, decompresses them using zlib and joins them together to recover the decompressed file. Empty chunks are ignored, which is why the file decompressed with unblob can be slightly smaller than the original one.
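The effect of skipping empty chunks can be seen in a small sketch. It assumes the blocks are zlib-compressed and that the (start, end) spans have already been derived from the TOC; `join_blocks` is a hypothetical helper, not unblob's API, and the data is synthetic.

```python
import zlib


def join_blocks(data: bytes, spans: list[tuple[int, int]]) -> bytes:
    """Decompress each non-empty (start, end) span with zlib and concatenate."""
    output = bytearray()
    for start, end in spans:
        if end == start:
            # Empty chunk: nothing is stored, so nothing is written out.
            continue
        output += zlib.decompress(data[start:end])
    return bytes(output)


# Synthetic data: two real blocks with an empty chunk between them.
first = zlib.compress(b"A" * 512)
second = zlib.compress(b"B" * 512)
data = first + second
mid = len(first)
spans = [(0, mid), (mid, mid), (mid, len(data))]
recovered = join_blocks(data, spans)
```

Three TOC spans go in but only two blocks' worth of data come out, which is the size difference mentioned above.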
[Sources]
https://github.com/freebsd/freebsd-src/blob/master/sys/geom/uzip/g_uzip.c