Description
Filing separately; see the earlier discussion around #1888 (comment):
When pulling, zstd:chunked layers contain metadata in three places:
- The ordinary uncompressed tar format
- TOC metadata
- tar-split metadata
When pulling, we ignore the uncompressed tar, build files from the TOC metadata, and record the tar-split metadata.
When pushing, we ignore the metadata of individual files, and use the tar-split metadata.
The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.
I think this _must_ be addressed.
Vaguely, I think that could happen either by having the tar-split “drive” the chunked pull, using the TOC only to look up data; or by overwriting the TOC metadata with the tar-split data.
The latter seems a bit easier because the “push” compression code already has tar → TOC conversion code, so we would “only” need to read through the tar-split data. The former is conceptually nicer because, hypothetically, we could eventually have a single “apply tar header → filesystem metadata” code path for all of c/storage, instead of the current “tar → metadata” for ordinary overlay, “TOC → metadata” for chunked, and things like #1653 (comment). It would probably be more disruptive, though, and not practical within the current short time limits.
Cc: @giuseppe — I’ll be looking at this over the next few days, but I’d very much appreciate any insight, advice, or help.