zstd:chunked metadata ambiguity

Filing separately, earlier discussion around https://github.com/containers/storage/issues/1888#issuecomment-2145636434 :

zstd:chunked layers, when pulling, contain metadata in _three_ places:
- The ordinary uncompressed tar format
- TOC metadata
- tar-split metadata

When pulling, we ignore the uncompressed tar, build files from TOC metadata, and record tar-split 

When pushing, we ignore the metadata of individual files, and use the tar-split metadata.

The net outcome is that a user can “pull; inspect; push”, and the pushed metadata will be different from what the user saw.

I think this_must_ be addressed.

---

Vaguely, I think that could either happen by having the tar-split “drive” the chunked pull, using the TOC only to look up data; or by overwriting the TOC metadata by the tar-split data.

The latter seems a bit easier because the “push” compression code already has a tar → TOC conversion code, so we would “only” need to read through tar-split; the former is _conceptually_ nicer because, hypothetically, we could eventually only have a _single_ “apply tar header → filesystem metadata” code for all of c/storage, instead of now having “tar → metadata” for ordinary overlay, “TOC → metadata” for chunked, and things like https://github.com/containers/storage/pull/1653#discussion_r1256423401 — but it would probably be more disruptive and not practical within current short time limits.

Cc: @giuseppe — I’ll be looking at this over the next few days, but I’d very much appreciate any insight, advice, or help.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

zstd:chunked metadata ambiguity #2014

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

zstd:chunked metadata ambiguity #2014

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions