Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-5.32] Zstd backports #2508

Merged
merged 51 commits into from
Aug 9, 2024
Merged

Conversation

mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Aug 7, 2024

This is #2321 + #2503 + #2487 , backported to a newly-created release-5.32 branch, along with a release bump to 5.32.1.

Alternatively, we could just tag main as a 5.32.1 without backporting: see https://github.com/containers/image/compare/main..mtrmac:5.32-zstd , the difference is a set of typo fixes and dependency updates. The dependency updates are probably unwanted, I’m not sure.

Cc: @TomSweeneyRedHat

TomSweeneyRedHat and others added 30 commits July 25, 2024 18:21
As the title says.  Bumping the main branch back to a dev version.

Signed-off-by: tomsweeneyredhat <[email protected]>
…/v5.32.0

Bump c/storage to v1.55.0, c/image to v5.32.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…m-vbauerster-mpb-v8-8.x

fix(deps): update module github.com/vbauerster/mpb/v8 to v8.7.5
We are already calling m.LayerInfos() anyway, so there is ~no
extra cost. And using LayerInfos means we don't need to worry
about reversing the order of layers, and we will have access
to the layer index, allowing us to acccess the indexTo* fields
in the future.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
- Don't claim that we only use compressed digests.
- Explicitly document that we assume TOC digests to be unambiguous
- Actually define the term "DiffID".
- Be more precise in computeID about the criteria being layer identity,
  not where we pull the layer from.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Some errors are severe enough that just logging and continuing is
not really worthwhile.

Signed-off-by: Miloslav Trmač <[email protected]>
…tyDataLocked

Currrently we "only" have indexToTOCDigest and blobDiffIDs, but we will make this
more complex.

Centralizing the consumption of these fields into trustedLayerIdentityDataLocked
ensure that all consumers interpret the data exactly consistently (and it also
allows us to use a single "trusted" variable instead of 2/3 individual ones).

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
The new code is not called, so it should not change behavior
(apart from extending the BoltDB/SQLite schema).

Signed-off-by: Miloslav Trmač <[email protected]>
…storage by DiffID

If we can, prefer identifying layers by DiffID, because multiple TOCs can map to the
same DiffID; and because it maximizes reuse with non-TOC layers.

For now, the new situation is unreachable.

Signed-off-by: Miloslav Trmač <[email protected]>
We will add one more instance of this, so share the code.

Should not change behavior (it does remove one unreachable code path).

Signed-off-by: Miloslav Trmač <[email protected]>
… is known

- Multiple TOC values might correspond to a single DiffID (e.g. if different
  compression levels are used); try to share them all, identified by DiffID
  (so that we also reuse with non-TOC pulls).
  - LayersByTOCDigest only uses a single TOC digest per layer; BlobInfoCache
    allows multiple matches, matches layers which have been since deleted,
    and potentially matches TOC digests which we have created by pushing
    but haven't pulled yet.
- On reuse, we can now use DiffID-based layer identities even if the reuse
  was TOC~driven.

Signed-off-by: Miloslav Trmač <[email protected]>
…ayers

- Rely on it instead of triggering the "untrusted DiffID" logic
- Also propagate it to storage

Signed-off-by: Miloslav Trmač <[email protected]>
Record (TOC digest → DiffID) mapping in BlobInfoCache
This is to prevent codespell from complaining about pathc -> patch.

Signed-off-by: Kir Kolyshkin <[email protected]>
Use ok instead of isT and isV, and limit the variables scope.

Signed-off-by: Kir Kolyshkin <[email protected]>
s/doesnt/does-not/

Signed-off-by: Kir Kolyshkin <[email protected]>
This just changes fo to f, without disrupting the test.

Signed-off-by: Kir Kolyshkin <[email protected]>
This fixes a comment in the test code ("expection", "out", and
punctuation) to the best of my knowledge.

Fixes: c84a3fa ("copy/multiple_test: multiple copy requests of same compression")
Signed-off-by: Kir Kolyshkin <[email protected]>
Found by codespell 2.3.0:

	Error: ./copy/single.go:217: resuing ==> reusing, resuming, rescuing

Signed-off-by: Kir Kolyshkin <[email protected]>
Current .codespellrc contains too many exclusions, and as a result it
misses some legitimate typos.

A few previous commits silenced or fixed some of the codespell warnings
in preparation for this one, which minimizes exclusions, and fixes the
actual remaining typos.

The fixes here are the result of codespell -w.

./copy/sign_test.go:119: overidden ==> overridden
./copy/encryption.go:51: pratice ==> practice
./copy/multiple_test.go:80: crated ==> created
./copy/progress_bars.go:124: progres ==> progress

Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…g-x-oauth2-0.x

fix(deps): update module golang.org/x/oauth2 to v0.22.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…g-x-sync-0.x

fix(deps): update module golang.org/x/sync to v0.8.0
manifest: LayerInfos: preallocate the slice
mtrmac and others added 15 commits August 5, 2024 20:47
We will add more logic to the default case, so sharing the
CandidateCompressionMatchesReuseConditions call is not going to be
as easy. Split the two code paths.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We will want to record more than a single alghoritm name. For now,
just introduce the structure and modify users, we'll add the new fields
later.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
… blobs

... because we don't trust the TOC data, if any.

This allows us to remove the zstd:chunked hack; we, at least,
now record those blobs as zstd.

Signed-off-by: Miloslav Trmač <[email protected]>
zstd:chunked refactoring for early review
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…g-x-crypto-0.x

fix(deps): update module golang.org/x/crypto to v0.26.0
If we don't know an uncompressed digest, don't try using "".

Signed-off-by: Miloslav Trmač <[email protected]>
Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
The cache implementations are recording both the base and specific compression variant;
CandidateLocations2 all call CandidateTemplateWithCompression to choose the
appropriate variants to return based on CandidateLocations2Options.

This way, neither the BIC implementations nor the transports are not responsible for
converting zstd:chunked entries to zstd entries if the user wants the latter.

Signed-off-by: Miloslav Trmač <[email protected]>
Introduce distinct uploadedCompressorBaseVariantName and
uploadedCompressorSpecificVariantName fields; that way we now never
call RecordDigestCompressorData with inconsistent zstd / zstd:chunked in one field,
so we can always record data when we see, or create, a zstd:chunked layer,
removing the current hack.

Signed-off-by: Miloslav Trmač <[email protected]>
- Add a CompressionAnnotations field
- Allow turning a known-zstd blob into a zstd:chunked one if we
  know the right annotations

This just adds the fields, nothing sets them yet, should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
- Return the required annotations, if we have them
- If we have a zstd blob and the BIC contains the annotations,
  we don't check for the blob's presence initially. In that case,
  don't skip it if we find the BIC annotations.

Signed-off-by: Miloslav Trmač <[email protected]>
... instead of only treating it as zstd.

Signed-off-by: Miloslav Trmač <[email protected]>
Record zstd:chunked format, and annotations, in BlobInfoCache
Signed-off-by: Miloslav Trmač <[email protected]>
@mheon
Copy link
Member

mheon commented Aug 7, 2024

LGTM

@TomSweeneyRedHat
Copy link
Member

This LGTM

@mtrmac would it be easily possible to also backport the typo and dependency fixes that you pointed at in the description?

@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 7, 2024

@TomSweeneyRedHat In that case, I’m tempted to tag the release from main, and branch only afterwards; and/or branch from the 5.32.0 point, but merge the main branch into the release branch, to make the equivalence directly visible in git. WDYT?

@mheon
Copy link
Member

mheon commented Aug 7, 2024

That seems fine as it seems like we want... effectively everything? In main right now to be in the tag.

@TomSweeneyRedHat
Copy link
Member

@mtrmac I'm fine with whatever is easiest and that you feel comfortable with. I'd like what's in main now to be in v5.32.0 or v5.32.1, in the release-5.32 branch when you're done. Or, I'm also happy if you just want to cut v5.33.0 based from main, and we'll have a very short-lived 5.32 release.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 8, 2024

Updated: now it is just updates to version.go and a merge of the main branch into release-5.32.

Or, I'm also happy if you just want to cut v5.33.0 based from main, and we'll have a very short-lived 5.32 release.

There are no new APIs, so let’s pretend we decide version numbers based on semantic versioning.

// encryption. (That can be determined from the unencrypted config anyway, but, still...)
//
// Ideally this should query a well-defined property of the compression algorithm (and $somehow determine the right fallback) instead of
// hard-coding zstd:chunked / zstd.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything here that we should let the end user know in Podman/Bulidah/Skopeo? Perhaps a warning in the compression options that zstd:chunked can't be encrypted? No need to change for this PR, just a point to consider for later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Skopeo, the user would have to ask for zstd:chunked on the CLI.

For Buildah/Podman, the default format is in containers.conf. For now I have filed containers/common#2117, and I’ll clean that up at some point. Thanks for the suggestion!

@TomSweeneyRedHat
Copy link
Member

LGTM

1 similar comment
@rhatdan
Copy link
Member

rhatdan commented Aug 9, 2024

LGTM

@rhatdan rhatdan merged commit 6a17c28 into containers:release-5.32 Aug 9, 2024
9 checks passed
@mtrmac mtrmac deleted the 5.32-zstd branch August 9, 2024 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants