-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chunked: store compressed digest if validated #2001
chunked: store compressed digest if validated #2001
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works for simple cases, but it is not optimal in general: A single Layer
can have multiple compressed representations, and this only stores one.
For ordinary uncompressed layers, that is handled by putBlobToPendingFile
(computing both digests and) calling RecordDigestUncompressedPair
; that can fully record a M-to-1 mapping. And that mapping is then consulted in tryReusingBlobAsPending
.
So, it seems to me that PutBlobPartial
should call RecordDigestUncompressedPair
if the uncompressed digest is available. It’s already making an assumption that the compressed digest has been validated … though having pkg/chunked
explicitly return the value in DriverWithDifferOutput
, as this PR does, and explicitly validating that CompressedDigest == blobDigest
in TryReusingBlobAsPending
, would be more obviously correct.
@@ -2529,6 +2529,7 @@ func (r *layerStore) applyDiffFromStagingDirectory(id string, diffOutput *driver | |||
layer.GIDs = diffOutput.GIDs | |||
updateDigestMap(&r.byuncompressedsum, layer.UncompressedDigest, diffOutput.UncompressedDigest, layer.ID) | |||
layer.UncompressedDigest = diffOutput.UncompressedDigest | |||
layer.CompressedDigest = diffOutput.CompressedDigest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This might overwrite a valid
CompressedDigest
value (hypothetically, from a template layer) with""
- If this field is modified, a corresponding
updateDigestMap
call needs to happen as well.
I think the other parts of this PR are clearly valuable; this one doesn’t really hurt but seems unnecessary for the purpose. OTOH handling the DriverWithDifferOutput
fields consistently is somewhat valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I'll fix it and call updateDigestMap
The Layer object already knows about the when we pull a layer without any conversion happening, we store the following information in the [{"id":"94e5f06ff8e3d4441dc3cd8b090ff38dc911bfa8ebdb0dc28395bc98f82f983f","created":"2024-07-08T09:28:18.17830826Z","compressed-diff-digest":"sha256:ec99f8b99825a742d50fb3ce173d291378a46ab54b8ef7dd75e5654e2a296e99","compressed-size":3623844,"diff-digest":"sha256:94e5f06ff8e3d4441dc3cd8b090ff38dc911bfa8ebdb0dc28395bc98f82f983f","diff-size":8082944,"compression":2,"uidset":[0],"gidset":[0,42]}] previously, without the current change we weren't storing the [{"id":"d51af96cf93e225825efd484ea457f867cb2b967f7415b9a3b7e65a2f803838a","created":"2024-07-08T09:49:07.867152639Z","diff-digest":"sha256:d51af96cf93e225825efd484ea457f867cb2b967f7415b9a3b7e65a2f803838a","diff-size":4495360,"uidset":[65534,1,8,0],"gidset":[8,0,65534,1],"big-data-names":["zstd-chunked-manifest","zstd-chunked-layer-data"]}] |
7b9d43f
to
4ba7ad8
Compare
It knows one per deduplicated layer. That loses information if the same layer is compressed multiple ways. (And, also, the information is lost when the layer is deleted.) The way this is expected to work is that is primarily recorded in the c/image BlobInfoCache. |
Signed-off-by: Miloslav Trmač <[email protected]>
@giuseppe containers/image#2478 is what I think code based on this PR can look like; that should work correctly even without the Alternatively, WIP containers/image#2321 among various other changes, also updates BIC, without needing a new c/storage API (but hard-coding the same assumption about what c/storage validates). |
I am not opposing the change in c/image. I think we need that as well to deal with multiple compressed digests for the same uncompressed diff. I agree your change in containers/image#2478 is correct and we should have it. I still think though we need also the current change in c/storage so it can be used by APIs like |
|
so can we merge the current PR? :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m sorry, I screwed something up and this remained “pending” .
LGTM otherwise.
if the compressed digest was validated, as it happens when 'pull_options = {convert_images = "true"}' is set, then store it as well so that reusing the blob by its compressed digest works. Previously, when an image converted to zstd:chunked was pulled a second time, it would not be recognized by its compressed digest, resulting in the need to re-pull the image again. Signed-off-by: Giuseppe Scrivano <[email protected]>
4ba7ad8
to
28cba30
Compare
/lgtm |
LGTM, for the record. |
Signed-off-by: Miloslav Trmač <[email protected]>
if the compressed digest was validated, as it happens when 'pull_options = {convert_images = "true"}' is set, then store it as well so that reusing the blob by its compressed digest works.
Previously, when an image converted to zstd:chunked was pulled a second time, it would not be recognized by its compressed digest, resulting in the need to re-pull the image again.