chunked: store compressed digest if validated #2001

giuseppe · 2024-07-05T12:10:14Z

if the compressed digest was validated, as it happens when 'pull_options = {convert_images = "true"}' is set, then store it as well so that reusing the blob by its compressed digest works.

Previously, when an image converted to zstd:chunked was pulled a second time, it would not be recognized by its compressed digest, resulting in the need to re-pull the image again.

openshift-ci · 2024-07-05T12:10:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mtrmac

I think this works for simple cases, but it is not optimal in general: A single Layer can have multiple compressed representations, and this only stores one.

For ordinary uncompressed layers, that is handled by putBlobToPendingFile (computing both digests and) calling RecordDigestUncompressedPair; that can fully record a M-to-1 mapping. And that mapping is then consulted in tryReusingBlobAsPending.

So, it seems to me that PutBlobPartial should call RecordDigestUncompressedPair if the uncompressed digest is available. It’s already making an assumption that the compressed digest has been validated … though having pkg/chunked explicitly return the value in DriverWithDifferOutput, as this PR does, and explicitly validating that CompressedDigest == blobDigest in TryReusingBlobAsPending, would be more obviously correct.

mtrmac · 2024-07-06T14:02:24Z

layers.go

@@ -2529,6 +2529,7 @@ func (r *layerStore) applyDiffFromStagingDirectory(id string, diffOutput *driver
 	layer.GIDs = diffOutput.GIDs
 	updateDigestMap(&r.byuncompressedsum, layer.UncompressedDigest, diffOutput.UncompressedDigest, layer.ID)
 	layer.UncompressedDigest = diffOutput.UncompressedDigest
+	layer.CompressedDigest = diffOutput.CompressedDigest


This might overwrite a valid CompressedDigest value (hypothetically, from a template layer) with ""

If this field is modified, a corresponding updateDigestMap call needs to happen as well.

I think the other parts of this PR are clearly valuable; this one doesn’t really hurt but seems unnecessary for the purpose. OTOH handling the DriverWithDifferOutput fields consistently is somewhat valuable.

thanks, I'll fix it and call updateDigestMap

giuseppe · 2024-07-08T09:59:56Z

I think this works for simple cases, but it is not optimal in general: A single Layer can have multiple compressed representations, and this only stores one.

The Layer object already knows about the CompressedDigest, so I think we can use for the case where we convert an image and validate the compressed digest.

when we pull a layer without any conversion happening, we store the following information in the overlay-layers/layers.json file:

[{"id":"94e5f06ff8e3d4441dc3cd8b090ff38dc911bfa8ebdb0dc28395bc98f82f983f","created":"2024-07-08T09:28:18.17830826Z","compressed-diff-digest":"sha256:ec99f8b99825a742d50fb3ce173d291378a46ab54b8ef7dd75e5654e2a296e99","compressed-size":3623844,"diff-digest":"sha256:94e5f06ff8e3d4441dc3cd8b090ff38dc911bfa8ebdb0dc28395bc98f82f983f","diff-size":8082944,"compression":2,"uidset":[0],"gidset":[0,42]}]

previously, without the current change we weren't storing the compressed-diff-digest:

[{"id":"d51af96cf93e225825efd484ea457f867cb2b967f7415b9a3b7e65a2f803838a","created":"2024-07-08T09:49:07.867152639Z","diff-digest":"sha256:d51af96cf93e225825efd484ea457f867cb2b967f7415b9a3b7e65a2f803838a","diff-size":4495360,"uidset":[65534,1,8,0],"gidset":[8,0,65534,1],"big-data-names":["zstd-chunked-manifest","zstd-chunked-layer-data"]}]

mtrmac · 2024-07-08T17:40:45Z

I think this works for simple cases, but it is not optimal in general: A single Layer can have multiple compressed representations, and this only stores one.

The Layer object already knows about the CompressedDigest

It knows one per deduplicated layer. That loses information if the same layer is compressed multiple ways. (And, also, the information is lost when the layer is deleted.)

The way this is expected to work is that is primarily recorded in the c/image BlobInfoCache.

Signed-off-by: Miloslav Trmač <[email protected]>

mtrmac · 2024-07-08T17:54:55Z

@giuseppe containers/image#2478 is what I think code based on this PR can look like; that should work correctly even without the layerStoreapplyDiffFromStagingDirectory part of this PR.

Alternatively, WIP containers/image#2321 among various other changes, also updates BIC, without needing a new c/storage API (but hard-coding the same assumption about what c/storage validates).

giuseppe · 2024-07-08T18:37:23Z

It knows one per deduplicated layer. That loses information if the same layer is compressed multiple ways. (And, also, the information is lost when the layer is deleted.)

The way this is expected to work is that is primarily recorded in the c/image BlobInfoCache.

I am not opposing the change in c/image. I think we need that as well to deal with multiple compressed digests for the same uncompressed diff. I agree your change in containers/image#2478 is correct and we should have it. I still think though we need also the current change in c/storage so it can be used by APIs like LayersByCompressedDigest

mtrmac · 2024-07-08T18:46:23Z

LayersByCompressedDigest is fundamentally lossy … but, sure, as long as we have the API, it’s valuable to try to keep it as good as possible.

giuseppe · 2024-07-09T07:35:34Z

LayersByCompressedDigest is fundamentally lossy … but, sure, as long as we have the API, it’s valuable to try to keep it as good as possible.

so can we merge the current PR? :-)

mtrmac

I’m sorry, I screwed something up and this remained “pending” .

LGTM otherwise.

layers.go

if the compressed digest was validated, as it happens when 'pull_options = {convert_images = "true"}' is set, then store it as well so that reusing the blob by its compressed digest works. Previously, when an image converted to zstd:chunked was pulled a second time, it would not be recognized by its compressed digest, resulting in the need to re-pull the image again. Signed-off-by: Giuseppe Scrivano <[email protected]>

rhatdan · 2024-07-10T09:11:05Z

/lgtm

mtrmac · 2024-07-10T14:46:13Z

LGTM, for the record.

Signed-off-by: Miloslav Trmač <[email protected]>

openshift-ci bot added the approved label Jul 5, 2024

mtrmac reviewed Jul 6, 2024

View reviewed changes

giuseppe force-pushed the chunked-store-compressed-digest branch from 7b9d43f to 4ba7ad8 Compare July 8, 2024 10:00

mtrmac added a commit to mtrmac/image that referenced this pull request Jul 8, 2024

DO NOT MERGE: Use c/storage from containers/storage#2001

dd960c8

Signed-off-by: Miloslav Trmač <[email protected]>

mtrmac mentioned this pull request Jul 8, 2024

Allow matching of compressed blobs converted on the fly to zstd:chunked containers/image#2478

Merged

mtrmac reviewed Jul 9, 2024

View reviewed changes

layers.go Outdated Show resolved Hide resolved

giuseppe force-pushed the chunked-store-compressed-digest branch from 4ba7ad8 to 28cba30 Compare July 10, 2024 07:42

openshift-ci bot assigned rhatdan Jul 10, 2024

openshift-ci bot added the lgtm label Jul 10, 2024

openshift-merge-bot bot merged commit 11c8426 into containers:main Jul 10, 2024
18 checks passed

mtrmac added a commit to mtrmac/image that referenced this pull request Jul 10, 2024

Update c/storage after containers/storage#2001

8ca60c5

Signed-off-by: Miloslav Trmač <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chunked: store compressed digest if validated #2001

chunked: store compressed digest if validated #2001

giuseppe commented Jul 5, 2024

openshift-ci bot commented Jul 5, 2024

mtrmac left a comment

mtrmac Jul 6, 2024

giuseppe Jul 8, 2024

giuseppe commented Jul 8, 2024

mtrmac commented Jul 8, 2024

mtrmac commented Jul 8, 2024

giuseppe commented Jul 8, 2024

mtrmac commented Jul 8, 2024

giuseppe commented Jul 9, 2024

mtrmac left a comment

rhatdan commented Jul 10, 2024

mtrmac commented Jul 10, 2024

chunked: store compressed digest if validated #2001

chunked: store compressed digest if validated #2001

Conversation

giuseppe commented Jul 5, 2024

openshift-ci bot commented Jul 5, 2024

mtrmac left a comment

Choose a reason for hiding this comment

mtrmac Jul 6, 2024

Choose a reason for hiding this comment

giuseppe Jul 8, 2024

Choose a reason for hiding this comment

giuseppe commented Jul 8, 2024

mtrmac commented Jul 8, 2024

mtrmac commented Jul 8, 2024

giuseppe commented Jul 8, 2024

mtrmac commented Jul 8, 2024

giuseppe commented Jul 9, 2024

mtrmac left a comment

Choose a reason for hiding this comment

rhatdan commented Jul 10, 2024

mtrmac commented Jul 10, 2024