Proposal: an OCI-Referrers header on manifest pull #454

Open
mtrmac opened this issue Jul 28, 2023 · 7 comments

mtrmac commented Jul 28, 2023

As long as referrers are fairly rarely used, when pulling an image, determining if there are any referrers requires an extra round-trip (if the registry is known to support the referrers API) or two (if the registry does not support the API and the referrers tag schema needs to be used).

Assuming there are registries where the implementation makes it cheap enough (I have no idea if that’s the case), it seems potentially valuable to allow a manifest pull request to include a header indicating referrers presence:

OCI-Referrers: present|absent.

If this header is present:

  • The registry MUST support the Listing Referrers endpoint
  • The present/absent value indicates whether at least one referrer existed at the time the response was formed.

A registry where determining the existence of referrers is costly could choose not to include the header; the client would then need to make an explicit “Listing Referrers” request.
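A minimal client-side sketch of how such a header could be interpreted, assuming the header name and values proposed above (this is not an existing registry API). The function names and the plain-dict representation of response headers are hypothetical.

```python
from typing import Mapping, Optional


def referrers_state(headers: Mapping[str, str]) -> Optional[str]:
    """Return "present", "absent", or None if the header is missing or unrecognized."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("oci-referrers")
    if value is None:
        return None
    value = value.strip()
    # Anything other than the two proposed values is treated as "no information".
    return value if value in ("present", "absent") else None


def referrers_lookup_needed(headers: Mapping[str, str]) -> bool:
    """Only a definite "absent" lets the client skip the Listing Referrers request."""
    return referrers_state(headers) != "absent"
```

With this shape, a registry that omits the header costs the client nothing new: the client simply falls back to the explicit request it would have made anyway.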

I’m not sure about specifying the present value; I don’t think it helps clients.


Alternatively, an OCI-Referrers-Artifact-Types header listing the artifact types of all included referrers could eliminate even more round-trips (for clients which e.g. only care about signatures and not SBOMs), at the cost of possibly being even costlier for the registry to compute.

It might even make sense to specify both.
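A rough sketch of how a client might use an artifact-types header. The wire format is not defined in this proposal; a comma-separated list of artifact types is assumed here purely for illustration, and the function names and the `application/vnd.example.*` media types are hypothetical.

```python
from typing import Mapping, Optional, Set


def advertised_artifact_types(headers: Mapping[str, str]) -> Optional[Set[str]]:
    """Parse the (assumed comma-separated) header; None means the header is missing."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("oci-referrers-artifact-types")
    if raw is None:
        return None
    return {item.strip() for item in raw.split(",") if item.strip()}


def should_list_referrers(headers: Mapping[str, str], wanted: Set[str]) -> bool:
    """Skip the referrers request only when no wanted artifact type is advertised."""
    advertised = advertised_artifact_types(headers)
    if advertised is None:
        # No information from the registry: fall back to an explicit request.
        return True
    return bool(advertised & wanted)
```

For example, a client that only verifies signatures would pass `{"application/vnd.example.signature.v1+json"}` as `wanted` and skip the request for images that only carry SBOMs.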


If referrers use ever becomes very widespread, these headers would just add overhead, because clients would almost always want to list the referrers. In that case, registries could choose to stop including these headers.


I apologize if this was already discussed; I couldn’t find anything searching issues in this repo.

@sudo-bmitch
Contributor

For me, an OCI-Referrers: absent header would be a great performance improvement, and even more so as the use of the subject field grows. When doing a deep copy of an image and all of its referrers, those referrers may themselves have referrers (e.g. an SBOM could have a signature). So doing the math for a multiplatform image:

  • 1 index
  • 7 platform specific manifests
  • 6 artifacts per manifest (e.g. 2 SBOMs, signature, attestation, license, and source code for the license)

In this example, there are 42 artifacts (7 × 6) and 1 index that don't have referrers themselves, so the header would eliminate 43 API calls. That can easily grow as more images are pushed per index (e.g. risc-v, wasm, perhaps zstd variants on images) and more metadata gets pushed per image (e.g. VEX reports, more SBOM formats, reproducibility data, multiple signatures from different parties, etc).
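The arithmetic can be spelled out as a short script. It assumes one referrers lookup per manifest during a deep copy and a registry that supports the referrers API (one round-trip per lookup); the variable names are illustrative only.

```python
index_count = 1
platform_manifests = 7
artifacts_per_manifest = 6

# Every platform manifest carries its own set of artifacts.
artifacts = platform_manifests * artifacts_per_manifest

# Each manifest copied (index, platform manifests, artifacts) may have referrers.
total_manifests = index_count + platform_manifests + artifacts

# In this example only the platform manifests actually have referrers.
without_referrers = index_count + artifacts

calls_without_header = total_manifests
calls_with_absent_header = total_manifests - without_referrers

print(f"artifacts: {artifacts}")
print(f"referrers calls without the header: {calls_without_header}")
print(f"referrers calls with 'absent': {calls_with_absent_header}")
print(f"calls eliminated: {without_referrers}")
```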

The OCI-Referrers: present header is less useful in my scenarios: if I want the metadata, I'm going to run the API call first. The only way it might help is if it were used to know to fall back to the tag without querying referrers (because the header is missing), but my own plan is to cache registry capabilities for a few minutes to avoid calling an unavailable referrers API.
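A minimal sketch of that kind of capability cache, assuming a simple per-registry time-to-live. The class name, the five-minute default, and the injectable clock are all illustrative choices, not taken from any existing client.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CapabilityCache:
    """Remembers for a limited time whether a registry supports the referrers API."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # registry host -> (supports referrers API, expiry timestamp)
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def set(self, registry: str, supports_referrers: bool) -> None:
        self._entries[registry] = (supports_referrers, self._clock() + self._ttl)

    def get(self, registry: str) -> Optional[bool]:
        """Return the cached answer, or None if unknown or expired."""
        entry = self._entries.get(registry)
        if entry is None:
            return None
        supported, expires_at = entry
        if self._clock() >= expires_at:
            # Stale entry: forget it so the caller probes the registry again.
            del self._entries[registry]
            return None
        return supported
```

A `None` result means the client has to probe the referrers API once more; a cached `False` lets it go straight to the tag fallback.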

For the OCI-Referrers-Artifact-Types listing, it would be interesting to see a comparison of the overhead added on the registry side vs. the reduction in API load. Many of those API requests will be from runtimes that don't care about artifacts. And a lot of the artifact tooling may be run separately from the image pull, and could go directly to the referrers API if it has the digest, skipping the manifest API.


toddysm commented Aug 3, 2023

I like the optimization, but one concern I have with the absent value is how long it remains true (i.e. how long it would need to be cached on the client). In a scenario where a consumer pulls an artifact while a producer pushes referrers asynchronously, any cache time is arbitrary, because the state may change right after the artifact manifest is retrieved.

Author

mtrmac commented Aug 3, 2023

The way I was thinking about it, the header would make no promises about the future at all; it only represents the state visible to the client “at the time of forming the response”.

I.e. in a sequence of requests:
A. PUT /v2/<name>/manifests/<reference> (subject)
B. GET /v2/<name>/manifests/<reference> (subject)
C. GET /v2/<name>/referrers/<digest>
D. PUT /v2/<name>/manifests/<reference> (referrer)

A necessarily happens-before B.

If C is racing D (i.e. there is no “D happens-before C” relationship), a client that sees absent in B could, without other synchronization, just as well have issued C before D happened.

I.e. adding the absent header does not add any new race, it just turns a race of C vs. D into a race of B vs. D.
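The orderings can be enumerated in a small sketch. It simplifies by fixing A first and requiring B before C, and it assumes a client that skips C whenever B reports absent; the function name is illustrative.

```python
from itertools import permutations


def client_sees_referrer(order: str, use_header: bool) -> bool:
    """Does the client learn about the referrer pushed in D for this ordering?"""
    if use_header and order.index("B") < order.index("D"):
        # B was answered before D, so it reported "absent" and C is skipped.
        return False
    # The client issues C; it lists the referrer only if D already happened.
    return order.index("D") < order.index("C")


# A is always first; B (manifest GET) always precedes C (referrers GET).
orders = ["A" + "".join(p) for p in permutations("BCD")
          if p.index("B") < p.index("C")]

for order in orders:
    print(order,
          "without header:", client_sees_referrer(order, use_header=False),
          "with header:", client_sees_referrer(order, use_header=True))
```

Without the header the outcome depends on whether C precedes D; with it, on whether B precedes D. The two differ only for the ordering A, B, D, C, which is exactly the case where an unsynchronized client could not rely on C following D anyway.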

Member

sajayantony commented Aug 3, 2023

Curious to hear from others who know the HTTP spec better:
Are we OK using the Link header? https://datatracker.ietf.org/doc/html/rfc5988#section-5.3
There is a set of previously registered relation types, and I'm not sure if we can pick something like rel=related - https://www.iana.org/assignments/link-relations/link-relations.xhtml

For example consider something like

Content-Length: 708
Docker-Distribution-Api-Version: registry/2.0
Etag: "sha256:..."
Docker-Content-Digest: sha256:...
Content-Type: application/vnd.oci.image.manifest.v1+json
Link: <http://localhost:5000/v2/hello-world/referrers/sha256:3e207b409db364b595ba862cdc12be96dcdad8e36c59a03b7b3b61c946a5742a>; rel="related"

Also, if we don't want to reuse related, we could just register another relation type.
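On the client side, picking the referrers URL out of such a header could look roughly like this. It is a deliberately simplified sketch: the regular expression only handles `rel` as the first link parameter, and `find_link` is a hypothetical helper, not a full Link header parser.

```python
import re
from typing import Optional

# Matches `<target>; rel="value"` (quotes around the rel value are optional).
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def find_link(header_value: str, rel: str) -> Optional[str]:
    """Return the target of the first link with the given relation type, if any."""
    for target, rels in _LINK_RE.findall(header_value):
        # A rel value may hold several space-separated relation types.
        if rel in rels.split():
            return target
    return None


link = ('<http://localhost:5000/v2/hello-world/referrers/'
        'sha256:3e207b409db364b595ba862cdc12be96dcdad8e36c59a03b7b3b61c946a5742a>; '
        'rel="related"')
print(find_link(link, "related"))
```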

Author

mtrmac commented Aug 3, 2023

The primary benefit of this proposal is to report when there is nothing to link; that can’t be presented with a Link header.

@sudo-bmitch
Contributor

I agree that caching is a client side concern and this just needs to return the current state at the time of the manifest get. A client cache is always at risk of becoming stale, particularly with any content not covered by content addressability (tags, headers, referrers list, and whether content exists). We can leave that as an implementation problem.

I'm ready to move forward with the absent header, but I'll give others time to weigh in on whether the other headers would add value to their use cases. Since this is an optimization, I'm comfortable saying OCI-Referrers: absent is a SHOULD, rather than a MUST, and adding it late in the release cycle is not a concern to me.

sudo-bmitch added this to the v1.1.0 milestone Aug 3, 2023

jcarter3 commented Aug 11, 2023

I understand the desire for this, but it's a feature that benefits the client while adding a fair amount of burden on the server. I can see the SHOULD being treated more like a MAY in practice, reducing its usefulness.
