Add support for content range requests when getting blobs #537
Does this need to be included in the dist-spec, vs delegating to the HTTP 1.1 RFC: https://datatracker.ietf.org/doc/html/rfc7233#section-3.1
If it is not included here, registry operators have no guidance on whether to implement this or not. We are explicitly stating that we are doing/requiring this for perf reasons, etc.
If we recommend behavior here, that should align with both the Notational Conventions of this specification and with the behavior described in RFC 9110.
spec.md
Outdated
@@ -190,6 +190,11 @@ If present, the value of this header MUST be a digest matching that of the respo

If the blob is not found in the registry, the response code MUST be `404 Not Found`.

A GET request may also include a `Range` request header to download part of a
Suggested change
- A GET request may also include a `Range` request header to download part of a
+ A GET request MAY also include a `Range` request header to download part of a
spec.md
Outdated
blob. The response code can either be `206 (Partial Content)` or `416 (Range
Not Satisfiable)` in case of an invalid range. If the registry doesn't support
Suggested change
- blob. The response code can either be `206 (Partial Content)` or `416 (Range
- Not Satisfiable)` in case of an invalid range. If the registry doesn't support
+ blob. The response code SHOULD either be `206 (Partial Content)` or `416 (Range
+ Not Satisfiable)` in case of an invalid range. If the registry doesn't support
spec.md
Outdated
@@ -190,6 +190,11 @@ If present, the value of this header MUST be a digest matching that of the respo

If the blob is not found in the registry, the response code MUST be `404 Not Found`.

A GET request may also include a `Range` request header to download part of a
blob. The response code can either be `206 (Partial Content)` or `416 (Range
Suggested change
- blob. The response code can either be `206 (Partial Content)` or `416 (Range
+ blob in accordance with [RFC 9110](https://datatracker.ietf.org/doc/html/rfc9110#name-range-requests). The response code can either be `206 (Partial Content)` or `416 (Range
spec.md
Outdated
Not Satisfiable)` in case of an invalid range. If the registry doesn't support
range requests, it can respond with `Accept-Ranges: none`.
Suggested change
- Not Satisfiable)` in case of an invalid range. If the registry doesn't support
- range requests, it can respond with `Accept-Ranges: none`.
+ Not Satisfiable)` in case of an invalid range. A registry MAY ignore the `Range` header.
An older registry unaware of this change may ignore the header, but a newer registry may want to send a response back to discourage clients from sending the Range request.
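To make the outcomes discussed in this thread concrete, here is a minimal client-side sketch, assuming the status codes from RFC 9110 (`206`, `416`) plus a plain `200` when a registry ignores the `Range` header. The names `RangeOutcome`, `classify_range_response`, and `advertises_no_ranges` are hypothetical and not part of the spec or any client library.

```python
from enum import Enum


class RangeOutcome(Enum):
    PARTIAL = "partial"              # 206: body holds only the requested bytes
    FULL = "full"                    # 200: Range was ignored, body is the whole blob
    UNSATISFIABLE = "unsatisfiable"  # 416: requested range is invalid for this blob
    NOT_FOUND = "not_found"          # 404: blob is not in the registry
    UNEXPECTED = "unexpected"        # any other status code


def classify_range_response(status_code: int) -> RangeOutcome:
    """Map the status code of a ranged blob GET to a client-side outcome."""
    mapping = {
        206: RangeOutcome.PARTIAL,
        200: RangeOutcome.FULL,
        416: RangeOutcome.UNSATISFIABLE,
        404: RangeOutcome.NOT_FOUND,
    }
    return mapping.get(status_code, RangeOutcome.UNEXPECTED)


def advertises_no_ranges(headers: dict) -> bool:
    """Return True if the response explicitly carries `Accept-Ranges: none`."""
    return any(
        name.lower() == "accept-ranges" and value.strip().lower() == "none"
        for name, value in headers.items()
    )
```

A client that sees `FULL` or `Accept-Ranges: none` could fall back to a single sequential download rather than retrying with more `Range` requests.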
ab2e9e1 to 64cb41e
We can only download as fast as the pipe allows, of course, but this may help with head-of-line blocking when transferring a large file by fetching many smaller pieces in parallel. How many pieces, etc., is math still to be done. Perhaps more importantly, it opens up the possibility of getting pieces from multiple registries in parallel. Furthermore, download failures, which are more likely with large files, become resumable.
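The resumability point can be sketched as follows, assuming the client knows how many bytes of the blob it already has on disk. The open-ended form `bytes=N-` is standard RFC 9110 syntax; the helper name `resume_range_header` is hypothetical.

```python
def resume_range_header(bytes_on_disk: int) -> dict:
    """Build request headers that ask only for the bytes not yet downloaded."""
    if bytes_on_disk < 0:
        raise ValueError("bytes_on_disk must be non-negative")
    if bytes_on_disk == 0:
        # Nothing downloaded yet: a plain GET without Range is enough.
        return {}
    # Open-ended range: from the first missing byte to the end of the blob.
    return {"Range": f"bytes={bytes_on_disk}-"}
```

If the registry ignores the header and answers `200`, the client would have to discard its partial data and read the full body instead of appending.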
Updated as per comments in the OCI meeting 05/30/2024.
Candidate for dist-spec v1.1.1?
In addition to the reword, it looks like a separate commit got pulled in by accident. Please fix with a rebase on main.
OCI artifacts support has landed in various OCI specs v1.1.0 which allows for arbitrary artifact types, small and large.

Large artifacts (even existing container images) pose a particular challenge that:
1) it takes too long to download
2) it takes too long to unpack

This PR begins to address 1) above.

The client can initiate a HEAD request to get the size and later multiple GET range requests to download a blob in parallel.

Signed-off-by: Ramkumar Chinchani <[email protected]>
LGTM
Note that another reading of the spec found this:
I've linked with #354.
Just preliminary testing ... zot localhost over http with 10GiB artifact blob
{"level":"info","module":"http","component":"session","clientIP":"127.0.0.1:59934","method":"GET","path":"/v2/artifact/blobs/sha256:732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d","statusCode":200,"latency":"1m32s","bodySize":10737418240,"headers":{"Accept-Encoding":["gzip"],"User-Agent":["Go-http-client/1.1"]},"goroutine":167,"caller":"zotregistry.dev/zot/pkg/api/session.go:132","time":"2024-06-08T03:34:08.916403767Z","message":"HTTP API"}
{"level":"info","module":"http","component":"session","clientIP":"127.0.0.1:52134","method":"GET","path":"/v2/artifact/blobs/sha256:732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d","statusCode":206,"latency":"18s","bodySize":2147483649,"headers":{"Range":["bytes=6442450944-8589934592"],"User-Agent":["Go-http-client/1.1"]},"goroutine":184,"caller":"zotregistry.dev/zot/pkg/api/session.go:132","time":"2024-06-08T04:11:01.590093476Z","message":"HTTP API"}
{"level":"info","module":"http","component":"session","clientIP":"127.0.0.1:56086","method":"GET","path":"/v2/artifact/blobs/sha256:732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d","statusCode":206,"latency":"25s","bodySize":2147483649,"headers":{"Range":["bytes=0-2147483648"],"User-Agent":["Go-http-client/2.0"]},"goroutine":726149,"caller":"zotregistry.dev/zot/pkg/api/session.go:132","time":"2024-06-12T22:27:50.496794257Z","message":"HTTP API"}
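The `bodySize` of 2147483649 in the two `206` entries follows from byte ranges being inclusive at both ends, so each request covers 2 GiB plus one byte. A small sketch of that arithmetic, using a hypothetical helper `range_length` that only handles a single closed `bytes=first-last` range:

```python
def range_length(range_header: str) -> int:
    """Return the number of bytes selected by a single closed byte range."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single 'bytes' range is handled in this sketch")
    first, _, last = spec.partition("-")
    if not first or not last:
        raise ValueError("open-ended and suffix ranges are not handled here")
    start, end = int(first), int(last)
    if end < start:
        raise ValueError("last byte position precedes first byte position")
    # Both positions are inclusive, hence the +1.
    return end - start + 1
```

A client that wants chunks of exactly `n` bytes would therefore request `bytes=start-(start + n - 1)`.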
@shizhMSFT reported this for Azure. Thanks! Summary of findings: parallel range requests do indeed improve download performance, but they hit Azure infra limits fairly quickly.
Here are the detailed data points for the above diagram.
The benchmark was generated by running
Detailed conversation can be found in the CNCF Slack channel #oras: https://cloud-native.slack.com/archives/CJ1KHJM5Z/p1718311259363889
OCI artifacts support has landed in various OCI specs v1.1.0 which allows for arbitrary artifact types, small and large.
Large artifacts (even existing container images) pose a particular challenge that:
1. it takes too long to download
2. it takes too long to unpack

This PR begins to address 1. above.
The client can initiate a HEAD request to get the size and later multiple GET range requests to download a blob in parallel.
Fixes #354
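The HEAD-then-parallel-GET flow described above could look roughly like this on the client side. This is a sketch under assumptions, not the PR's implementation: `size` is taken to come from the `Content-Length` of the HEAD response, and `fetch_range` is a hypothetical callable that issues a GET with `Range: bytes=start-end` and returns the body bytes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split a blob of `size` bytes into at most `parts` inclusive (start, end) ranges."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    chunk = -(-size // parts)  # ceiling division
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


def download_parallel(
    size: int, parts: int, fetch_range: Callable[[int, int], bytes]
) -> bytes:
    """Fetch all ranges concurrently and reassemble them in order."""
    ranges = split_ranges(size, parts)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map yields results in the order of `ranges`, so joining is safe.
        chunks = list(pool.map(lambda r: fetch_range(r[0], r[1]), ranges))
    return b"".join(chunks)
```

For example, `split_ranges(10, 3)` yields `[(0, 3), (4, 7), (8, 9)]`. A real client would also want to verify the reassembled blob against its digest and fall back to a single GET if the registry does not answer with `206`.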