Implement package fetch retries #25120
Conversation
It might be useful to look at the defaults of other package managers for ideas - the cargo source has some good info about retries, including links to other resources: https://doc.rust-lang.org/nightly/nightly-rustc/src/cargo/util/network/retry.rs.html Some notes:
@ehaas Yeah, I also took a look at the cargo source. I like that list of notes you jotted down though, so I'll start there.
Has anyone heard of a need/desire for the Zig HTTP client to have retry logic built into it? If not, I'll continue implementing the retry logic in the package manager's `Fetch` code.
These are the errors I have decided are spurious:

```zig
fn maybe_spurious(err: std.http.Client.RequestError) bool {
    switch (err) {
        // FROM: ConnectTcpError
        // tcp errors
        error.ConnectionTimedOut,
        error.ConnectionResetByPeer,
        error.ConnectionRefused,
        error.NetworkUnreachable,
        error.AddressInUse,
        error.TemporaryNameServerFailure,
        error.SystemResources,
        Allocator.Error.OutOfMemory,
        // FROM: ReceiveHeadError
        // io errors
        error.WriteFailed,
        error.ReadFailed,
        // http errors
        error.HttpRequestTruncated,
        error.HttpConnectionClosing,
        error.HttpChunkTruncated,
        error.HttpChunkInvalid,
        // retrying proxy would be included in this list, since proxy failure returns same errors as normal failures
        // still trying to find where equivalent EOF error might be...
        => return true,
        // anything not listed above is treated as a real, non-retryable failure
        else => {},
    }
    return false;
}
```

I mostly used gitoxide (used by cargo) as a reference, plus my own judgement. Many of the TCP errors were easy to map from gitoxide, and the ones gitoxide treats as spurious made sense to me. The HTTP errors, on the other hand, did not map well from cargo, which is really curl under the hood. Several of the errors curl handles don't correspond cleanly to the way we do things; our implementation is simply different, so different things can go wrong. So I derived what I could from cargo + curl, and used my best judgement for the rest. As the comment in the code above shows, the only thing I still need to investigate is how to detect a Zig equivalent of the curl EOF error.

I'm working on this as often as I can, but sometimes that's as infrequently as once per week, so it's hard for me to move fast. Plus I'm learning a lot about low-level networking, which doesn't help. That's all for now.
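For context, here is a minimal sketch of how a spuriousness check like this could drive a retry loop with exponential backoff and jitter. It is not the PR's actual `Fetch` code: `fetchOnce`, `fetchWithRetries`, `max_retries`, and `base_delay_ms` are made-up names for illustration, and it assumes the `maybe_spurious` helper above is in scope.

```zig
const std = @import("std");

// Hypothetical stand-in for a single fetch attempt; the real code would go
// through std.http.Client / the package fetch path.
fn fetchOnce() std.http.Client.RequestError!void {
    return error.ConnectionTimedOut;
}

fn fetchWithRetries(max_retries: u32, base_delay_ms: u64) !void {
    var delay_ms = base_delay_ms;
    var attempt: u32 = 0;
    while (true) : (attempt += 1) {
        fetchOnce() catch |err| {
            // Give up on non-spurious errors or once the retry budget is spent.
            if (attempt >= max_retries or !maybe_spurious(err)) return err;
            // Exponential backoff with up to one second of random jitter.
            const jitter_ms = std.crypto.random.uintLessThan(u64, 1000);
            std.Thread.sleep((delay_ms + jitter_ms) * std.time.ns_per_ms);
            delay_ms *= 2;
            continue;
        };
        return;
    }
}
```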
There are some locations where memory is now freed that wasn't freed before. This is because, if the user configures `Fetch` to retry many times, I don't want the allocator to allocate without freeing on each refetch of the network data, which would result in a memory leak.
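As a rough illustration of that pattern (not the PR's code; `readAttempt` and `fetchAllocating` are made-up names), each attempt's allocation is released before the next attempt begins:

```zig
const std = @import("std");

// Hypothetical stand-in for one network read into `buf`.
fn readAttempt(buf: []u8) anyerror!void {
    _ = buf;
    return error.ConnectionResetByPeer;
}

/// Every failed attempt frees its buffer before retrying, so configuring a
/// large retry count cannot accumulate leaked allocations.
fn fetchAllocating(gpa: std.mem.Allocator, max_attempts: u32) ![]u8 {
    var attempt: u32 = 0;
    while (true) : (attempt += 1) {
        const buf = try gpa.alloc(u8, 4096);
        readAttempt(buf) catch |err| {
            gpa.free(buf); // release this attempt's memory before the next one
            if (attempt + 1 >= max_attempts) return err;
            continue;
        };
        return buf;
    }
}
```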
`ConnectTcpError` is a subset of `ConnectError`, but the 4 additional
errors that `ConnectError` comes with:
```zig
error{
UnsupportedUriScheme,
UriMissingHost,
UriHostTooLong,
CertificateBundleLoadFailure,
}
```
...are never returned by `Client.connect()`. So we are communicating
that this method could return one of those 4 errors when, in reality, it
never can. This commit fixes that.
The two error types look like this, side by side:
```zig
pub const RequestError = ConnectTcpError || error{
UnsupportedUriScheme,
UriMissingHost,
UriHostTooLong,
CertificateBundleLoadFailure,
};
pub const ConnectError = ConnectTcpError || RequestError;
```
Once evaluated, they end up containing the exact same error types, so it
is redundant to have them both. I chose to keep `RequestError` because
it is used slightly more and also so the `Client.request()` method gets
an error return type that matches the name of the method.
I still need to test it, but jitter is implemented, spurious error checks are implemented, and it does compile. The only thing left to implement is reading the "Retry-After" header.
I implemented reading and parsing the Retry-After header! Some notes:
All I need to do now is figure out how to unit test the new code.
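For reference, a hedged sketch of how the delay-seconds form of that header could be interpreted (not the PR's code; `retryAfterMs` and its fallback parameter are invented here). Per RFC 9110, Retry-After is either a number of seconds or an HTTP-date; this sketch only handles the numeric form and falls back to a default otherwise:

```zig
const std = @import("std");

fn retryAfterMs(header_value: []const u8, default_ms: u64) u64 {
    const trimmed = std.mem.trim(u8, header_value, " \t");
    // Delay-seconds form, e.g. "Retry-After: 120".
    const seconds = std.fmt.parseInt(u64, trimmed, 10) catch return default_ms;
    return seconds * std.time.ms_per_s;
}

test "Retry-After delay-seconds" {
    try std.testing.expectEqual(@as(u64, 120_000), retryAfterMs("120", 1_000));
    // The HTTP-date form is not handled by this sketch, so it falls back.
    try std.testing.expectEqual(@as(u64, 1_000), retryAfterMs("Wed, 21 Oct 2015 07:28:00 GMT", 1_000));
}
```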
```diff
     port: u16,
     protocol: Protocol,
-) ConnectError!*Connection {
+) ConnectTcpError!*Connection {
```
ConnectError is the same as RequestError; both are larger collections of errors than ConnectTcpError. But this function only returns errors found in ConnectTcpError, so this change makes the possible returned error types clearer.
```diff
     };
 }

-pub const ConnectError = ConnectTcpError || RequestError;
```
ConnectError is the same as RequestError. Is deleting ConnectError the right move? Would it be better to leave it in with a doc comment about how it's being deprecated and what to use instead?
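If deprecation were preferred over deletion, a minimal sketch of that alternative (assuming `RequestError` stays the canonical set) could be:

```zig
/// Deprecated: `ConnectError` resolves to the same error set as `RequestError`;
/// use `RequestError` instead. Kept only so existing code keeps compiling.
pub const ConnectError = RequestError;
```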
Ok, it's ready for review again. Here are the key features, which are very similar to the list of features @ehaas commented earlier.
Features
Testing
I unit tested quite a bit of the new code, and the tests pass. I also tested this locally with and without network failures, logging before and after the delays to check that the logged numbers matched the real delays. I also tested against a local endpoint that returns a "Retry-After" header. It all works really well.
Problem
Like in #17472, the package manager does not retry fetching when communication to the server fails in a way that could possibly be resolved by retrying the fetch.
Solution
Automatically retry the fetch in these scenarios during http fetches:
and these scenarios during git+http fetches:
This PR does not expose the fetch config to the end user, for reasons you can read about in the next section. It's definitely worth the conversation, but getting this PR out there as-is doesn't seem like a bad idea to me.
Major Considerations
- I considered exposing the fetch config to the end user (e.g. `retry_count` and `retry_delay_ms`). However, this proved difficult, so it is not a part of this PR. Here are some thoughts I had though:
  - It can't go in the `build.zig` file, because (as far as I can tell) the `build.zig` file is not run until all the packages are fetched and ready to be built with the source code.
  - It could live directly inside `.dependencies`, like this (below), so that `.retry_count` and other props apply to all dependencies while still being part of the `.dependencies` object. Also, this can easily be parsed by checking the type of the prop: if the type is `anytype`, it's a real dep; if it's anything else, it's config, not a real dep. Fun idea, but it's not a part of this PR and shouldn't be. Here is an example of this idea (see the sketch after this list).
- There is another idea that would need `std.Progress.Node` work first, so it is also not implemented in this PR.
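The example for that `.dependencies` idea did not survive above, so here is a hypothetical sketch of what such a manifest fragment could look like (field names and values are illustrative only, and none of this is accepted by today's package manager):

```zig
// Hypothetical fragment of a build.zig.zon.
.dependencies = .{
    // Non-struct props would be read as fetch config that applies to every dependency...
    .retry_count = 3,
    .retry_delay_ms = 500,
    // ...while struct-valued props stay ordinary dependencies.
    .some_dep = .{
        .url = "https://example.com/some_dep.tar.gz",
        .hash = "...", // placeholder
    },
},
```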
Minor Considerations
- There are some locations where memory is now freed that wasn't freed before. This is because, if the user configures `Fetch` to retry many times, I don't want the allocator to allocate without freeing on each refetch of the network data, which would result in a memory leak.
- The `Fetch.initResource()` method needs to be refactored, but this PR is big enough, so it made sense to me to merge this, then do a refactor in a future PR.