More fine-grained timeout handling #1041
Thanks for the issue. I think we could start to work on it if we do configurable timeouts per backend (group).
I will post a comment instead of a new issue; it is about an edge case (an infinitely slow backend) but is relevant to the subject.

Problem

TL;DR: There is no "response read timeout" available, and therefore Skipper hangs if the backend hangs, until the client cancels the request.

Skipper has the following timeout configurations (see Lines 178 to 183 and Lines 386 to 391 in 5a92e9f). For backend requests Skipper uses the code in Lines 623 to 637 in 5a92e9f. None of these timeouts bounds reading the whole response body.

If the backend has started to respond, has sent the response status, and then hangs, the copy in Lines 1225 to 1227 in 5a92e9f blocks until the client cancels its request.

Consider a malformed backend that writes a status and then hangs:

package main
import (
"fmt"
"log"
"net/http"
"time"
)
func main() {
http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(418)
f := w.(http.Flusher)
f.Flush()
time.Sleep(10 * time.Hour)
})
fmt.Printf("Starting server at port 8080\n")
if err := http.ListenAndServe(":8080", nil); err != nil {
log.Fatal(err)
}
}
and Skipper with all timeouts set to 1s:
and the client:
The client receives the status from Skipper and then hangs forever; none of Skipper's configured timeouts apply (because all of them cover the connection or the first bytes received/sent, not the duration of the whole request-response roundtrip). When the client cancels the request, Skipper logs an error, which comes from Lines 1227 to 1230 in 5a92e9f.
Solution

The backend request should configure a timeout similar to the following (note: it is a demo and the context is not cancelled properly):

@@ -883,6 +881,11 @@ func (p *Proxy) makeBackendRequest(ctx *context) (*http.Response, *proxyError) {
return nil, &proxyError{handled: true}
}
+ tc, _ := stdlibcontext.WithTimeout(req.Context(), 5*time.Second)
+ //defer _() // releases resources if slowOperation completes before timeout elapses
+
+ req = req.WithContext(tc)
+
bag := ctx.StateBag()
spanName, ok := bag[tracingfilter.OpenTracingProxySpanKey].(string)
    if !ok {

In the above setup the backend request is aborted after the configured 5 seconds, even if the backend hangs mid-response.
There might be a global timeout setting for all requests and a per-route setting (i.e. this ticket's subject).

Open questions
Links
* Adds `backendTimeout` filter to configure route backend timeout
* Proxy sets up request context with configured timeout and responds with 504 status on timeout (note: if response streaming has already started it will be terminated, client will receive backend status and truncated response body). See zalando#1041

Signed-off-by: Alexander Yastrebov <[email protected]>
The https://opensource.zalando.com/skipper/reference/filters/#backendtimeout filter can be configured per route.
Addition to the #1041 (comment)

It turns out that the backend timeout also covers a streaming backend. Consider a backend that keeps writing:

package main
import (
"fmt"
"log"
"net/http"
"time"
)
func main() {
http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(418)
f := w.(http.Flusher)
f.Flush()
ticker := time.NewTicker(5 * time.Second)
for {
select {
case <-ticker.C:
_, err := fmt.Fprintf(w, "ping\n\n")
if err != nil {
fmt.Printf("Done\n")
return
}
f.Flush()
}
}
})
fmt.Printf("Starting server at port 8080\n")
if err := http.ListenAndServe(":8080", nil); err != nil {
log.Fatal(err)
}
}
and Skipper with all timeouts set to 1s:
and the client:
The client receives status from Skipper and Skipper closes the connection:
If backendTimeout is configured as above, then the stream is terminated when the timeout fires (i.e. Line 256 in 7783c31).
Hello. I have another use case where I need to configure timeouts per route/backend.

With the help of @AlexanderYastrebov I figured out the timeout I need to adjust. My question is: is it possible to create a new Skipper filter, or adjust the existing backend-timeout one, that would allow changing this value per route?
As far as I remember it's not easy to change it from filters. I am back at the end of May and can check it then.
It took me forever to find this issue, but I believe we are encountering the same problem with an SSE stream behind Skipper. The full issue describing it is in the Backstage repo: backstage/backstage#5535. I also think it would be great to create a new Skipper filter, or adjust the existing backend-timeout one, that would allow changing this value per route.
@jrusso1020 I will check the referenced issue, thanks for pointing it out.
@szuecs gotcha. Just curious, are there any other recommendations on how to configure Skipper to handle long-lived (over 1 minute) SSE streams, besides increasing the timeouts?
It depends on where the timeout hits us. The problem is that Go does not expose something that we can use for this kind of decision. Maybe just decide that some duration like 15m or 1h is fine as a timeout?
Thanks for opening up #1782 @szuecs! We are still observing some issues with SSE; let me know if you'd prefer I open a new issue or move this conversation to #1782.

We've increased the timeouts; however, we are still observing the issue and still getting errors. I happened upon #1748, which seems rather similar as well, and they solved it by setting the idle timeout. Logs are below; as you can see, the connection is still being closed. Our skipper image:
@jrusso1020 Hi, if your Skipper is behind an AWS load balancer you may need to increase its idle timeout as well: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html (also mentioned in #1748 (comment)).
Just to close the loop here: we are still seeing the errors, but from a user perspective we are no longer seeing the SSE streams in the browser being reset.
@jrusso1020 so do you retry from the client, or do you configure the idle timeout correctly, such that the ALB opens a new connection automatically?
@szuecs we added the configuration below, and the below to the controller. No additional code changes on our applications.

So increasing the idle timeout means that the ALB will automatically open new connections when the ingress closes them, as long as it's within that idle connection timeout and the client hasn't closed the connection?
IMO the LB idle timeout should be above Skipper's timeouts. From Skipper's point of view the load balancer is the client, and it closes the connection after 5 minutes. If that happens just before a Skipper timeout, it would manifest itself as an error like the one above.
@jrusso1020 also set the idle timeout +2s higher than the ALB idle timeout. I once investigated a rarely occurring issue that returned 500 for some reason, and it was a race condition between idle timeouts and a connect in flight.
@szuecs can you clarify which "idle timeout" you recommend setting 2s higher than the ALB idle timeout (i.e. which flag/deployment should I do that for)?
Skipper's idle timeout should be +2s bigger than the ALB idle timeout. Including @AlexanderYastrebov's answer, for example for skipper:

and the below for the kube-ingress-aws-controller:
Let's reopen to stay aware of a possible change: we might be able to enhance the current timeouts in the future if the implementation of the proposal becomes available (not sure if this proposal is actually usable for us).
We have added https://opensource.zalando.com/skipper/reference/filters/#readtimeout and https://opensource.zalando.com/skipper/reference/filters/#writetimeout a while ago, which you can set per route from the filter via http.ResponseController.
Is your feature request related to a problem? Please describe.
Our backend is expected to fail with a timeout after 450ms, but under massive traffic it starts to see garbage-collection pauses etc., so ideally Skipper in front of my application should react and send a 504 Gateway Timeout to the client. Unfortunately there is only a global backend timeout configurable in Skipper, and we're sharing Skipper's ingress with other applications.
Describe the solution you would like
Ideally I want a timeout that is configurable per route, or per backend.
Describe alternatives you've considered (optional)
As a work-around we could invest in hardening the backend to ensure it fails faster, instead of trying to handle more connections and getting into GC issues.
Another alternative could be a dedicated skipper-ingress for our application that hard-codes our desired timeout setting.
Additional context (optional)
Why is failing fast important here? If we're not responding within the expected timeout, we're burdening our upstream clients with our increased latency, causing them to handle more concurrent connections and making it harder for them to trigger their circuit breakers.
Would you like to work on it?
Maybe.