
More fine-grained timeout handling #1041

Closed

otrosien opened this issue Apr 24, 2019 · 22 comments

@otrosien
Member

otrosien commented Apr 24, 2019

Is your feature request related to a problem? Please describe.

Our backend is expected to fail with a timeout after 450ms, but under massive traffic it starts experiencing garbage-collection pauses etc., so ideally Skipper in front of my application should react and send a 504 Gateway Timeout to the client. Unfortunately only a global backend timeout is configurable in Skipper, and we're sharing the Skipper ingress with other applications.

Describe the solution you would like

Ideally the timeout would be configurable per route, or per backend...

Describe alternatives you've considered (optional)

As a work-around we could invest in hardening the backend to ensure it fails faster, instead of trying to handle more connections and getting into GC issues.

Another alternative could be a dedicated skipper-ingress for our application that hard-codes our desired timeout setting.

Additional context (optional)

Why is failing fast important here? If we're not responding within the expected timeout, we burden our upstream clients with our increased latency, causing them to handle more concurrent connections and making it harder for them to trigger their circuit breakers.

Would you like to work on it?

Maybe.

@szuecs
Member

szuecs commented Apr 24, 2019

Thanks for the issue. I think we could start working on it by making timeouts configurable per backend (group).

@AlexanderYastrebov
Member

AlexanderYastrebov commented Oct 22, 2020

I will post a comment instead of a new issue; it is about an edge case (an infinitely slow backend) but is relevant to the subject.

Problem

TL;DR: There is no "response read timeout" available, so skipper hangs for as long as the backend hangs, until the client cancels the request.

Skipper has the following backend-related timeout configurations (look for *timeoutBackend*):

skipper/config/config.go

Lines 178 to 183 in 5a92e9f

TimeoutBackend               time.Duration `yaml:"timeout-backend"`
KeepaliveBackend             time.Duration `yaml:"keepalive-backend"`
EnableDualstackBackend       bool          `yaml:"enable-dualstack-backend"`
TlsHandshakeTimeoutBackend   time.Duration `yaml:"tls-timeout-backend"`
ResponseHeaderTimeoutBackend time.Duration `yaml:"response-header-timeout-backend"`
ExpectContinueTimeoutBackend time.Duration `yaml:"expect-continue-timeout-backend"`

skipper/config/config.go

Lines 386 to 391 in 5a92e9f

timeoutBackendUsage = "sets the TCP client connection timeout for backend connections"
keepaliveBackendUsage = "sets the keepalive for backend connections"
enableDualstackBackendUsage = "enables DualStack for backend connections"
tlsHandshakeTimeoutBackendUsage = "sets the TLS handshake timeout for backend connections"
responseHeaderTimeoutBackendUsage = "sets the HTTP response header timeout for backend connections"
expectContinueTimeoutBackendUsage = "sets the HTTP expect continue timeout for backend connections"

For backend requests skipper uses http.Transport:

skipper/proxy/proxy.go

Lines 623 to 637 in 5a92e9f

tr := &http.Transport{
	DialContext: newSkipperDialer(net.Dialer{
		Timeout:   p.Timeout,
		KeepAlive: p.KeepAlive,
		DualStack: p.DualStack,
	}).DialContext,
	TLSHandshakeTimeout:   p.TLSHandshakeTimeout,
	ResponseHeaderTimeout: p.ResponseHeaderTimeout,
	ExpectContinueTimeout: p.ExpectContinueTimeout,
	MaxIdleConns:          p.MaxIdleConns,
	MaxIdleConnsPerHost:   p.IdleConnectionsPerHost,
	IdleConnTimeout:       p.CloseIdleConnsPeriod,
	DisableKeepAlives:     p.DisableHTTPKeepalives,
	Proxy:                 proxyFromHeader,
}

  • TimeoutBackend is used to configure net.Dialer timeout:
    // Timeout is the maximum amount of time a dial will wait for
    // a connect to complete. If Deadline is also set, it may fail
    // earlier.
    //
    // The default is no timeout.
    //
    // When using TCP and dialing a host name with multiple IP
    // addresses, the timeout may be divided between them.
    //
    // With or without a timeout, the operating system may impose
    // its own earlier timeout. For instance, TCP timeouts are
    // often around 3 minutes.
    Timeout time.Duration
  • ResponseHeaderTimeoutBackend is used to configure http.Transport timeout:
    // ResponseHeaderTimeout, if non-zero, specifies the amount of
    // time to wait for a server's response headers after fully
    // writing the request (including its body, if any). This
    // time does not include the time to read the response body.
    ResponseHeaderTimeout time.Duration // Go 1.1

Since skipper uses http.Transport and not http.Client there is no "response read timeout" available such that http.Client has:

    // Timeout specifies a time limit for requests made by this
    // Client. The timeout includes connection time, any
    // redirects, and reading the response body. The timer remains
    // running after Get, Head, Post, or Do return and will
    // interrupt reading of the Response.Body.
    //
    // A Timeout of zero means no timeout.
    //
    // The Client cancels requests to the underlying Transport
    // as if the Request's Context ended.
    //
    // For compatibility, the Client will also use the deprecated
    // CancelRequest method on Transport if found. New
    // RoundTripper implementations should use the Request's Context
    // for cancellation instead of implementing CancelRequest.
    Timeout time.Duration // Go 1.3

If the backend has started to respond, has sent the response status, and then hangs,
skipper will forward the response status to the client and hang as well (in copyStream):

skipper/proxy/proxy.go

Lines 1225 to 1227 in 5a92e9f

ctx.responseWriter.WriteHeader(ctx.response.StatusCode)
ctx.responseWriter.Flush()
err := copyStream(ctx.responseWriter, ctx.response.Body, p.tracing, ctx.proxySpan)

until client cancels its request.

Consider a misbehaving backend that writes the status and then hangs:

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {

		w.WriteHeader(418)

		f := w.(http.Flusher)
		f.Flush()

		time.Sleep(10 * time.Hour)
	})

	fmt.Printf("Starting server at port 8080\n")
	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Fatal(err)
	}
}

$ go run ~/tmp/slow.go
Starting server at port 8080

and skipper with all timeouts set to 1s:

$ bin/skipper -inline-routes='* -> "http://localhost:8080"' -read-timeout-server=1s -read-header-timeout-server=1s -write-timeout-server=1s -timeout-backend=1s -response-header-timeout-backend=1s
[APP]INFO[0000] Expose metrics in codahale format            
[APP]INFO[0000] support listener on :9911                    
[APP]INFO[0000] proxy listener on :9090                      
[APP]INFO[0000] TLS settings not found, defaulting to HTTP   
[APP]INFO[0000] route settings, reset, route: : * -> "http://localhost:8080" 
[APP]INFO[0000] route settings received                      
[APP]INFO[0000] route settings applied 

and client:

$ curl -v localhost:9090/slow
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET /slow HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 418 I'm a teapot
< Date: Thu, 22 Oct 2020 10:45:56 GMT
< Server: Skipper
< Transfer-Encoding: chunked
< 

The client receives the status from skipper and hangs forever; none of skipper's configured timeouts apply (all of them cover the connection or the first bytes received/sent, not the duration of the whole request-response roundtrip).

When client cancels request (^C), the following appears in skipper logs:

[APP]ERRO[0053] error while copying the response stream: context canceled 
127.0.0.1 - - [22/Oct/2020:12:45:56 +0200] "GET /slow HTTP/1.1" 418 0 "-" "curl/7.58.0" 47893 localhost:9090 - -

which comes from copyStream:

skipper/proxy/proxy.go

Lines 1227 to 1230 in 5a92e9f

err := copyStream(ctx.responseWriter, ctx.response.Body, p.tracing, ctx.proxySpan)
if err != nil {
	p.metrics.IncErrorsStreaming(ctx.route.Id)
	p.log.Errorf("error while copying the response stream: %v", err)

Solution

The backend request should configure a timeout similar to the following (note: this is a demo and the context is not cancelled properly):

@@ -883,6 +881,11 @@ func (p *Proxy) makeBackendRequest(ctx *context) (*http.Response, *proxyError) {
                return nil, &proxyError{handled: true}
        }
 
+       tc, _ := stdlibcontext.WithTimeout(req.Context(), 5*time.Second)
+       //defer _()  // releases resources if slowOperation completes before timeout elapses
+
+       req = req.WithContext(tc)
+
        bag := ctx.StateBag()
        spanName, ok := bag[tracingfilter.OpenTracingProxySpanKey].(string)
        if !ok {

In the above setup copyStream will unblock after 5s:

[APP]ERRO[0009] error while copying the response stream: context deadline exceeded 
127.0.0.1 - - [22/Oct/2020:12:51:41 +0200] "GET /slow HTTP/1.1" 418 0 "-" "curl/7.58.0" 5000 localhost:9090 - -

There could be a global timeout setting for all requests and a per-route setting (i.e. the subject of this ticket).

Open questions

  • How would global "read timeout" setting play with infinite responses (streaming/long polling)?
  • Backend request timeout context must be cancelled after successful response (happy path, nothing hangs) i.e. after copyStream but currently there are no means to do that. It would be nice to implement generic hook for response write completion - it could be used to clean request timeout context as well as to measure real backend latency that includes response write - useful for Endpoint metadata #1559 and loadbalancer algorithms #557.
  • How request timeout context plays with transparent retry (should it be the same timeout per attempt or both attempts must fit into timeout)?

Links

AlexanderYastrebov added a commit to AlexanderYastrebov/skipper that referenced this issue Oct 23, 2020
* Adds `backendTimeout` filter to configure route backend timeout
* Proxy sets up request context with configured timeout and responds
with 504 status on timeout (note: if response streaming has already
started it will be terminated, client will receive backend status and
truncated response body).

See zalando#1041
AlexanderYastrebov added a commit that referenced this issue Dec 15, 2020
aryszka pushed a commit that referenced this issue Dec 17, 2020
@szuecs
Member

szuecs commented Dec 21, 2020

The backendTimeout filter (https://opensource.zalando.com/skipper/reference/filters/#backendtimeout) can be configured per route.

@szuecs szuecs closed this as completed Dec 21, 2020
@AlexanderYastrebov
Member

AlexanderYastrebov commented May 7, 2021

Addition to the #1041 (comment)

The client receives status from skipper and hangs forever, none of the skipper configured timeouts are applicable

It turns out that -write-timeout-server applies if the server sends a body. Consider a server that sends pings indefinitely:

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {

		w.WriteHeader(418)

		f := w.(http.Flusher)
		f.Flush()

		ticker := time.NewTicker(5 * time.Second)
		for range ticker.C {
			if _, err := fmt.Fprintf(w, "ping\n\n"); err != nil {
				fmt.Printf("Done\n")
				return
			}
			f.Flush()
		}
	})

	fmt.Printf("Starting server at port 8080\n")
	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Fatal(err)
	}
}

$ go run ~/tmp/slowping.go
Starting server at port 8080

and Skipper with all timeouts set to 1s:

$ bin/skipper -inline-routes='* -> "http://localhost:8080"' -read-timeout-server=1s -read-header-timeout-server=1s -write-timeout-server=1s -timeout-backend=1s -response-header-timeout-backend=1s

and the client:

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET /slow HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 418 I'm a teapot
< Date: Fri, 07 May 2021 10:39:47 GMT
< Server: Skipper
< Transfer-Encoding: chunked
< 
* transfer closed with outstanding read data remaining
* stopped the pause stream!
* Closing connection 0
curl: (18) transfer closed with outstanding read data remaining

The client receives the status from Skipper, and then Skipper closes the connection:

[APP]ERRO[0139] error while copying the response stream: context canceled 
127.0.0.1 - - [07/May/2021:12:40:54 +0200] "GET /slow HTTP/1.1" 418 6 "-" "curl/7.58.0" 5002 localhost:9090 - -

If -write-timeout-server is set to 10s, the client receives the first ping and then Skipper closes the connection:

$ bin/skipper -inline-routes='* -> "http://localhost:8080"' -read-timeout-server=1s -read-header-timeout-server=1s -write-timeout-server=10s -timeout-backend=1s -response-header-timeout-backend=1s
[APP]INFO[0000] Expose metrics in codahale format            
[APP]INFO[0000] support listener on :9911                    
[APP]INFO[0000] proxy listener on :9090                      
[APP]INFO[0000] TLS settings not found, defaulting to HTTP   
[APP]INFO[0000] route settings, reset, route: : * -> "http://localhost:8080" 
[APP]INFO[0000] route settings received                      
[APP]INFO[0000] route settings applied                       
[APP]ERRO[0013] error while copying the response stream: context canceled 
127.0.0.1 - - [07/May/2021:12:42:17 +0200] "GET /slow HTTP/1.1" 418 12 "-" "curl/7.58.0" 10003 localhost:9090 - -
$ curl -v localhost:9090/slow
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET /slow HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 418 I'm a teapot
< Date: Fri, 07 May 2021 10:42:17 GMT
< Server: Skipper
< Transfer-Encoding: chunked
< 
ping

* transfer closed with outstanding read data remaining
* stopped the pause stream!
* Closing connection 0
curl: (18) transfer closed with outstanding read data remaining

backendTimeout

The backendTimeout filter implemented in #1566 can only reduce the timeout relative to -write-timeout-server:

$ bin/skipper -inline-routes='* -> backendTimeout("5s") -> "http://localhost:8080"' -read-timeout-server=1s -read-header-timeout-server=1s -write-timeout-server=10s -timeout-backend=1s -response-header-timeout-backend=1s
[APP]INFO[0000] Expose metrics in codahale format            
[APP]INFO[0000] support listener on :9911                    
[APP]INFO[0000] proxy listener on :9090                      
[APP]INFO[0000] TLS settings not found, defaulting to HTTP   
[APP]INFO[0000] route settings, reset, route: : * -> backendTimeout("5s") -> "http://localhost:8080" 
[APP]INFO[0000] route settings received                      
[APP]INFO[0000] route settings applied                       
[APP]ERRO[0009] error while copying the response stream: context deadline exceeded 
127.0.0.1 - - [07/May/2021:12:48:58 +0200] "GET /slow HTTP/1.1" 418 0 "-" "curl/7.58.0" 5000 localhost:9090 - -
$ curl -v localhost:9090/slow
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET /slow HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 418 I'm a teapot
< Date: Fri, 07 May 2021 10:48:58 GMT
< Server: Skipper
< Transfer-Encoding: chunked
< 
* Connection #0 to host localhost left intact

If it is configured higher than -write-timeout-server, then -write-timeout-server applies:

$ bin/skipper -inline-routes='* -> backendTimeout("60s") -> "http://localhost:8080"' -read-timeout-server=1s -read-header-timeout-server=1s -write-timeout-server=10s -timeout-backend=1s -response-header-timeout-backend=1s
[APP]INFO[0000] Expose metrics in codahale format            
[APP]INFO[0000] support listener on :9911                    
[APP]INFO[0000] proxy listener on :9090                      
[APP]INFO[0000] TLS settings not found, defaulting to HTTP   
[APP]INFO[0000] route settings, reset, route: : * -> backendTimeout("60s") -> "http://localhost:8080" 
[APP]INFO[0000] route settings received                      
[APP]INFO[0000] route settings applied                       
[APP]ERRO[0014] error while copying the response stream: context canceled 
127.0.0.1 - - [07/May/2021:12:50:38 +0200] "GET /slow HTTP/1.1" 418 12 "-" "curl/7.58.0" 10003 localhost:9090 - -
$ curl -v localhost:9090/slow
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 9090 (#0)
> GET /slow HTTP/1.1
> Host: localhost:9090
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 418 I'm a teapot
< Date: Fri, 07 May 2021 10:50:38 GMT
< Server: Skipper
< Transfer-Encoding: chunked
< 
ping

* transfer closed with outstanding read data remaining
* stopped the pause stream!
* Closing connection 0
curl: (18) transfer closed with outstanding read data remaining

I.e. the backendTimeout filter may only be used to reduce -write-timeout-server (60s by default):

defaultWriteTimeoutServer = 60 * time.Second

@SergeiKalachev

Hello.

I have another use case where I need to configure timeouts per route/backend.
The original discussion started here.
Here is some copy-paste from the thread that explains the problem:

In our app I'm setting up an alternative to web-sockets which is SSE.
Locally everything works smoothly and as expected but in production my connections do not remain open (but they should). The connections are closed each minute. I tried running the local instance of skipper and was able to reproduce the problem.

I run:

./bin/skipper -address :3003 -routes-file=example.eskip -experimental-upgrade -insecure

example.eskip:

r: * -> "https://localhost:3000"

Then I call my backend through skipper at http://localhost:3003.
I see the browser establishes the connection; it remains open for only one minute and then closes. The logs from skipper:

[APP]ERRO[5034] error while copying the response stream: context canceled 
::1 - - [06/May/2021:16:09:05 +0200] "GET /events/assignments HTTP/1.1" 200 13 "http://localhost:3003/" "Mozilla/5.0 (Macintosh; 

Could you please help me to configure SSE feature properly?

and

Ok, I found the guy, it is -write-timeout-server.
My guess is it's skipper, because when I call my local BE directly the connection remains open. When I enable skipper my connection breaks after a minute. When I enable skipper with the flag -write-timeout-server=3m it remains open.

With the help of @AlexanderYastrebov I figured out the timeout I need to adjust: it is -write-timeout-server. If I set it to, let's say, 5m then the connection remains open for 5 minutes.
@AlexanderYastrebov also confirmed (see #1041 (comment)) that it's -write-timeout-server.

My question is: is it possible to create a new skipper filter, or adjust the existing backend-timeout one, to allow changing the value of -write-timeout-server?

@szuecs @aryszka folks what do you think?

@szuecs
Member

szuecs commented May 8, 2021

As far as I remember it's not easy to change it from filters. I am back at the end of May and can check it then.
@AlexanderYastrebov maybe you can check if we have the queue listener accessible from the proxy. If so we could add a func to change it via FilterContext, though I'm not sure it makes a lot of sense. The problem is that we would have to increase it to the max (for all routes and all connections) and rely on backendTimeout for earlier timeouts. Off the top of my head, this could open a Slowloris attack vector if a body is sent slowly on purpose.

@jrusso1020

jrusso1020 commented May 19, 2021

Took me forever to find this issue, but I believe we are encountering the same problem with an SSE stream behind skipper. Full issue describing it in the Backstage repo: backstage/backstage#5535

I also think it would be great to create a new skipper filter, or adjust the existing backend-timeout one, to allow changing the value of -write-timeout-server for a given route as opposed to globally, since as stated there are security concerns with setting -write-timeout-server too high and opening an attack vector for resource exhaustion.

@szuecs
Member

szuecs commented May 20, 2021

@jrusso1020 I will check the referenced issue, thanks for pointing it out.
The -write-timeout-server can't be changed per route. It's based on the listener socket, which does not know about routes (L4 vs L7 in the OSI model).

@jrusso1020

@szuecs gotcha. Just curious: are there any other recommendations on how to configure skipper to handle long-lived (over 1 minute) SSE streams, besides increasing -write-timeout-server?

@szuecs
Member

szuecs commented May 20, 2021

It depends on where the timeout hits us. The problem is that Go does not expose something we can use for this kind of decision. Maybe just decide that some duration like 15m or 1h is fine as a timeout?
You could also set it to 0 and disable it. Then you can use the backendTimeout() filter to add route-based configurations.

@jrusso1020

jrusso1020 commented May 27, 2021

thanks for opening up #1782 @szuecs! We are still observing some issues with SSE; let me know if you'd prefer I open a new issue or move this conversation to #1782.

We've increased -write-timeout-server to 300s (5 minutes), which is well above how long the stream takes, normally around 1-2 minutes.

However, we are still observing the issue, getting error while copying the response stream: context canceled. I've also noticed that this occurs even in some "successful" cases: the stream can take 1.1 to 1.2 minutes and we receive all the logs/messages looking successful, but error while copying the response stream: context canceled will still appear. For anything under 1 minute we don't see it.

I happened upon issue #1748, which seems rather similar, and they solved it by setting -response-header-timeout-backend as well as -write-timeout-server. We tried that too, but are still encountering the issue.

In the logs you can see the error while copying the response stream: context canceled appears a minute (or thereabouts) after the eventstream was started (the eventstream endpoint in the logs is the SSE stream).

[screenshot of skipper logs]

Our skipper image registry.opensource.zalan.do/teapot/skipper:v0.13.51
Our config

      skipper
      -kubernetes
      -kubernetes-in-cluster
      -address=:9999
      -proxy-preserve-host
      -serve-host-metrics
      -metrics-flavour=prometheus
      -enable-ratelimits
      -experimental-upgrade
      -enable-route-lifo-metrics
      -metrics-exp-decay-sample
      -kubernetes-https-redirect=true
      -lb-healthcheck-interval=3s
      -enable-connection-metrics
      -reverse-source-predicate
      -kubernetes-ingress-class=skipper-ingress-internal
      -write-timeout-server=300s
      -response-header-timeout-backend=300s

@AlexanderYastrebov
Member

@jrusso1020 Hi, if your Skipper is behind an AWS load balancer you may need to increase its idle timeout as well: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html (also mentioned in #1748 (comment))

@jrusso1020

Just to close the loop here: we are still seeing the error while copying the response stream: context canceled message in our logs.

But from a user perspective we are no longer seeing the SSE streams in the browser being reset.

@szuecs
Member

szuecs commented Jun 2, 2021

@jrusso1020 so do you retry from the client, or do you configure the idle timeout correctly, such that the ALB opens a new connection automatically?

@jrusso1020

@szuecs
We added the below to the ingress

-write-timeout-server=300s
-response-header-timeout-backend=300s

and the below to the kube-ingress-aws-controller controller

--idle-connection-timeout=300s

No additional code changes on our applications

So increasing the idle timeout means that the ALB will automatically open up new connections when the ingress closes them, as long as it's within that idle connection timeout and the client hasn't closed the connection?

@AlexanderYastrebov
Member

IMO the LB idle timeout should be above Skipper's timeouts. From Skipper's point of view, the load balancer is the client that closes the connection after 5 minutes. If that happens just before a Skipper timeout, it manifests as a context canceled error in the logs.

@szuecs
Member

szuecs commented Jun 2, 2021

@jrusso1020 also set Skipper's idle timeout 2s higher than the ALB idle timeout. I once investigated a rarely occurring issue that returned 500s, and it turned out to be a race condition between idle timeouts and an in-flight connect.

@jrusso1020

@szuecs can you clarify which "idle timeout" you recommend setting 2s higher than the ALB idle timeout? (Which flag/deployment should I set it for?)

@szuecs
Member

szuecs commented Jun 2, 2021

Skipper's idle timeout should be 2s bigger than the ALB idle timeout. Including @AlexanderYastrebov's answer, for example skipper:

-write-timeout-server=300s
-response-header-timeout-backend=300s
-idle-timeout-server=304s

and the below to the kube-ingress-aws-controller controller

--idle-connection-timeout=302s


@szuecs
Member

szuecs commented Sep 23, 2022

Let's reopen to stay aware of a possible change: we might be able to enhance the current timeouts in the future, once the implementation of the proposal is available (not sure that proposal is actually usable for us).

@szuecs szuecs reopened this Sep 23, 2022
@szuecs
Member

szuecs commented Jun 8, 2023

We added https://opensource.zalando.com/skipper/reference/filters/#readtimeout and https://opensource.zalando.com/skipper/reference/filters/#writetimeout a while ago, which you can set per route from a filter via http.ResponseController.
I think it's fine to close this now.

@szuecs szuecs closed this as completed Jun 8, 2023