since controller 1.10.0 (chart 4.10.0): ingress rejects duplicate "Transfer-Encoding: chunked" and returns 502 #11162
/remove-kind bug
It's better if you write a step-by-step procedure that someone can copy/paste from. Please use the image httpbun.org, as it will get people on the same page. Please use minikube to reproduce the problem, and ensure you provide all manifests so that someone can copy/paste and reproduce. /kind support |
@longwuyuan |
@AlexPokatilov thanks for updating.
|
Do you happen to have a sample of a ticket where someone reproduced a bug that was based on the server's response? Part of the reproduction setup would be a backend behind the ingress that responds a certain way, to showcase the setup in minikube. I could create a Spring Boot app that does that, build an image locally, and deploy it, but that sounds like a lot of overhead for someone reproducing this who does not happen to have mvn and a JDK locally |
Also, you will see there are no logs in the issue description. Essentially, the info is already asked for in the issue template, but I will list it again: not YAML, but the current state of resources shown as kubectl describe output, and also the logs of the controller pod.
|
I will get back to you tomorrow and supply the missing info. Sorry for that. I used the template but omitted some parts that seemed less relevant, but which were obviously more relevant than I initially thought. |
@DRoppelt thanks for the idea. By discussing here or by copy/pasting data, if we can triage the issue description down to an aspect related to the reverse proxy or the Ingress API, it would help make progress. If we make it a Maven/JDK/JRE/JSP-specific discussion, then later in the discussion we will arrive at a step where we pinpoint which reverse-proxy concept, or which nginx-as-a-reverse-proxy concept, we are looking at, in addition to which K8S Ingress API related problem we are looking at. In this case, it seems you are pointing at your Java server settings, and I am wondering why the nginx reverse proxy should bother with what value you set that header to. We don't have any business messing with that. I don't know if you meant you were setting that header in your code or in the Spring Boot framework. Either way, I know for sure that the controller does not get in the way normally. Hence we need to deep dive. BUT a clear issue description comes first, and the specific test where we alter headers comes next. |
Ok, so since the upgrade to controller 1.10.0 (chart 4.10.0), we observe 502s (not always, but at least one case where we can reproduce it deterministically). Here is an attempt to visualize:
given that
then nginx appears to intercept the original response. We also observe the following log with matching timestamps
We found that when altering the backend to set the header once (as the backend probably should anyway), the response goes through. As for reproducing: I tried to set up minikube and get a sample going with a Spring Boot app, but it appears that minikube comes with controller 1.9.4 and I found no way to pin it to 1.10.0, please advise. Do you have a sample ticket with httpbun.org I could copy from? I have a hard time understanding how to set up a sample based on that. |
Personally, I am just going to use an nginx:alpine image to create a backend and see if I can add the header to the nginx.conf in that pod
I see this ;
So it will be interesting to see your curl and the same info. I am trying to correlate whether nginx v1.25 does anything different. Also, I read here that HTTP/2 does not allow that header, and I see HTTP/2 in play in my test above |
It is a bit too late (as in time of day in Europe) for me to get a feel for minikube; nevertheless thank you for the steps, I will look at them myself tomorrow. Meanwhile, here is a backend that can be used for testing.
The api has 2 parameters; it will stream a response in 100ms intervals. I hope this helps!
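For anyone trying to reproduce the response shape without a JVM toolchain: the offending behaviour can be mimicked in a few lines of Python. This is a hypothetical stand-in for the Spring Boot demo backend, not the actual app from this thread; it serves one raw HTTP response with the `Transfer-Encoding: chunked` header sent twice, then reads it back to show the duplication that nginx rejects.

```python
import socket
import threading

# Raw response mimicking the misbehaving backend: the header appears
# twice (e.g. once set in code, once added by the framework/container),
# followed by a valid chunked body.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"Transfer-Encoding: chunked\r\n"  # the duplicate that triggers the 502
    b"\r\n"
    b"5\r\nhello\r\n0\r\n\r\n"
)

def serve_once(server: socket.socket) -> None:
    conn, _ = server.accept()
    conn.recv(4096)          # read (and ignore) the request
    conn.sendall(RESPONSE)
    conn.close()

def fetch_raw_headers(port: int) -> list[str]:
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    head = data.split(b"\r\n\r\n", 1)[0]
    return head.decode().split("\r\n")[1:]  # drop the status line

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

headers = fetch_raw_headers(port)
dupes = [h for h in headers if h.lower() == "transfer-encoding: chunked"]
print(len(dupes))  # 2 -> this is the response shape behind the 502
```

Pointing the ingress (or a plain nginx proxy) at a pod serving this response should surface the same "upstream sent duplicate header line" error as the Spring Boot sample.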
|
@DRoppelt thank you. It really helped.
|
/kind bug @tao12345666333 @rikatz @cpanato @strongjz Because of the bump to nginx v1.25.x, chunking-related behaviour of the controller is broken out of the box. Details above. Basically, an internet search points at a directive controlling this behaviour. Right now, the controller is rejecting the response from a backend upstream app with the log message below;
2024/03/26 03:39:34 [error] 267#267: *358731 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 192.168.122.1, server: chunky-demo.dev.enjoydevops.com, request: "POST /chunky/10/1 HTTP/1.1", upstream: "http://10.244.0.68:8080/chunky/10/1", host: "chunky-demo.dev.enjoydevops.com"
|
Seems related to #4838 |
@DRoppelt I think I have understood the problem and the solution
|
I think we need a docs PR to be clear about this to users of controller v1.10.x+ |
@DRoppelt thanks. Sure, here it is below.
|
This appears to be the same issue that we were able to reproduce with longwuyuan: a duplicate Transfer-Encoding header. The ingress pod should also show a time-wise matching error-log entry. In our services, we had Tomcat/Boot setting the header and the application also setting it in addition. I looked into this initially,
but since I found no corresponding ConfigMap entry (https://docs.nginx.com/nginx-ingress-controller/configuration/global-configuration/configmap-resource/), I opted to fix the affected services instead of working with config-snippets. I am curious to see how this bug continues, but opted to fix that particular backend either way. |
@DRoppelt thanks. I see the duplication.
Thanks! |
I agree on all terms, and thanks for bringing up the RFC: https://datatracker.ietf.org/doc/html/rfc2616#section-4.2
I would like a maintainer to chime in here and give a different perspective. The link that @longwuyuan shared (https://nginx.org/en/docs/faq/chunked_encoding_from_backend.html) appears to deal with something in nginx before 1.1.4 regarding:
the flipped default since 1.24, it seems
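As a sketch of the RFC rule cited above: repeating a header field is only equivalent to one comma-joined field, and "Transfer-Encoding: chunked, chunked" is itself invalid (chunked may appear only once, as the final coding), so the duplicate cannot be merged away. The helper names below are made up for illustration:

```python
# Illustrative (hypothetical) check of why the duplicated header is
# unrecoverable under the RFC's field-combining rule.
def combine_field_values(headers: list[tuple[str, str]], name: str) -> str:
    """Join repeated header fields into one comma-separated value."""
    values = [v for k, v in headers if k.lower() == name.lower()]
    return ", ".join(values)

def transfer_encoding_valid(headers: list[tuple[str, str]]) -> bool:
    codings = [c.strip().lower()
               for c in combine_field_values(headers, "Transfer-Encoding").split(",")
               if c.strip()]
    # "chunked" must appear at most once, and only as the final coding.
    return codings.count("chunked") <= 1 and (
        "chunked" not in codings or codings[-1] == "chunked")

good = [("Transfer-Encoding", "chunked")]
bad = [("Transfer-Encoding", "chunked"), ("Transfer-Encoding", "chunked")]
print(transfer_encoding_valid(good))  # True
print(transfer_encoding_valid(bad))   # False
```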
|
My take on the summary so far;
compose.yaml
nginx.Dockerfile
nginx.default.conf
|
I kept mentioning other services that do not explicitly deal with this header; I have now found how they also face nginx erroring and returning 502 on their behalf. It is not a different header or such; it fails with the same error, but with a different use-case causing it. The backends call further downstream systems and return their response 1:1 upstream, in a way acting as proxies themselves.
Background for this setup: backend A takes the request (2), does some request transformation (looking up values in a DB) and creates a new request (3); A calls B (3) and receives the response (4), but then just shovels (4) back to the client (5) without any transformation. Which is somewhat odd behaviour (especially taking all returned headers in (4) and pumping them through to (5)), but a somewhat reasonable use-case. This here specifies
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers#hop-by-hop_headers
Which gives this whole thing a bit of a "twist" regarding why nginx should even care about that header to the point that it rejects the request. Anyway, all nginx sources/questions seem to point to "just set
Proposals:
a) create a 1.10.1 that statically sets this flag, to create a forward-compatible release
b) create a chart that allows setting this value to "off" via values.yaml (but leave the default on "on", as the nginx project intended)
c) in addition to b), add an option to control this setting via annotation, on a per-Ingress basis
d) address the rejection of the request as a bug at nginx? |
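A sketch of what the MDN page above prescribes: a proxying backend should strip hop-by-hop headers (plus anything named in `Connection`) before relaying a downstream response, because those headers belong to a single connection hop. The names below are illustrative, not from any of the services in this thread:

```python
# Hop-by-hop header names from RFC 7230 / the MDN list cited above.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    # Headers listed in the Connection field are hop-by-hop too.
    listed = {h.strip().lower()
              for h in headers.get("Connection", "").split(",") if h.strip()}
    drop = HOP_BY_HOP | listed
    return {k: v for k, v in headers.items() if k.lower() not in drop}

downstream = {
    "Content-Type": "application/json",
    "Transfer-Encoding": "chunked",  # belongs to the downstream hop only
    "Connection": "close",
}
print(strip_hop_by_hop(downstream))  # only Content-Type survives
```

A backend that forwards headers 1:1 without this filter is exactly what re-introduces the stale `Transfer-Encoding` alongside the one its own HTTP layer adds.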
I have new information ;
|
I believe https://nginx.org/en/docs/faq/chunked_encoding_from_backend.html is misleading for this bug. It talks specifically about a misrepresented payload
and nginx 1.1.14 as the "fix" for it.
That is also a version from 2011 (https://nginx.org/en/CHANGES-1.2), and I do not think that particular issue is relevant here.
In the case of a Spring Boot application this is not true. Tomcat is an integral part of the deployment; it may be replaced by Netty, but a webserver of some sort must be part of the JVM and cannot be swapped out for nginx. In k8s, nginx serves as an ingress to the pod and (at least for our use-case) terminates TLS and routes the traffic to pods. Tomcat takes the request and handles the translation from HTTP to "java code", somewhat simplified. The image that can be built from my provided sample would be promoted from dev all the way to prod and only be altered via ConfigMaps. ingress-nginx shows a breaking behaviour from controller 1.9.6 to 1.10.0 and I think this needs to be addressed. Either to disable the |
@DRoppelt I agree that sticking to v1.9.6 is not right.
Let's get on a call and discuss this.
|
@DRoppelt will look forward to knowing if we can get on a Zoom call sometime.
Ideally you don't need to change code in your git repo. I am hoping that you run the same test I describe above. Then please give your opinion on whether this test is valid for checking the "setting the header 2 times" case. I wish we could get an nginx expert to comment on this |
The question and point here is: this seems to me an NGINX (and not ingress controller code) breaking change. I'm not sure how many users are impacted, or, as Long pointed out, whether this is specific to a situation on Tomcat/Jetty servers. My proposal is: while we don't default to the previous configuration (because there may be a reason NGINX turned that on, or broke it), we can allow this to be configurable. People migrating to ingress v1.10 can opt in to disabling chunking. I just don't know if this should be a global admin-defined config (ConfigMap) or a user-defined config (annotation). Is this approach enough? |
@rikatz yes, this approach seems enough for @DRoppelt, as that is what he asked for here: #11162 (comment). CAVEAT
UNCLEAR INFO
upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream
|
Test-Case 1
I am going to test setting chunking off in the controller, using server-snippets, and will update in the next post |
Test-Case 2
|
The unclear info is NOW cleared, after the tests shown in the previous 2 posts
upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream
|
Older Controller Works
|
Hi all, to me it seems the issue is caused by this change introduced in NGINX v1.23 (src: https://forum.nginx.org/read.php?2,297086,297090#msg-297090). This seems to be the expected behavior when the |
@mcharriere thank you very much for that info. Solves the mystery. |
Cool, that is some helpful background! I was not able to come around sooner for the points you brought up @longwuyuan, but here we go now.
Sorry about my tone that day. I felt frustrated at the time, as I am currently dealing with a k8s setup of about 100 services where about 3 services are knowingly affected and an unknown number may be affected but not yet identified. On the other hand, I am a paid person who can find a way to deal with this in my paid time and should not roll it all off onto an open-source project. I will submit another testcase to show a behaviour that is rather hidden and has unluckily spread as a pattern in our code-base across various services. It is hard to identify them, given the "all or nothing" approach that nginx has (chunking disabled: OK; chunking enabled: hard reject with 502).
I was thinking about it a bit more, and in essence the problem lies in this "all or nothing" approach from nginx, without a way in between.
It would be rather helpful to have a warn mode that allows a soft migration for all identified services that do not stick to the behaviour nginx expects. Especially since I am not knowledgeable about the process of submitting a proposal or a bug report at the nginx project. But this stance from last year seems very clear and without any room for wiggle: https://forum.nginx.org/read.php?2,297086,297090#msg-297090
as for the test-cases: you can take the code unmodified and test a) b) c) the service has at least the following annotation added and passes due to overriding
d) given that the chart has some config to set
at this point I am a bit undecided whether this is just enough for this non-standard behavior. A reason against it is that the default of
One can add a global snippet-annotation which allows one to opt out of chunking. So only the combination of allowing a service to opt in/out plus a controller-level setting would add value; otherwise just disabling it on the controller globally via a snippet is already sufficient. I am currently looking into the docs for the exact syntax. I will post another update on the demo-service that showcases another source of duplicate headers that is not as straight-forward as |
So here is a sample where the code proxies another REST endpoint and therefore gets a 2nd header through the way it proxies. I added a feign-based client that proxies a call to another service. For demo purposes, the service hosts that endpoint too, but does not alter the demo.
The backend has an endpoint where it does a DB lookup and then makes a downstream call. But, as an attempt to proxy the call back upstream, it does not re-wrap the response and just returns the downstream call's response 1:1
-> nginx will 502 this request. But when it actually re-wraps the downstream's response, the duplicate header is prevented
This demo also still has the
-> header once
-> |
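The failure mode described above can be modelled in a few lines: copying the downstream headers 1:1 keeps the downstream hop's `Transfer-Encoding`, and the backend's own HTTP layer (Tomcat, in this thread's case) then adds its own on top. This is a hypothetical sketch, not the actual feign/Spring code:

```python
# Two hypothetical proxying strategies for the same downstream response.
def proxy_1to1(downstream_headers):
    out = list(downstream_headers)                # naive 1:1 copy, no filtering
    out.append(("Transfer-Encoding", "chunked"))  # added again by the container
    return out

def proxy_rewrapped(downstream_headers):
    out = [(k, v) for k, v in downstream_headers
           if k.lower() != "transfer-encoding"]   # drop the hop-by-hop copy
    out.append(("Transfer-Encoding", "chunked"))  # container adds it once
    return out

downstream = [("Content-Type", "application/json"),
              ("Transfer-Encoding", "chunked")]

def count_te(hs):
    return sum(1 for k, _ in hs if k.lower() == "transfer-encoding")

print(count_te(proxy_1to1(downstream)))       # 2 -> the case nginx 502s
print(count_te(proxy_rewrapped(downstream)))  # 1 -> passes through
```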
the endpoint
I think on our end, we will try to upgrade from controller 1.9.6 to 1.10.0 with an added server-snippet on the controller that disables chunked transfer. Then, once fully rolled out, we will re-enable chunking on testing environments while leaving it inactive on production environments (and with that have a drift between prod and test for some time) and slowly fix affected services. From my end, I think we could live without further adjustments except for a) a note/highlight in the changelog for this breaking behaviour b) a sample of an added server-snippet in the controller's ConfigMap |
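A hedged sketch of what such a controller-level ConfigMap entry might look like: `http-snippet` is an existing ingress-nginx ConfigMap key for injecting raw nginx config into the `http` block, and `chunked_transfer_encoding` is a stock nginx directive. Note that a later comment in this thread reports the duplicate-header check fires either way, so treat this as a sketch of the idea, not a verified fix; the metadata values are assumptions.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # name/namespace are assumptions
  namespace: ingress-nginx
data:
  http-snippet: |
    chunked_transfer_encoding off;
```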
@DRoppelt I think the communication has not been clear enough
|
Ah, now I see. Ok, I tried to reproduce this on my end, deployed in an AKS.
Seems like the duplicate check happens either way. :( |
This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach |
me too |
What happened: when upgrading the controller from 1.9.6 (chart 4.9.1) to 1.10.0 (chart 4.10.0), we observe the ingress answering 502 on behalf of the service. When rolling back to 1.9.6, behaviour is restored to healthy.
What you expected to happen: the ingress behaving similarly to 1.9.6 and not answering 502 where it did not before.
There appears to be a regression of some sort. We updated from 1.9.6 to 1.10.0 and observed that some requests returned 502 right after we upgraded. We then downgraded and saw the 502s drop to previous numbers.
One instance where we observe it is when setting
Transfer-Encoding: chunked
twice, once in code and once via Spring Boot. We also observe the following error message in the logs
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version): 1.25.3
Kubernetes version (use
kubectl version
): v1.28.5
Environment:
uname -a
):
Please mention how/where the cluster was created, like kubeadm/kops/minikube/kind etc.
kubectl version
kubectl get nodes -o wide
helm ls -A | grep -i ingress
helm -n <ingresscontrollernamespace> get values <helmreleasename>
If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used
if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances
Current State of the controller:
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
Current state of ingress object, if applicable:
kubectl -n <appnamespace> get all,ing -o wide
kubectl -n <appnamespace> describe ing <ingressname>
Others:
kubectl describe ...
of any custom configmap(s) created and in use
How to reproduce this issue:
we have a service that sometimes answers with 2x
Transfer-Encoding: chunked
, sometimes with 1x
Transfer-Encoding: chunked
; when it answers with 2x the header, the request does not reach the client and the ingress answers with 502.
We hook into the ingress via port-forward to reproduce it, but have no better setup for local reproduction.
<send requests via postman>
observe 502 & log when the pod answers with the duplicate header
Transfer-Encoding: chunked
, and a regular 200 when answering with one header.
We have also observed this for other clients with other backends, but so far have only reproduced this particular endpoint with "always errors when X and always ok when Y".
Anything else we need to know:
we have now deployed a separate ingress @ 1.10.0 (with a separate ingress class) and can observe this behaviour there, while our "hot" ingress that gets traffic is back on 1.9.6; the 1.10.0 one still breaks while we have an operational ingress with 1.9.6. It does sound somewhat similar to spring-projects/spring-boot#37646