Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

since controller 1.10.0 (chart 4.10.0): ingress rejects duplicate "Transfer-Encoding: chunked" and returns 502 #11162

Open
DRoppelt opened this issue Mar 25, 2024 · 52 comments
Assignees
Labels
kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. needs-priority triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@DRoppelt
Copy link

DRoppelt commented Mar 25, 2024

What happened: when controller from 1.9.6 (chart 4.9.1) to 1.10.0 (chart 4.10.0), we observe ingress answering 502 on behalve of the service. When rolling back to 1.9.6, restored to healthty behaviour

What you expected to happen: ingress behaving similar to 1.9.6 and not answering 502 where it did not before

there appears to be a regression of some sort. We have updated from 1.9.6 to 1.10.0 and observed that some requests return 502 right when we upgraded. We then downgraded and saw the 502s drop to previous numbers.

One instance where we observe it is when setting Transfer-Encoding: chunked twice, once in code and once via Spring Boot.

We also observe following error message in the logs

ingress-nginx-controller-8bf5b5f98-w8gbz 2024/03/22 11:01:54 [error] 575#575: *28680 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "POST /retrieve HTTP/1.1", upstream: "http://10.26.195.10:8080/retrieve", host: "localhost:8076"
ingress-nginx-controller-8bf5b5f98-w8gbz 127.0.0.1 - - [22/Mar/2024:11:01:54 +0000] "POST /retrieve HTTP/1.1" 502 150 "-" "PostmanRuntime/7.36.3" 3208 2.317 [default-ebilling-retrieval-http] ] 10.26.195.10:8080 0 2.318 502 94bc05d81342c91791fac0f02cb64434

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): 1.25.3

/etc/nginx $ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.10.0
  Build:         71f78d49f0a496c31d4c19f095469f3f23900f8a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.3

-------------------------------------------------------------------------------
/etc/nginx $

Kubernetes version (use kubectl version): v1.28.5

Environment:

  • Cloud provider or hardware configuration: Azure AKS
  • OS (e.g. from /etc/os-release): Ubuntu on nodes
/etc/nginx $ cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.19.1
PRETTY_NAME="Alpine Linux v3.19"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
/etc/nginx $

  • Kernel (e.g. uname -a):
/etc/nginx $ uname -a
Linux ingress-nginx-controller-8bf5b5f98-w8gbz 5.15.0-1057-azure #65-Ubuntu SMP Fri Feb 9 18:39:24 UTC 2024 x86_64 Linux
/etc/nginx $

  • Install tools: AKS via terraform, ingress via helm (but also via terraform)
    • Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
  • Basic cluster related info:
    • kubectl version
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVers
ion:"go1.20.5", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.5", GitCommit:"506050d61cf291218dfbd41ac93913945c9aa0da", GitTreeState:"clean", BuildDate:"2023-12-23T00:10:25Z", GoVers
ion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"}
  • kubectl get nodes -o wide
$ kubectl get nodes -o wide
NAME                              STATUS   ROLES   AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-default-15207550-vmss000000   Ready    agent   7d2h   v1.28.5   10.26.194.4     <none>        Ubuntu 22.04.4 LTS   5.15.0-1057-azure   containerd://1.7.7-1
aks-pool001-49772321-vmss000000   Ready    agent   7d1h   v1.28.5   10.26.195.9     <none>        Ubuntu 22.04.4 LTS   5.15.0-1057-azure   containerd://1.7.7-1
aks-pool001-49772321-vmss000001   Ready    agent   7d1h   v1.28.5   10.26.194.207   <none>        Ubuntu 22.04.4 LTS   5.15.0-1057-azure   containerd://1.7.7-1
aks-pool001-49772321-vmss00000b   Ready    agent   7d1h   v1.28.5   10.26.194.33    <none>        Ubuntu 22.04.4 LTS   5.15.0-1057-azure   containerd://1.7.7-1
aks-pool002-37360131-vmss00000h   Ready    agent   7d     v1.28.5   10.26.194.91    <none>        Ubuntu 22.04.4 LTS   5.15.0-1057-azure   containerd://1.7.7-1
aks-pool002-37360131-vmss00000q   Ready    agent   7d     v1.28.5   10.26.194.120   <none>        Ubuntu 22.04.4 LTS   5.15.0-1057-azure   containerd://1.7.7-1
aks-pool002-37360131-vmss00000v   Ready    agent   7d     v1.28.5   10.26.194.149   <none>        Ubuntu 22.04.4 LTS   5.15.0-1057-azure   containerd://1.7.7-1

  • How was the ingress-nginx-controller installed:
    • If helm was used then please show output of helm ls -A | grep -i ingress
$ helm ls -A | grep -i ingress
ingress-nginx                                           ap-system               1               2024-03-21 08:44:29.913568481 +0000 UTC deployed        ingress-nginx-4.10.0  
  • If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
$ helm -n ap-system get values ingress-nginx
USER-SUPPLIED VALUES:
controller:
  ingressClassResource:
    name: nginx
  service:
    type: ClusterIP


  • If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used

  • if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances

  • Current State of the controller:

    • kubectl describe ingressclasses
$ kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.10.0
              helm.sh/chart=ingress-nginx-4.10.0
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ap-system
Controller:   k8s.io/ingress-nginx
Events:       <none>


  • kubectl -n <ingresscontrollernamespace> get all -A -o wide

  • kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>

  • kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>

  • Current state of ingress object, if applicable:

    • kubectl -n <appnamespace> get all,ing -o wide
    • kubectl -n <appnamespace> describe ing <ingressname>
    • If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the reponse to the curl/grpcurl command with the -v flag
  • Others:

    • Any other related information like ;
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
      • Any other related information that may help

How to reproduce this issue:
we have a service that somestimes anwers with 2x Transfer-Encoding: chunked , sometimes with 1x Transfer-Encoding: chunked, when it answers with 2x the header, the request does not reach the client and ingress answers with 502

we hook into the ingress via port-forward to reproduce it, but have no better setup for local reproducing

kubectl port-forward -n ap-system ingress-nginx-controller-8bf5b5f98-w8gbz 8076 8080
  1. <send requests via postman>

  2. observe 502 & log when pod answers with duplicate header Transfer-Encoding: chunked, regular 200 when answering with one header

We have also observed this for other clients with other backends, but so far have only particular endpoint reproduced with "always errors when X and always ok when Y"

Anything else we need to know:

we now have deployed a seperate ingress @ 1.10.0 (with seperate ingress class) and can observe this behaviour while our "hot" ingress that gets traffik is back to 1.9.6, the 1.10.0 still breaks while we have an operational ingress with 1.9.6. It does sound somewhat similar to spring-projects/spring-boot#37646

@DRoppelt DRoppelt added the kind/bug Categorizes issue or PR as related to a bug. label Mar 25, 2024
@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Mar 25, 2024
@longwuyuan
Copy link
Contributor

/remove-kind bug

  • I just tested controller v1.10 and I have used v1.10 since its release and I don't see this problem.
  • You are using service type as ClsuterIP so that is a uncommon use-case. It implies that there is no service type LoadBalancer where the connection from outside the cluster terminates
  • You are referring to headers and HTTP specs like transfer-encoding. While another user of same use-case and experts would comment on this, it does not give actionable detailed information with regards to a bug
  • You have not provided critical details tht are asked in the issue template so don't know if you are using snippets

Its better if you write a step-by-step procedure that someone can copy/paste from. Please use the image httpbun.org as it will get people on the same page. Please use minikube to reproduce the problem. Ensure you provide all manifests that someone can copy/paste and reproduce.

/kind support
/triage needs-information

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. triage/needs-information Indicates an issue needs more information in order to work on it. and removed kind/bug Categorizes issue or PR as related to a bug. labels Mar 25, 2024
@AlexPokatilov
Copy link

@longwuyuan
I have the same problem like @DRoppelt in 1.10.0 release on bare-metal cluster (v1.28.6) with LoadBalancer service type (MetalLB v0.14.3).
On 1.9.6 version works without any problems.

@longwuyuan
Copy link
Contributor

@AlexPokatilov thanks for updating.

  • I it possible to reproduce this on minikube, using a backend created from httpbun.org
  • With a simple ingress for that httpbun.org backend, all my requests are success so I suspect I will need your ingress reported here as output of kubectl describe
  • Also if all the kubectl describe outputs as asked in the template of a new issue are provided, it would have thrown some info on the critical info required to analyze

@DRoppelt
Copy link
Author

Do you happen to have a sample of a ticket where someone reproduced a bug that was based on the server's result? Part of the reproducing-setup would be to have a backend behind ingress to respond a certain way to showcase the setup in a minikube.

I could create a spring-boot app that does that & create an image locally & deploy that, but that sounds like a lot of overhead for someone to reproduce that does not happen to have mvn&jdk locally

@longwuyuan
Copy link
Contributor

Also you will see, there are no logs in the issue description. So essentially the info is already asked in issue template but I will list again (Not YAML but current state of resources shown as kubectl describe output and also the logs of the controller pod)

  • kubectl describe of controller related resources
  • kubectl describe of app resources like service and ingress
  • Actual curl as executed with -v in complete exact detail
  • Actual logs of the controller pod(s)
  • Kubectl get events -A (if related )

@DRoppelt
Copy link
Author

I will get back to you tomorrow and supply missing info. Sorry for that.

I have used the template but omitted some of the parts that seemed less relevant but obviously were more relevant than I initially thought.

@longwuyuan
Copy link
Contributor

@DRoppelt thanks for the idea.

By discussing here or by copy/pasting data, if we can triage the issue description to an aspect that is related to reverseproxy or ingress-api, it would help make progress.

If we make it a maven/jdk/jre/jsp specific discussion, then later in the discussion we will arrive at a step where we pinpoint at what reverseproxy concept or what nginx-as-a-reverseproxy concept we are looking at. In addition to what K8S-Ingress-API related problem are we looking at.

In this case, it seems you are pointing at your java server setting "Transfer-Encoding: chunked". If so, we need to clearly establish of the ingress-controller is doing something to that header in rejecting the response. If so, then the prime suspects are nginx v1.25 which is the upgrade to nginx v1.21 in controller v1.9.x .

I am wondering why should nginx reverseproxy bother what value you set to that header. We don't have any business messing with that. I don't know if you meant you were setting that header in your code and in the springboot framework. Either way, i know for sure that the controller does not get in the way normally. Hence we need to deep dive.

BUT a clear issue description comes first. And the specific test where we alter headers comes next.

@DRoppelt DRoppelt changed the title since chart 1.10.0: ingress sometimes answering 502 on behalve of the service since chart 4.10.0: ingress sometimes answering 502 on behalve of the service Mar 25, 2024
@DRoppelt DRoppelt changed the title since chart 4.10.0: ingress sometimes answering 502 on behalve of the service since controller 1.10.0 (chart 4.10.0): ingress sometimes answering 502 on behalve of the service Mar 25, 2024
@DRoppelt
Copy link
Author

DRoppelt commented Mar 25, 2024

Ok, so since the upgrade to controller 1.10.0 (chart 4.10.0), we observe 502s (not always, but at least one case where we can reproduce it deterministically)

here is an attempt to visualize:

client--(1)->nginx--(2)->backend
client<-(4)--nginx<-(3)--backend

given that

  1. (1) is a POST
  2. (3) has response-header Transfer-Encoding: chunked set twice

then nginx appears to intercept the original response of (3) and answers on behalf of backend with (4) with a response-code 502

We also observe following log with matching timestamps

ingress-nginx-controller-8bf5b5f98-w8gbz 2024/03/22 11:01:54 [error] 575#575: *28680 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "POST /retrieve HTTP/1.1", upstream: "http://10.26.195.10:8080/retrieve", host: "localhost:8076"
ingress-nginx-controller-8bf5b5f98-w8gbz 127.0.0.1 - - [22/Mar/2024:11:01:54 +0000] "POST /retrieve HTTP/1.1" 502 150 "-" "PostmanRuntime/7.36.3" 3208 2.317 [default-ebilling-retrieval-http] ] 10.26.195.10:8080 0 2.318 502 94bc05d81342c91791fac0f02cb64434

We found that when altering backend to set the header once (as the backend probably should anyways), the response goes through as (4) to the client as 200. The backend setting the header twice is an odd behaviour, but previously went through where now it fails. We have reproduced this in an AKS cluster but not yet fully locally.

resproducing

I tried to setup minikube and get a sample going with a spring boot app, but it appears that minikube comes with controller 1.9.4 and I found no way to pin it to 1.10.0, please advice.

Do you have a sample ticket with httpbun.org I could copy from? I have a hard time understanding how to setup a sample based on that?

@longwuyuan
Copy link
Contributor

longwuyuan commented Mar 25, 2024

  • httpbun image contains a compiled (go) binary so not eas to change it to set Transfer-Encoding: chunked
  • Run your java app in minikube
  • BUT create minikube instance with a VM instead of docker driver
  • Install metallb from metallb.org
  • Set the minikube ip output as the L2 ipaddress pool starting and ending. This will assign the same ip to the ingress-nginx
  • Install ingress-nginx using helm on minikube
  • Make /etc/hosts entry for the minikube ip to point to your app ingress hostname

Personally I am going to just use a nginx:alpine image to create a backend and see if I can just add the header to nginx.conf in that pod

@longwuyuan
Copy link
Contributor

I see this ;

% curl -L httpbun.dev.enjoydevops.com/headers -D t.txt
{
  "Accept": "*/*",
  "Host": "httpbun.dev.enjoydevops.com",
  "User-Agent": "curl/7.81.0",
  "X-Forwarded-For": "192.168.122.1",
  "X-Forwarded-Host": "httpbun.dev.enjoydevops.com",
  "X-Forwarded-Port": "443",
  "X-Forwarded-Proto": "https",
  "X-Forwarded-Scheme": "https",
  "X-Real-Ip": "192.168.122.1",
  "X-Request-Id": "4cff8aa1823b57b21487300b7a6e2505",
  "X-Scheme": "https"
}
[~] 
% less t.txt 
[~] 
% cat t.txt 
HTTP/1.1 308 Permanent Redirect
Date: Mon, 25 Mar 2024 19:21:33 GMT
Content-Type: text/html
Content-Length: 164
Connection: keep-alive
Location: https://httpbun.dev.enjoydevops.com/headers

HTTP/2 200 
date: Mon, 25 Mar 2024 19:21:33 GMT
content-type: application/json
content-length: 388
x-powered-by: httpbun/5025308c3a9df224c10faae403ae888ad5c3ecc5
strict-transport-security: max-age=31536000; includeSubDomains

So it will be interesting to see your curl and same info.

Trying to co-relate if nginx v1.25 does anything different .

Also I read here that HTTP2 does not allow that header and I see HTTP/2 in play in my test above

@longwuyuan
Copy link
Contributor

@DRoppelt
Copy link
Author

DRoppelt commented Mar 25, 2024

It is a bit too late (as in day of time in Europe) for me to get a feel for minikube, nevertheless thank you for the steps, will look at them myself tomorrow.

Meanwhile, here is a backend that can be used for testing.
chunky-demo.zip

@RestController
@Slf4j
public class ChunkyRestController {
//...
    @PostMapping("/chunky/{times}/{addedheader_times}")
    public void streamedFile(@PathVariable Integer times,@PathVariable Integer addedheader_times, HttpServletResponse response) throws IOException, InterruptedException {

        for (int i = 0; i < addedheader_times; i++) {
            log.info("added chunking header for the {}th time", i);
            response.addHeader("Transfer-Encoding", "chunked");
        }

        PrintWriter writer = response.getWriter();

        for (int i = 0; i < times; i++) {
            writer.println(i);
            writer.flush();
            log.info("flushed, waiting before flushing again {}/{}", i, times+1);
            int waitIntervalMs = 100;
            Thread.sleep(waitIntervalMs);
        }

    }

}
$ podman build --tag chunky-demo -f Dockerfile && podman run -p 8080:8080 chunky-demo
...

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v3.2.4)

2024-03-25T20:13:27.691Z  INFO 1 --- [demo] [           main] com.example.demo.Application             : Starting Application v0.0.1-SNAPSHOT using Java 21.0.2 with PID 1 (/app/app.jar started by root in /app)
2024-03-25T20:13:27.698Z  INFO 1 --- [demo] [           main] com.example.demo.Application             : No active profile set, falling back to 1 default profile: "default"
2024-03-25T20:13:29.230Z  INFO 1 --- [demo] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port 8080 (http)
2024-03-25T20:13:29.246Z  INFO 1 --- [demo] [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2024-03-25T20:13:29.246Z  INFO 1 --- [demo] [           main] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.19]
2024-03-25T20:13:29.296Z  INFO 1 --- [demo] [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2024-03-25T20:13:29.297Z  INFO 1 --- [demo] [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 1469 ms
2024-03-25T20:13:29.715Z  INFO 1 --- [demo] [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path ''
2024-03-25T20:13:29.742Z  INFO 1 --- [demo] [           main] com.example.demo.Application             : Started Application in 2.634 seconds (process running for 3.243)

curl -X POST localhost:8080/chunky/$chunktimes/$addChunkheaderTimes -v

the api has 2 parameters, it will stream a response in 100ms intervals for $chunktimes and set additional $addChunkheaderTimes headers (one comes from the embedded tomcat itself)

I hope this helps!

➜  ~ curl -X POST localhost:8080/chunky/10/1 -v
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /chunky/10/1 HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.0.1
> Accept: */*
>
< HTTP/1.1 200
< Transfer-Encoding: chunked
< Transfer-Encoding: chunked
< Date: Mon, 25 Mar 2024 20:07:27 GMT
<
0
1
2
3
4
5
6
7
8
9
* Connection #0 to host localhost left intact

@longwuyuan
Copy link
Contributor

longwuyuan commented Mar 26, 2024

@DRoppelt thank you. It really helped.

@longwuyuan
Copy link
Contributor

longwuyuan commented Mar 26, 2024

/kind bug
/remove-kind support
/triage accepted

@tao12345666333 @rikatz @cpanato @strongjz because of the bump to nginx v1.25.x, chunking related behaviour of the controller is broken out of the box. Details above.

Basically internet search says a directive like "chunked_transfer_encoding off;" has to be set, on the controller, if a backend upstream app is setting "Transfer-Encoding: chunked".

Right now, the controller is rejecting response from a backend upstream app with the log message below ;

2024/03/26 03:39:34 [error] 267#267: *358731 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 192.168.122.1, server: chunky-demo.dev.enjoydevops.com, request: "POST /chunky/10/1 HTTP/1.1", upstream: "http://10.244.0.68:8080/chunky/10/1", host: "chunky-demo.dev.enjoydevops.com"
192.168.122.1 - - [26/Mar/2024:03:39:34 +0000] "POST /chunky/10/1 HTTP/1.1" 502 150 "-" "curl/7.81.0" 107 0.006 [default-chunky-demo-8080] [] 10.244.0.68:8080 0 0.006 502 9d060a48534b2dcb3b18c598a5c1ee06
2024/03/26 03:40:04 [error] 267#267: *358980 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 192.168.122.1, server: chunky-demo.dev.enjoydevops.com, request: "POST /chunky/10/1 HTTP/1.1", upstream: "http://10.244.0.68:8080/chunky/10/1", host: "chunky-demo.dev.enjoydevops.com"
192.168.122.1 - - [26/Mar/2024:03:40:04 +0000] "POST /chunky/10/1 HTTP/1.1" 502 150 "-" "curl/7.81.0" 107 0.007 [default-chunky-demo-8080] [] 10.244.0.68:8080 0 0.007 502 91f6129d03e71241cf37446190a46d2b

  • The reproduce steps are provided in one of the messages above. I built that image locally and did docker save image to tar , then minikube image load into my minikube

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 26, 2024
@longwuyuan
Copy link
Contributor

Seems related #4838

@longwuyuan
Copy link
Contributor

@DRoppelt I think I have understood the problem and the solution

  • I removed the header setting code from your example java app
% cat src/main/java/com/example/demo/ChunkyRestController.java                               
package com.example.demo;

import jakarta.servlet.http.HttpServletResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.io.PrintWriter;

@RestController
@Slf4j
public class ChunkyRestController {

    public record HelloWord(String message) {
    }
    @GetMapping("/")
    public HelloWord showVetList() {

        return new HelloWord("hi");

    }

    @PostMapping("/chunky/{times}/{addedheader_times}")
    public void streamedFile(@PathVariable Integer times,@PathVariable Integer addedheader_times, HttpServletResponse response) throws IOException, InterruptedException {

        for (int i = 0; i < addedheader_times; i++) {
            log.info("added chunking header for the {}th time", i);
        }

        PrintWriter writer = response.getWriter();

        for (int i = 0; i < times; i++) {
            writer.println(i);
            writer.flush();
            log.info("flushed, waiting before flushing again {}/{}", i, times+1);
            int waitIntervalMs = 100;
            Thread.sleep(waitIntervalMs);
        }

    }

}

  • I deployed the modified edition of your example app
  • The POST method request was a success
% curl -X POST chunkynoheader.dev.enjoydevops.com/chunky/10/1 -v
*   Trying 192.168.122.193:80...
* Connected to chunkynoheader.dev.enjoydevops.com (192.168.122.193) port 80 (#0)
> POST /chunky/10/1 HTTP/1.1
> Host: chunkynoheader.dev.enjoydevops.com
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 
< Date: Tue, 26 Mar 2024 05:46:55 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< 
0
1
2
3
4
5
6
7
8
9
* Connection #0 to host chunkynoheader.dev.enjoydevops.com left intact

@longwuyuan
Copy link
Contributor

I think we need a docs PR to be clear about this to users of controller v1.10.x+

@tatobi
Copy link

tatobi commented Mar 27, 2024

@DRoppelt thanks. Sure, here it is below.
Verified, our Java based service running in the POD respond with Transfer-Encoding: chunked despite the response just a few character.

> Host: **********.*.svc.cluster.local:8080
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200
< Vary: Origin
< Vary: Access-Control-Request-Method
< Vary: Access-Control-Request-Headers
< Transfer-Encoding: chunked
< Date: Wed, 27 Mar 2024 12:**:** GMT
< Keep-Alive: timeout=60
< Connection: keep-alive
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 0
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< Content-Type: application/json
< Transfer-Encoding: chunked

@DRoppelt
Copy link
Author

< Transfer-Encoding: chunked
...
< Transfer-Encoding: chunked

appears to be the same issue that we were able to reproduce with longwuyuan. Duplicate transfere-encoding header. The ingress pod should also show a time-wise-matching error-log entry. In our services, we had tomcat/boot setting the header & application also setting this header in addition.

I looked into this initially

Other people on internet, facing the same prolem, after upgrading to nginx v1.24+, resolved this by setting "chunked_transfer_encoding off;", in their nginx.conf

but since I have found no corresponding ConfigMap entry (https://docs.nginx.com/nginx-ingress-controller/configuration/global-configuration/configmap-resource/), opted to fix the affected services instead of working with config-snippets.

I am curious to see how this bug continues, but opted to fix that particular backend either way.

@tatobi
Copy link

tatobi commented Mar 27, 2024

@DRoppelt thanks. I see the duplication.
IMHO the new nginx version should tolerate it as old ones did. Generally header duplications shouldn't be handled as errors, max a warning.
The questions in that case:

  • RFC2616 states that header duplications should be accepted. Logically, if your request goes through multiple proxies, multiservice stack, so it shouldn't be handled as an error.
  • why older nginx versions tolerate header duplications without any notice and new ones not?
  • looking for where is stated that new nginx versions will fail (502) in case of upstream header duplication, cannot find it.
  • whether new versions will be fixed in the future or we have to fix one-by-one in our deployments by adding chunked_transfer_encoding off to configmaps ?

Thanks!

@DRoppelt
Copy link
Author

I agree on all terms and thanks for bringing up the RFC.

https://datatracker.ietf.org/doc/html/rfc2616#section-4.2

   Multiple message-header fields with the same field-name MAY be
   present in a message if and only if the entire field-value for that
   header field is defined as a comma-separated list [i.e., #(values)].
   It MUST be possible to combine the multiple header fields into one
   "field-name: field-value" pair, without changing the semantics of the
   message

I would like a maintainer to chime in here and get a different perspective. The link that @longwuyuan shared (https://nginx.org/en/docs/faq/chunked_encoding_from_backend.html) appears to to deal about something nginx before 1.1.4

regarding:

why older nginx versions tolerate header duplications without any notice and new ones not?

the flipped default since 1.24, it seems

#11162 (comment) :

The chunking is on by default in nginx v1.24+ https://nginx.org/en/docs/http/ngx_http_core_module.html#chunked_transfer_encoding . So you don't need to set the header in your app, if you are using nginx v1.24+ anywhere near your app

@longwuyuan
Copy link
Contributor

longwuyuan commented Mar 28, 2024

My take on summary so far ;

  • There are 2 problems. Duplicacy & also even setting chunking header in code

  • If you put nginx v1.21 inside your pod, in front of Tomcat, and set chunking ON in nginx, we can test duplicacy

  • I think setting chunking on repeatedly in one single connection in nginx v1.24+, is not allowed, but your app is doing that. But I could be completely talking crap here (sorry if its that)

  • I am trying to do a simple test with vanilla nginx versions, but Tomcat drops connection early

  • If I remove the header setting code in java app, then request is success

  • So it means setting chunking ON repeatedly in one single connection was tolerated in older nginx but not anymore since nginx v1.24+ because chunking is ON by default

  • If you can test your header setting code, with vanilla non-K8S Nginx (both nginx versions v1.21 and v1.25), it will help make progress here, because project can take action based on knowing what changed for vanilla Nginx, between v1.21 and v1.24+

  • I am trying docker compose to test, but normal connections works and chunking request time out after 9 loops in code. I added these file to what @DRoppelt provided

compose.yaml

services:
  app:
    image: chunky-vanilanginx-app
    build:
      dockerfile: Dockerfile
    ports:
      - 8080:8080
  web:
    image: chunky-vanilanginx-web
    build:
      dockerfile: nginx.Dockerfile
    ports:
      - 80:80

nginx.Dockerfile

FROM nginx:alpine

COPY nginx.default.conf /etc/nginx/conf.d/default.conf

CMD ["nginx","-g","daemon off;"]

EXPOSE 80

nginx.default.conf

server {
    listen       80;
    server_name  localhost;


    location / {
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_connect_timeout 75s;
        proxy_buffer_size 8k;
        proxy_buffering on;
        proxy_buffers 8 8k;
        proxy_busy_buffers_size 16k;

        proxy_pass http://app:8080;
        #root   /usr/share/nginx/html;
        #index  index.html index.htm;
    }

    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

}
  • Then I use docker compose up --build -d
  • Then I curl localhost/chunky/10/1 -X POST -v
  • Then I check docker compose logs repeatedly to see the error
  • And to check the older nginx behaviour with same code, i just copy everything to a different folder and use the below changed Dockerfile so that older nginx is in use

nginx.Dockerfile

FROM nginx:1.21-alpine

COPY nginx.default.conf /etc/nginx/conf.d/default.conf

EXPOSE 80

CMD ["nginx","-g","daemon off;"]

@DRoppelt
Copy link
Author

DRoppelt commented Mar 28, 2024

I kept mentioning other services that do not explicitly deal with this header, I now found how they also face nginx error-ing and returning 502 on their behalf. It is not a different header or such. Failing with the same error but with a different use-case causing that.

The backends call further downstream systems and return their response 1:1 upstream, in a way acting as a proxy themselves

client--(1)->nginx--(2)->backendA--(3)->backendB
client<-(6)--nginx<-(5)--backendA<-(4)--backendB

A calls B (3) and receive Transfer-Encoding: chunked in (4).
They then hand over (4) 1:1 and tomcat adds another pair Transfer-Encoding: chunked in (5) which nginx then rejects.

background for this setup: backendA takes the request (2) and is doing some request-transformation, looking up values in a DB and creates a new request (3), but then just shovels over (4) back to the client (5) without any transformation

Which is a somewhat odd behaviour (especially taking all returned headers in 4 and pumping them through to (5), but a somewhat reasonable usecase.

This here specifies Transfer-Enconding a "hop-by-hop" header https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding

Transfer-Encoding is a hop-by-hop header, that is applied to a message between two nodes, not to a resource itself. Each segment of a multi-node connection can use different Transfer-Encoding values. If you want to compress data over the whole connection, use the end-to-end Content-Encoding header instead.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers#hop-by-hop_headers

These headers are meaningful only for a single transport-level connection, and must not be re-transmitted by proxies or cached. Note that only hop-by-hop headers may be set using the Connection header.

Which gives this whole thing a bit of a "twist" for the nginx, why it should even care about that header to the point it rejects the request.

Anyways, all nginx sources/questions seem to point to "just set chunked_transfer_encoding off in your nginx.conf"

Proposals:

a) create a 1.10.1 that statically sets this flag to create a forward compatible release

b) create a chart that allows setting this value to "off" via values.yaml (but leave the default on "on" how the nginx project intended)

c) in addition to b), add an option to control this setting via annotation on a per-ingress-config basis

d) address the rejection of the request as a bug at nginx?

@longwuyuan
Copy link
Contributor

@longwuyuan
Copy link
Contributor

I have new information ;

  • I created a image with your app code
  • I commented the line from your app code for setting the header

image

  • I added nginx v1.25 in the pod
    image

  • Now the image has both nginx listening on 80 and proxy_passing to localhost(tomcat) at port 8080 (although enrypoint did not work as in screenshot above, i started nginx manually)
    image

  • So now both nginx and tomcat are setting the header but java code is not setting the header

  • My request was a success
    image

  • This means tomcat and also nginx-inside-pod, both are setting the chunking header

  • So I have to suspect that java code setting header was violation of HTTP as per https://nginx.org/en/docs/faq/chunked_encoding_from_backend.html BUT nginx v1.21 was not doing a strict enforcing.

  • So I am not certain if introducing instrumentation to the controller to turn chunking OFF is a improvement

  • Let me know what you think

  • In fact adding nginx in front of your tomcat, inside the pod, is the standard practice, because tomcat's listener is for dev traffic, and not for production traffic

@DRoppelt
Copy link
Author

DRoppelt commented Mar 29, 2024

I believe https://nginx.org/en/docs/faq/chunked_encoding_from_backend.html to be misleading for this bug. It talks specifically about miss-represented payload

and instead of pure JSON from backend, nginx returns something framed in decimal numbers like

and nginx 1.1.14 to "fix" it

nginx version 1.1.4 and newer, where an additional code was introduced to handle such erratic backend behavior

Which also is a version from 2011 (https://nginx.org/en/CHANGES-1.2) and I do not think this particular issue is relevant here.

because tomcat's listener is for dev traffic, and not for production traffic

in case of a Spring Boot application this is not true. Tomcat is an integral part of the deployment. it may be replaced by netty but a webserver of some sort must be part of the JVM and cannot be swapped with nginx. In k8s, nginx serves as an ingress to the pod and (at least for our usecase) terminates TLS and routes the traffic to pods. The tomcat takes the request and handles the translation from http to end up into "java code", somewhat simplified. The image than can be built from my provided sample would be promoted from dev all the way to prod and only be altered via configmaps.

ingress-nginx shows a breaking behaviour from controller 1.9.6 to 1.10.0 and I think this needs to be addressed. Either to disable the chunked_transfer_encoding or let users opt-out from it via values and/or annotations. We can argue if/how the backends need to change but IMO needs a migration-path this is better than "stick to 1.9.X until backends do not do this anymore"

@DRoppelt DRoppelt changed the title since controller 1.10.0 (chart 4.10.0): ingress sometimes answering 502 on behalve of the service since controller 1.10.0 (chart 4.10.0): ingress rejects duplicate "Transfer-Encoding: chunked" and returns 502 Mar 29, 2024
@longwuyuan
Copy link
Contributor

@DRoppelt I agree that stick to v1.9.6 is not right.

  • The project never does or never will think on the lines of stick to "something"
  • The ingress-api specs of K8S determine almost all decisions
  • The annotations are components to integrate reverseproxy features with ingress-api
  • So in this case chunking is the reverseproxy feature
  • Project upgraded nginx as project is depending on such components to stay compliant
  • Now this dependency of nginx (as part of openresty) changed behaviour

Lets get on a call and discuss this.
I need to gather some clear info ;

  1. Is the duplicacy coming from your java code for loop !
  2. Is the duplicacy coming from header set in code and header set in tomcat ! I showed you that I set header in nginx as well as Tomcat in the pod but there was no duplicacy error.
  3. I remove the header in java code and error goes away along with chunking working.
  4. So why should the ingress-controller risk breaking away from the ecosystem ?
  5. Why should developer time be spent creating annotationss to change the default, when a user, who wants to set over-riding headers in code, can use snippets https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#server-snippet , and set chunking off ?

@longwuyuan
Copy link
Contributor

@DRoppelt will look forward to know if we can get on a call on zoom sometime.

  • I want to get on the same page with you in terms of the message about duplicate setting of header.
    • The reason is that I created a image with your app
    • But I added nginx v1.25 into that image
    • I expect that adding nginx in your image in front of tomcat sets the chinking header one additional time
    • So now the chunking is on in 3 places
      • chunking is set in code
      • chunking is set in tomcat
      • chunking is set in nginx
    • then I removed chunking setting line in the code
    • so I expect chunking is now set 2 times in the image
    • BUT my request was a success and I did not get the duplicate header error

Ideally you don't need to change code in your git repo. I am hoping that you do this same test I describe above. Then please give your opinion if this test is valid for checking the "2 times setting header" . I wish we could get a nginx expert to comment on this

@rikatz
Copy link
Contributor

rikatz commented Mar 30, 2024

Question and point here is: this seems to me an NGINX ( and not ingress controller code) breaking change to me.

I'm not sure how many users are impacted, or as Long pointed this is specific for a situation on Tomcat/Jetty servers.

My proposal is, while we don't default to the previous configuration (because there may be a reason NGINX turned that on, or broke it) we can allow this to be configurable. People migrating to ingress v1.10 can opt in to disabling chunking.

I just don't know if this should be a global admin defined config (configmap) or a user defined config (annotation)

Is this approach enough?

@longwuyuan
Copy link
Contributor

longwuyuan commented Mar 30, 2024

COMMENT

@rikatz yes, this approach seems enough for @DRoppelt as that is what he asked here #11162 (comment) .

CAVEAT

  • I recall testing his app image by removing the line from his java code that sets the header in a for loop
  • I created a new image by adding nginx v1.25 in front of his spring-boot and proxy_pass set to his spring_boot
  • I don't recall testing yet with setting https://nginx.org/en/docs/http/ngx_http_core_module.html#chunked_transfer_encoding OFF, using server-snippet. I will do that asap and update. This test should provide data to confirm/deny if proposed approach is feasible . Just saying because I am not sure if this test was done already

UNCLEAR INFO

  • The error message is this

upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream

  • B U T, if I put Nginx v1.25, that has "chunked_transfer_encoding: on", in front of springboot-app/Tomcat, which also has chunking setting ON, that is a test case of setting chunking ON 2 times, for ingress-nginx-controller's upstream. Y E T, there is no error and request is a success with chunking working.

  • The error occurs only when the java code is setting chunking in a for loop as seen below ;

@RestController
@Slf4j
public class ChunkyRestController {
//...
    @PostMapping("/chunky/{times}/{addedheader_times}")
    public void streamedFile(@PathVariable Integer times,@PathVariable Integer addedheader_times, HttpServletResponse response) throws IOException, InterruptedException {

        for (int i = 0; i < addedheader_times; i++) {
            log.info("added chunking header for the {}th time", i);
            response.addHeader("Transfer-Encoding", "chunked");    //  <--C H U N K I N G   S E T  H E R E
        }

        PrintWriter writer = response.getWriter();

        for (int i = 0; i < times; i++) {
            writer.println(i);
            writer.flush();
            log.info("flushed, waiting before flushing again {}/{}", i, times+1);
            int waitIntervalMs = 100;
            Thread.sleep(waitIntervalMs);
        }

    }

}

@longwuyuan
Copy link
Contributor

longwuyuan commented Mar 30, 2024

Test-Case 1

  • springboot app sets chunking header
        for (int i = 0; i < addedheader_times; i++) {
            log.info("added chunking header for the {}th time", i);
            response.addHeader("Transfer-Encoding", "chunked");    //  <--C H U N K I N G   S E T  H E R E
        }
  • but controller is set with chunked_transfer_encoding OFF (using configuration-snippet)
% k -n ingress-nginx exec ingress-nginx-controller-6bfb976cf6-hj6j2 -- cat /etc/nginx/nginx.conf | grep -i chunked -A9                                                                                                    
                                   chunked_transfer_encoding off;                                                                                                                                                                    
                                                                                                                                                                                                                                     
                                   proxy_pass http://upstream_balancer;                                                                                                                                                              
                                                                                                                                                                                                                                     
                                   proxy_redirect                          off;                                                                                                                                                      
                                                                                                                                                                                                                                     
                           }                                                                                                                                                                                                         
                                                                                                                                                                                                                                     
                   }                                                                                                                                                                                                                 
                   ## end server chunky.mydomain.com                                                                                                                                                                                 
           [~/Documents/ingress-issues/11162]                                                                                                                                                                                        
           % k describe ing chunky                                                                                                                                                                                                   
           Name:             chunky                                                                                                                                                                                                  
           Labels:           <none>                                                                                                                                                                                                  
           Namespace:        default                                                                                                                                                                                                 
           Address:          192.168.122.90                                                                                                                                                                                          
           Ingress Class:    nginx                                                                                                                                                                                                   
           Default backend:  <default>                                                                                                                                                                                               
           Rules:                                                                                                                                                                                                                    
             Host                 Path  Backends                                                                                                                                                                                     
             ----                 ----  --------                                                                                                                                                                                     
             chunky.mydomain.com                                                                                                                                                                                                     
                                  /   chunky:8080 (10.244.0.14:8080)                                                                                                                                                                 
           Annotations:           nginx.ingress.kubernetes.io/configuration-snippet: chunked_transfer_encoding off;                                                                                                                  
           Events:                                                                                                                                                                                                                   
             Type    Reason  Age                 From                      Message                                                                                                                                                   
             ----    ------  ----                ----                      -------                                                                                                                                                   
             Normal  Sync    8m9s (x4 over 30m)  nginx-ingress-controller  Scheduled for sync                                                                                                                                        
           [~/Documents/ingress-issues/11162]                                                                                                                                                                                        
           %                                                                                  
  • Result is same error
% curl --resolve chunky.mydomain.com:80:`minikube ip` -X POST chunky.mydomain.com/chunky/10/1
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>

  • Controller logs below
2024/03/30 18:21:21 [error] 405#405: *47660 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 192.168.122.1, server: chunky.mydomain.com, request: "POST /chunky/10/1 HTTP/1.1", upstream: "http://10.244.0.14:8080/chunky/10/1", host: "chunky.mydomain.com"
192.168.122.1 - - [30/Mar/2024:18:21:21 +0000] "POST /chunky/10/1 HTTP/1.1" 502 150 "-" "curl/8.2.1" 94 0.003 [default-chunky-8080] [] 10.244.0.14:8080 0 0.003 502 b9058637e160edcb392b30a6dab624e6
192.168.122.1 - - [30/Mar/2024:18:28:10 +0000] "POST /api/user/auth-tokens/rotate HTTP/2.0" 200 2 "https://grafana.dev.enjoydevops.com/d/nginx/nginx-ingress-controller?orgId=1&refresh=5s" "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0" 379 0.020 [observability-prometheusgrafana-80] [] 10.244.0.7:3000 2 0.019 200 0aa4648602d2e50e45b4b233b52cff63

I am going to test setting chunking off in the controller, using server-snippets and update in next post

@longwuyuan
Copy link
Contributor

Test-Case 2

  • springboot app sets chunking header
        for (int i = 0; i < addedheader_times; i++) {
            log.info("added chunking header for the {}th time", i);
            response.addHeader("Transfer-Encoding", "chunked");    //  <--C H U N K I N G   S E T  H E R E
        }
  • but controller is set with chunked_transfer_encoding OFF (using server-snippet)
% k -n ingress-nginx exec ingress-nginx-controller-6bfb976cf6-hj6j2 -- cat /etc/nginx/nginx.conf | grep -i chunked -B16 
                server_name chunky.mydomain.com ;

                http2 on;

                listen 80  ;
                listen [::]:80  ;
                listen 443  ssl;
                listen [::]:443  ssl;

                set $proxy_upstream_name "-";

                ssl_certificate_by_lua_block {
                        certificate.call()
                }

                # Custom code snippet configured for host chunky.mydomain.com
                chunked_transfer_encoding off;

% k describe ing chunky 
Name:             chunky
Labels:           <none>
Namespace:        default
Address:          192.168.122.90
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host                 Path  Backends
  ----                 ----  --------
  chunky.mydomain.com  
                       /   chunky:8080 (10.244.0.14:8080)
Annotations:           nginx.ingress.kubernetes.io/server-snippet: chunked_transfer_encoding off;
Events:
  Type    Reason  Age                  From                      Message
  ----    ------  ----                 ----                      -------
  Normal  Sync    8m26s (x5 over 47m)  nginx-ingress-controller  Scheduled for sync
  • Result is same error
% curl --resolve chunky.mydomain.com:80:`minikube ip` -X POST chunky.mydomain.com/chunky/10/1
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>
  • Controller logs for the error below
2024/03/30 18:44:58 [error] 518#518: *59347 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 192.168.122.1, server: chunky.mydomain.com, request: "POST /chunky/10/1 HTTP/1.1", upstream: "http://10.244.0.14:8080/chunky/10/1", host: "chunky.mydomain.com"
192.168.122.1 - - [30/Mar/2024:18:44:58 +0000] "POST /chunky/10/1 HTTP/1.1" 502 150 "-" "curl/8.2.1" 94 0.008 [default-chunky-8080] [] 10.244.0.14:8080 0 0.008 502 bb5874503dde3c474588dab61cf79858

@longwuyuan
Copy link
Contributor

Unclear-info is NOW cleared, after the tests shown in previous 2 posts

  • The error message is

upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream

  • Even after setting the controller, with nginx.conf directive "chunked_transfer_encoding", to "OFF", the problem is not solved. The same error message is seen in the logs

  • @DRoppelt @rikatz I will wait for comments to know if my tests are wrong or need to be changed in any way or if the test and the results are valid

@longwuyuan
Copy link
Contributor

Older Contoller Works

% k describe ingress chunky                                                               
Name:             chunky
Labels:           <none>
Namespace:        default
Address:          192.168.122.90
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host                 Path  Backends
  ----                 ----  --------
  chunky.mydomain.com  
                       /   chunky:8080 (10.244.0.14:8080)
Annotations:           <none>
Events:
  Type    Reason  Age                  From                      Message
  ----    ------  ----                 ----                      -------
  Normal  Sync    39m (x5 over 78m)    nginx-ingress-controller  Scheduled for sync
  Normal  Sync    24s (x4 over 9m49s)  nginx-ingress-controller  Scheduled for sync
[~/Documents/ingress-issues/11162] 
% curl --resolve chunky.mydomain.com:80:`minikube ip` -X POST chunky.mydomain.com/chunky/10/1    
0
1
2
3
4
5
6
7
8
9
[~/Documents/ingress-issues/11162] 
% k -n ingress-nginx logs ingress-nginx-controller-6d675776b5-2fp84 | tail -3
I0330 19:14:01.138869       7 event.go:298] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"default", Name:"chunky", UID:"2e7bf334-2360-436c-b218-be3645482049", APIVersion:"networking.k8s.io/v1", ResourceVersion:"15081", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
192.168.122.1 - - [30/Mar/2024:19:14:07 +0000] "POST /chunky/10/1 HTTP/1.1" 200 75 "-" "curl/8.2.1" 94 1.021 [default-chunky-8080] [] 10.244.0.14:8080 75 1.021 200 be164e1df55a09393506c10d52f48f7f
192.168.122.1 - - [30/Mar/2024:19:14:36 +0000] "POST /chunky/10/1 HTTP/1.1" 200 75 "-" "curl/8.2.1" 94 1.017 [default-chunky-8080] [] 10.244.0.14:8080 75 1.017 200 f199f5718e8fb244bdefddef2a4b1ed0
[~/Documents/ingress-issues/11162] 
% helm ls -A | grep -i ingress
ingress-nginx           ingress-nginx   2               2024-03-31 00:34:08.754150759 +0530 IST deployed        ingress-nginx-4.9.1             1.9.6

@mcharriere
Copy link

Hi all, to me it seems the issue is caused by this change introduced in NGINX v1.23. (Src https://forum.nginx.org/read.php?2,297086,297090#msg-297090)

This seems to be the expected behavior when the Transfer-Encoding header is duplicated in the upstream response.

@longwuyuan
Copy link
Contributor

@mcharriere thank you very much for that info. Solves the mystery.

@DRoppelt
Copy link
Author

DRoppelt commented Apr 3, 2024

Cool that is some helpful background!

I was not able to come around sooner for the points you brought up @longwuyuan , but here we go now

The project never does or never will think on the lines of stick to "something"

Sorry about my tone that day. I felt frustrated at the time as I am currently dealing with an k8s setup with about 100 services where about 3 services are knowingly affected and an unknown amount may be affected but not yet identified. On the other hand, I am a paid person that can find a way to deal with it in my paid time and should not roll off all onto an open-source project.

I will submit another testcase to show a behaviour that is rather hidden and unluckily has spread as a pattern in our code-base at across various services. It is hard to identify them and the "all or nothing" approach that nginx has (no chunking enabled, OK, chunking enabled hard reject with 502)

Why should developer time be spent creating annotationss to change the default, when a user, who wants to set over-riding headers in code, can use snippets https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#server-snippet , and set chunking off ?

I was thinking about it a bit more and in essence the problem lays in this "all or nothing" approach from nginx without a way inbeween

  1. chunked_transfer_encoding: off all looks good
  2. chunked_transfer_encoding: on hard reject with error message

It would be rather helpful to have a warn-mode that allows a soft mirgation for all identified services that do not stick to the expected behaviour nginx expects.

Especially the part where chunked_transfer_encoding: on and a single Transfer-Encoding: chunked is accepted, but another one is a hard "NO NOT YOU" from nginx.

I am not knowedgeable about the process of submitting a process or a bug at the nginx project. But this stance here last year seems very clear and without any room for wiggle

https://forum.nginx.org/read.php?2,297086,297090#msg-297090

... as the upstream response in question is obviously invalid and should never be accepted in the first place, and the change is more or less a minor cleanup work.

as for the test-cases: you can take the code unmodified and test

a) POST /chunky/10/0 -> regular backend that may add a header (here the tomcat doing it), passes through nginx either way

b) POST /chunky/10/1 -> duplicate header `Transfer-Encoding: chunked , gets rejected when using controller 1.10.0

c) POST /chunky/10/1. Using controller 1.10.0

service has at least the following annotation added and passes due to overriding chunked_transfer_encoding to off

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      chunked_transfer_encoding off;

d) given that the chart has some config to set chunked_transfer_encoding: off, both POST /chunky/10/0 and POST /chunky/10/1 work

can use snippets https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#server-snippet , and set chunking off

at this point I am a bit undecided if this is just enough for this non-standard behavior. A reason against this is that the default of controller.allowSnippetAnnotations is false due to security reasons and would require the controller to first enable it before allowing single-services to opt-out of chunked_transfer_encoding

One can add a global snippet-annotation which allows one to opt-out of chunking. So only a combination of allowing a service to opt-in/-out and controller in combination would add value. Otherwise just disabling it on the controller globally via a snippet is already sufficient. Currently looking into the docs for the exact syntax

I will post another update on the demo-service that show-cases another source of duplicate headers that is not as straight-foward as response.addHeader("Transfer-Encoding", "chunked");

@DRoppelt
Copy link
Author

DRoppelt commented Apr 3, 2024

So here is a sample where the code is proxying another rest-endpoint and therefore gets a 2. header through the way it proxies. I added a feign-based client that proxies a call to another service

For demo-purposes, the service hosts that endpoint too, but does not alter the demo

@FeignClient(name = "proxyingFeignClient", url = "${app.proxyingFeignClient.url}")
public interface ProxyingClient {


    @RequestMapping(method = RequestMethod.GET, value = "/userById/{id}")
    ResponseEntity<User> getUser(@PathVariable("id") Long id);
}

the backend has an endpoint where it does a DB lookup and then does a downstream call. But as an attempt to proxy the call back to upstream, does not re-wrap the response but just returns the downstream call 1:1

@GetMapping("/user/{name}")
    public ResponseEntity<User> proxied(@PathVariable String name, @RequestParam(defaultValue = "false") Boolean rewrap) {

        // pretend that this is a DB lookup
        long id = switch (name.toLowerCase()) {
            case "droppelt" -> 1L;
            case "longwuyuan" -> 2L;
            default -> throw new IllegalArgumentException("unknown user");
        };


        var internalCallResponse = client.getUser(id);

        // for demo-ing purpose, see what we have as headers
        log.info("returned headers {}", internalCallResponse.getHeaders());

        if (!rewrap) {
            // this forces a 2. "Transfer-Encoding: chunked" header as the response has it & tomcat will add another
            return internalCallResponse;
        }

        // here we should not return the ResponseEntity 1:1, but if at all, rewrap it
        return new ResponseEntity<>(internalCallResponse.getBody(), internalCallResponse.getStatusCode());

    }
GET http://localhost:8080/user/DRoppelt

HTTP/1.1 200 
connection: keep-alive, keep-alive
date: Wed, 03 Apr 2024 20:01:48 GMT
keep-alive: timeout=60
transfer-encoding: chunked
Content-Type: application/json
Transfer-Encoding: chunked

{
  "id": 1
}

-> nginx will 502 this request.

but when it actually re-wraps the downstream's response, the duplicate header is prevented

GET http://localhost:8080/user/DRoppelt?rewrap=true

HTTP/1.1 200 
Content-Type: application/json
Transfer-Encoding: chunked
Date: Wed, 03 Apr 2024 20:02:08 GMT
Keep-Alive: timeout=60
Connection: keep-alive

{
  "id": 1
}

This demo also still has the /chunky endpoint

POST http://localhost:8080/chunky/3/0

HTTP/1.1 200 
Transfer-Encoding: chunked
Date: Wed, 03 Apr 2024 20:03:39 GMT
Keep-Alive: timeout=60
Connection: keep-alive

0
1
2

-> header once

POST http://localhost:8080/chunky/3/1

HTTP/1.1 200 
Transfer-Encoding: chunked
Transfer-Encoding: chunked
Date: Wed, 03 Apr 2024 20:04:11 GMT
Keep-Alive: timeout=60
Connection: keep-alive

0
1
2

-> Transfer-Encoding: chunked header twice

chunky-demo-with-proxy.zip

@DRoppelt
Copy link
Author

DRoppelt commented Apr 3, 2024

the endpoint @GetMapping("/user/{name}") is to show how easy or rather innocent the code looks that leads to the hard rejection on nginx side with 502.

I think on our end, we will try to upgrade from nginx 1.9.6 to 1.10.0 with an added server-snippet on the controller that disable chunking transfer. Then we will, once fully rolled out, re-enabled chunking on testing environments while leaving it inactive on productive environments (and with that have a drift between prod vs test for some time) and slowly fix affected services.

From my end, I think we could live without further adjustments except for

a) a note/hightlight in the changelog for this breaking behaviour

b) a sample of an added server-snipped in the controller's ConfigMap

@longwuyuan
Copy link
Contributor

@DRoppelt I think its not been clear enough communication

  • Please look at my post above since controller 1.10.0 (chart 4.10.0): ingress rejects duplicate "Transfer-Encoding: chunked" and returns 502 #11162 (comment)

  • I tested setting chunking off in ingress-nginx controllver v1.10 but HTTP request still failed with same duplicate-header error

  • I set the directive to off ("chunked_transfer_encoding off;") via configuration-snippet annotation and also via server-snippet annotation BUT still got the duplicate header error

  • I will try to put it in docs that there is this change but we will not be publishing any sample to turn that directive off because the duplicate header error is persisting even after setting the directive value to off

@DRoppelt
Copy link
Author

DRoppelt commented Apr 4, 2024

Ah now I see. Ok, I tried to reproduce this on my end deployed in an AKS

  • I added it in http-snippet
apiVersion: v1
data:
  http-snippet: |
    chunked_transfer_encoding off;
  • I looked into the nginx.conf and can find it
/etc/nginx $ less nginx.conf | grep chunk -n -B 1 -A 2
236-    # Custom code snippet configured in the configuration configmap
237:    chunked_transfer_encoding off;
238-
239-    upstream upstream_balancer {
/etc/nginx $
  • still receiving error
    2024/04/04 15:52:35 [error] 1761#1761: *25367 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 10.100.X.X, server: foo.bar.net, request: "POST /chunky/3/1 HTTP/2.0", upstream: "http://10.X.X.X:8080/chunky/3/1", host: "foo.bar.net"

Seems like the duplicate check happens either way. :(

Copy link

github-actions bot commented May 6, 2024

This is stale, but we won't close it automatically, just bare in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label May 6, 2024
@SergeyLadutko
Copy link

me too

@github-actions github-actions bot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. needs-priority triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

No branches or pull requests

8 participants