Alpine (musl) based haproxy ingress images performance issue #541

Open
amorozkin opened this issue May 30, 2023 · 3 comments
Labels: enhancement (New feature or request), investigation (more investigation needed on our side)

Comments


amorozkin commented May 30, 2023

Could you please consider adding an option to use non-alpine based haproxy ingress images?

Alpine's pthread implementation has a drastic CPU overhead (internals/details can be found here: https://stackoverflow.com/questions/73807754/how-one-pthread-waits-for-another-to-finish-via-futex-in-linux/73813907#73813907).
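
For what it's worth, the pthread cost can be isolated from haproxy itself with a plain contention micro-benchmark: build the same source in a glibc-based image and a musl-based image, run both under strace -c -f, and compare how many lock/unlock operations fall through to futex syscalls. The sketch below is only illustrative (thread and iteration counts are arbitrary; 8 threads roughly mirror the nbthread: "8" setting used in the test further down), not something taken from the measurements above:

/* futex_bench.c - minimal pthread contention sketch.
 * Build and trace in both images, e.g.:
 *   gcc -O2 -pthread futex_bench.c -o futex_bench
 *   strace -c -f ./futex_bench
 */
#include <pthread.h>
#include <stdio.h>

#define THREADS    8        /* roughly mirrors nbthread: "8" from the test setup */
#define ITERATIONS 200000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);    /* contended lock falls through to futex wait */
        counter++;
        pthread_mutex_unlock(&lock);  /* ...and unlock to futex wake */
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[THREADS];

    for (int i = 0; i < THREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tids[i], NULL);

    printf("final counter: %ld\n", counter);
    return 0;
}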

Here are two strace statistics samples for the same load profile (25K RPS via 3 haproxy ingress pods) over the same period of time (about 1 minute):
1. GLIBC based haproxy

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.55  147.946790          53   2787268    880506 recvfrom
 26.33   81.933249          88    929414           sendto
 16.81   52.295309          54    962217           epoll_ctl
  3.37   10.486387          51    203040           getpid
  1.48    4.597493          51     90048           clock_gettime
  1.41    4.380619          97     44924           epoll_wait
  0.64    2.003053          54     36497           getsockopt
  0.56    1.731618          97     17829           close
  0.51    1.582058          56     28118           setsockopt
  0.39    1.207813          66     18144      8945 accept4
  0.38    1.188416         116     10223     10223 connect
  0.29    0.903808          88     10223           socket
  0.18    0.548180          53     10223           fcntl
  0.10    0.299368          79      3785      1130 futex
  0.00    0.011658          60       193           timer_settime
  0.00    0.010546          54       193        30 rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00  311.126365               5152339    900834 total

2. MUSL based haproxy:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.24  412.454997          96   4259899    419280 futex
 10.00   60.440537         120    502107           madvise
  8.74   52.833292         111    472438    121948 recvfrom
  4.22   25.477060         166    152913           sendto
  2.80   16.921311         107    157293           getpid
  2.26   13.680361         109    125062           epoll_ctl
  1.38    8.351141         119     69682           writev
  0.54    3.254861         106     30535           clock_gettime
  0.37    2.255775         148     15187           epoll_pwait
  0.34    2.033282         178     11419           close
  0.31    1.844610         117     15724      5964 accept4
  0.25    1.530881         110     13850           setsockopt
  0.25    1.509742         107     14001           getsockopt
  0.08    0.466851         157      2966           munmap
  0.06    0.392208         170      2294      2294 connect
  0.06    0.378519         107      3505           mmap
  0.05    0.287839         125      2294           socket
  0.04    0.234976         102      2294           fcntl
  0.00    0.014530          94       154           timer_settime
  0.00    0.014262          92       154        15 rt_sigreturn
  0.00    0.006613         143        46        23 read
  0.00    0.003571         148        24           write
  0.00    0.003377         241        14           shutdown
------ ----------- ----------- --------- --------- ----------------
100.00  604.390596               5853855    549524 total

As you can see, the last one (the musl-based one) spends more than 60% of its time in futex (FUTEX_WAKE_PRIVATE, to be exact) system calls.
As a result, CPU utilisation is more than twice as high for the same load profile, accompanied by spikes in the number of upstream sessions:
[screenshot: CPU utilisation and upstream session count comparison]


PKizzle commented Jun 12, 2023

I tested it on my Raspberry Pi but did not encounter such a huge performance difference. What TLS ciphers were used in the graph above?


amorozkin commented Jun 13, 2023

> I tested it on my Raspberry Pi but did not encounter such a huge performance difference. What TLS ciphers were used in the graph above?

In both cases the same haproxy config was used with TLS options:

  ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
  ssl-default-bind-ciphers TLS13-AES-256-GCM-SHA384:TLS13-AES-128-GCM-SHA256:TLS13-CHACHA20-POLY1305-SHA256:EECDH+AESGCM:EECDH+CHACHA20
  tune.ssl.default-dh-param 2048

Both haproxy itself and the upstream (a single one in the test above) use 4096-bit TLS certificates (the annotation haproxy.org/server-ssl: "true" is configured in the ingress).

K8s nodes: KVM VMs (Ubuntu 20.04.4 LTS, 5.4.0-109-generic, k8s version v1.23.4)

PODs:

        resources:
          limits:
            cpu: "12"
            memory: 24Gi
          requests:
            cpu: "10"
            memory: 24Gi
....
      securityContext:
        sysctls:
        - name: net.ipv4.ip_local_port_range
          value: 1024 65535
        - name: net.ipv4.tcp_rmem
          value: 8192 87380 33554432
        - name: net.ipv4.tcp_wmem
          value: 8192 65536 33554432
        - name: net.ipv4.tcp_max_syn_backlog
          value: "20000"
        - name: net.core.somaxconn
          value: "20000"
        - name: net.ipv4.tcp_tw_reuse
          value: "1"
        - name: net.ipv4.tcp_syncookies
          value: "0"
        - name: net.ipv4.tcp_slow_start_after_idle
          value: "0"
        - name: net.ipv4.tcp_fin_timeout
          value: "30"
        - name: net.ipv4.tcp_keepalive_time
          value: "30"
        - name: net.ipv4.tcp_keepalive_intvl
          value: "10"
        - name: net.ipv4.tcp_keepalive_probes
          value: "3"
        - name: net.ipv4.tcp_no_metrics_save
          value: "1"

Haproxy: nbthread: "8"

IMHO TLS handshakes should not play a big role here, since keepalive connections are used on both ends: client<->haproxy AND haproxy<->upstream.

@ivanmatmati ivanmatmati added the enhancement New feature or request label Jun 15, 2023
dkorunic (Member) commented

@amorozkin I am reasonably sure this is not related to Alpine MUSL at all, but related to OpenSSL 3.0/3.1 mutex contention issues. I suspect your Glibc-based distribution is using OpenSSL 1.1.1, isn't it?
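
For reference, haproxy -vv prints both the OpenSSL version the binary was built with and the one it is running against, so it is the quickest way to compare the two images. A tiny standalone check along these lines (just an illustrative sketch, assuming the image has a C compiler and the OpenSSL headers installed) reports which libcrypto a given image ships:

/* check_ssl.c - print the OpenSSL version available in an image.
 * Build inside the container: gcc check_ssl.c -lcrypto -o check_ssl */
#include <stdio.h>
#include <openssl/opensslv.h>
#include <openssl/crypto.h>

int main(void)
{
    printf("built against: %s\n", OPENSSL_VERSION_TEXT);              /* compile-time header version */
    printf("running with:  %s\n", OpenSSL_version(OPENSSL_VERSION));  /* runtime library version */
    return 0;
}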

@dkorunic dkorunic self-assigned this Jan 16, 2024
@dkorunic dkorunic added the investigation more investigation needed on our side label Jan 16, 2024