MTU-related severe performance issues #284

Open
rlpowell opened this issue Feb 15, 2022 · 3 comments
@rlpowell

This may be the same thing as #128, as that issue also shows a 5 second delay at the start of a transfer, but I can't tell, and since I have a lot of detail I didn't want to clutter that issue up.

Short version: in rootless podman with default everything (i.e. slirp4netns with its default MTU of 65520), a curl of a file larger than the MTU takes 5 seconds when it should take far less than a second. Reducing the MTU fixes it.

My environment:

$ sudo yum list installed '*podman*' '*slirp*'
Installed Packages
libslirp.x86_64        4.6.1-2.fc35    @fedora
podman.x86_64          3:3.4.4-1.fc35  @updates
podman-gvproxy.x86_64  3:3.4.4-1.fc35  @updates
podman-plugins.x86_64  3:3.4.4-1.fc35  @updates
slirp4netns.x86_64     1.1.12-2.fc35   @fedora

$ cat /etc/redhat-release
Fedora release 35 (Thirty Five)

Repro:

Dockerfile:

FROM fedora:35

RUN yum -y install netcat time

Run podman build -t slirptest .

In another window on the same host (maybe in a temp dir):

$ dd bs=1024 count=64 if=/dev/zero of=64k_file.bin
64+0 records in
64+0 records out
65536 bytes (66 kB, 64 KiB) copied, 0.000972638 s, 67.4 MB/s
$ dd bs=1024 count=63 if=/dev/zero of=63k_file.bin
63+0 records in
63+0 records out
64512 bytes (65 kB, 63 KiB) copied, 0.000511444 s, 126 MB/s
$ python -m http.server 8081
Serving HTTP on 0.0.0.0 port 8081 (http://0.0.0.0:8081/) ...

In another window:

$ podman run --rm -it slirptest bash
# time curl [IP of host]:8081/64k_file.bin | wc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 65536  100 65536    0     0  13108      0  0:00:04  0:00:04 --:--:--  8627
      0       0   65536

real    0m5.012s
user    0m0.013s
sys     0m0.008s

Note that it pauses for about 5 seconds after the first chunk of data.
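
If you want to see exactly where that pause lands, curl's -w timing variables break it down. A minimal sketch, run inside the container against the same URL (time_connect, time_starttransfer, and time_total are standard curl -w variables):

# curl -sS -o /dev/null -w 'connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' '[IP of host]:8081/64k_file.bin'

If the stall is in the data transfer, as it appears to be here, connect and ttfb should stay in the millisecond range while total sits around 5 seconds.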

Then try:

$ podman run --rm -it --net slirp4netns:mtu=1500 slirptest bash
#  time curl 192.168.123.134:8081/64k_file.bin | wc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 65536  100 65536    0     0  19.3M      0 --:--:-- --:--:-- --:--:-- 31.2M
      0       0   65536

real    0m0.016s
user    0m0.008s
sys     0m0.010s

Roughly a 300x difference in wall-clock time (5.012s vs 0.016s). :D

Also:

$ podman run --rm -it slirptest bash
# time curl 192.168.123.134:8081/63k_file.bin | wc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64512  100 64512    0     0  22.0M      0 --:--:-- --:--:-- --:--:-- 30.7M
      0       0   64512

real    0m0.016s
user    0m0.008s
sys     0m0.010s

So the 64k file causes the problem but the 63k file does not.

In case it's relevant, here are the host's MTU configs:

$ ip addr | grep -i mtu
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: enp5s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000

The communication in question appears, to tcpdump, to come over lo.
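
The capture command was along these lines (a sketch on my part, not the original invocation; port 8081 is the http.server port from the repro):

$ sudo tcpdump -i lo -n 'tcp port 8081'

With the default MTU, the ~5 second gap should show up as a dead pause (possibly with retransmissions) between data segments.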

My binary search shows that the issue doesn't occur at mtu=48000 and lower, but does occur at mtu=48500 and higher (a scripted version of the search is sketched after the transcripts below). I have no idea what the significance of that boundary is.

$ podman run --rm -it --net slirp4netns:mtu=48000 slirptest bash
[root@4b1d4880bb30 /]# time curl 192.168.123.134:8081/64k_file.bin | wc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 65536  100 65536    0     0  21.2M      0 --:--:-- --:--:-- --:--:-- 31.2M
      0       0   65536

real    0m0.016s
user    0m0.010s
sys     0m0.008s
[root@4b1d4880bb30 /]#
exit
$ podman run --rm -it --net slirp4netns:mtu=48500 slirptest bash
[root@99a0585cffb0 /]# time curl 192.168.123.134:8081/64k_file.bin | wc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 65536  100 65536    0     0  11401      0  0:00:05  0:00:05 --:--:--  7208
      0       0   65536

real    0m5.762s
user    0m0.008s
sys     0m0.013s
[root@99a0585cffb0 /]#
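
For anyone who wants to repeat that search, here's a sketch of scripting it. Assumptions: the slirptest image from the repro exists, the http.server is still listening on 8081 at the host IP from the transcripts, and curl's %{time_total} is a good enough stall detector.

#!/bin/bash
# Hedged sketch: bisect the MTU at which the stall appears.
HOST=192.168.123.134   # host IP from the transcripts above
lo=1500 hi=65520
while [ $((hi - lo)) -gt 500 ]; do
  mid=$(( (lo + hi) / 2 ))
  # Time the same 64k fetch at the candidate MTU.
  t=$(podman run --rm --net slirp4netns:mtu=$mid slirptest \
        curl -s -o /dev/null -w '%{time_total}' http://$HOST:8081/64k_file.bin)
  t=${t:-0}   # guard against a failed probe
  echo "mtu=$mid total=${t}s"
  # Treat anything over 1 second as "stalled" and search below it.
  if [ "${t%%.*}" -ge 1 ]; then hi=$mid; else lo=$mid; fi
done
echo "boundary is between mtu=$lo (fast) and mtu=$hi (stalled)"

Each probe narrows the window by half until it's smaller than 500, which is enough to reproduce the 48000/48500 boundary above.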
@mg90707

mg90707 commented Mar 16, 2022

I just wanted to add that I also see severe performance issues with the default MTU of 65520. I'm on a Windows host with a Linux guest VM (Ubuntu 20.04), running iperf3 from a container on the VM and connecting to the host (the commands are sketched at the end of this comment):
Outside of container: 5 Gbit/s
Rootful container: 5 Gbit/s
Rootless container, MTU 1500: 1.5 Gbit/s
Rootless container, MTU 65520: 60 Mbit/s (yes, megabits)

EDIT: For completeness, here are the stats when connecting from the host to the container with the slirp4netns port driver. No MTU-dependent slowdown here.
Outside of container: 3 Gbit/s
Rootful container: 3 Gbit/s
Rootless container, MTU 1500: 1.6 Gbit/s
Rootless container, MTU 65520: 1.8 Gbit/s

EDIT again: For the tests I was using Docker Rootless v20.10.12.
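
For anyone wanting to reproduce the measurement, a hedged sketch of the commands (the image name is just an example of one whose entrypoint is iperf3; under rootless Docker the MTU is a daemon-level setting, so it is varied there rather than per docker run):

$ iperf3 -s                                            # on the host side
$ docker run --rm networkstatic/iperf3 -c <host IP>    # from the rootless container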

@srstsavage

I can verify the above. In a rootless Docker environment using nginx 1.20 to proxy internal containers, we were seeing many requests to nginx take 10+ seconds while requests directly to the proxied services took less than a second. The MTU on the Docker daemon was set to 65520; reducing it to 48000 fixed the issue (one way the daemon-level MTU gets set is sketched at the end of this comment).

Using docker run --sysctl net.ipv4.tcp_rmem="4096 87380 6291456" as suggested in the slirp4netns README did not fix the issue and seemed to have no effect, although it's possible this was user error ¯\_(ツ)_/¯

Ubuntu 20.04.2
Docker (rootless) 20.10.6
slirp4netns 0.4.3
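
For reference, the daemon-level MTU was changed along these lines (a sketch assuming a standard systemd-based rootless install, where dockerd-rootless.sh reads DOCKERD_ROOTLESS_ROOTLESSKIT_MTU and hands it to rootlesskit/slirp4netns; the paths are assumptions about such an install):

$ # assumed standard rootless setup; env var consumed by dockerd-rootless.sh
$ mkdir -p ~/.config/systemd/user/docker.service.d
$ cat > ~/.config/systemd/user/docker.service.d/override.conf <<'EOF'
[Service]
Environment="DOCKERD_ROOTLESS_ROOTLESSKIT_MTU=48000"
EOF
$ systemctl --user daemon-reload
$ systemctl --user restart docker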

@yaroslavsadin

Same issue here, using podman 4.0.2 and slirp4netns 1.1.12 on AlmaLinux 9.0, though in my case MTU ~20000 is the sweet spot.

MTU=1500 gives ~1 Gbit/s
MTU=20000 gives ~9 Gbit/s (about the same as --network=host)
MTU=48000 drops back to ~1 Gbit/s
MTU=65520 drops below 1 Gbit/s

These are iperf results between the container and another server in the same subnet. --sysctl net.ipv4.tcp_rmem="4096 87380 6291456" didn't help either.
