
fix: Use 1MB socket TX and RX buffers #2470

Open · larseggert wants to merge 4 commits into main

Conversation

larseggert (Collaborator)

We need to do the same for neqo-glue.

Fixes #1962
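The change itself is small: raise the UDP socket's kernel send and receive buffers (SO_SNDBUF / SO_RCVBUF) to 1 MB. As a rough, standalone sketch of the idea using the socket2 crate (the actual patch goes through quinn-udp's `UdpSocketState`; see the review hunks below):

```rust
use std::io;

use socket2::{Domain, Socket, Type};

const ONE_MB: usize = 1 << 20;

// Sketch only: create a UDP socket and ask the kernel for 1 MB buffers.
// The kernel may clamp the request (e.g. to net.core.{w,r}mem_max on Linux).
fn udp_socket_with_1mb_buffers() -> io::Result<Socket> {
    let socket = Socket::new(Domain::IPV4, Type::DGRAM, None)?;
    socket.set_send_buffer_size(ONE_MB)?; // SO_SNDBUF
    socket.set_recv_buffer_size(ONE_MB)?; // SO_RCVBUF
    Ok(socket)
}
```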


codecov bot commented Mar 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.41%. Comparing base (483916d) to head (fea46ec).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2470      +/-   ##
==========================================
- Coverage   95.42%   95.41%   -0.02%     
==========================================
  Files         115      115              
  Lines       36996    36996              
  Branches    36996    36996              
==========================================
- Hits        35305    35301       -4     
- Misses       1687     1689       +2     
- Partials        4        6       +2     
| Components | Coverage Δ |
|---|---|
| neqo-common | 97.17% <ø> (-0.36%) ⬇️ |
| neqo-crypto | 90.44% <ø> (ø) |
| neqo-http3 | 94.50% <ø> (ø) |
| neqo-qpack | 96.29% <ø> (ø) |
| neqo-transport | 96.24% <ø> (ø) |
| neqo-udp | 95.29% <ø> (+0.58%) ⬆️ |


github-actions bot commented Mar 3, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 06e42ce.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server


github-actions bot commented Mar 3, 2025

Benchmark results

Performance differences relative to 3b8fac0.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
       time:   [708.93 ms 713.89 ms 718.86 ms]
       thrpt:  [139.11 MiB/s 140.08 MiB/s 141.06 MiB/s]
change:
       time:   [-1.0741% -0.0989% +0.8670%] (p = 0.84 > 0.05)
       thrpt:  [-0.8595% +0.0990% +1.0858%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [348.19 ms 349.95 ms 351.73 ms]
       thrpt:  [28.431 Kelem/s 28.575 Kelem/s 28.720 Kelem/s]
change:
       time:   [-0.0649% +0.6197% +1.3119%] (p = 0.08 > 0.05)
       thrpt:  [-1.2949% -0.6159% +0.0649%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [25.083 ms 25.257 ms 25.437 ms]
       thrpt:  [39.313  elem/s 39.593  elem/s 39.867  elem/s]
change:
       time:   [-0.4792% +0.4111% +1.3877%] (p = 0.38 > 0.05)
       thrpt:  [-1.3687% -0.4094% +0.4815%]
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.
       time:   [1.7304 s 1.7504 s 1.7709 s]
       thrpt:  [56.469 MiB/s 57.131 MiB/s 57.789 MiB/s]
change:
       time:   [-8.5830% -7.1515% -5.7332%] (p = 0.00 < 0.05)
       thrpt:  [+6.0819% +7.7023% +9.3888%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.999 µs 12.030 µs 12.069 µs]
       change: [-0.1780% +0.3925% +0.9310%] (p = 0.20 > 0.05)

Found 19 outliers among 100 measurements (19.00%)
4 (4.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild
11 (11.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.0560 ms 3.0655 ms 3.0766 ms]
       change: [-0.8493% -0.3092% +0.1956%] (p = 0.25 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [20.053 µs 20.116 µs 20.181 µs]
       change: [-0.4139% +0.0553% +0.5256%] (p = 0.82 > 0.05)

Found 21 outliers among 100 measurements (21.00%)
2 (2.00%) low severe
1 (1.00%) high mild
18 (18.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.4030 ms 5.4148 ms 5.4282 ms]
       change: [-0.4107% -0.0609% +0.2623%] (p = 0.74 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
14 (14.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.3276 µs 6.3636 µs 6.4051 µs]
       change: [-0.4234% +0.1483% +0.7510%] (p = 0.63 > 0.05)

Found 22 outliers among 100 measurements (22.00%)
5 (5.00%) low severe
5 (5.00%) low mild
3 (3.00%) high mild
9 (9.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [2.5058 ms 2.5126 ms 2.5197 ms]
       change: [-0.6469% -0.2057% +0.2352%] (p = 0.37 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe

1 streams of 1 bytes/multistream: No change in performance detected.
       time:   [72.599 µs 72.801 µs 73.006 µs]
       change: [-0.3916% +0.0643% +0.5308%] (p = 0.78 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1000 streams of 1 bytes/multistream: Change within noise threshold.
       time:   [25.262 ms 25.296 ms 25.329 ms]
       change: [-0.4508% -0.2630% -0.0707%] (p = 0.01 < 0.05)
10000 streams of 1 bytes/multistream: No change in performance detected.
       time:   [1.7015 s 1.7030 s 1.7045 s]
       change: [-0.0984% +0.0396% +0.1769%] (p = 0.58 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
9 (9.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe

1 streams of 1000 bytes/multistream: No change in performance detected.
       time:   [73.869 µs 74.916 µs 76.392 µs]
       change: [-0.3969% +1.1086% +2.9998%] (p = 0.27 > 0.05)

Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high severe

100 streams of 1000 bytes/multistream: Change within noise threshold.
       time:   [3.4162 ms 3.4231 ms 3.4304 ms]
       change: [+0.0318% +0.3256% +0.6008%] (p = 0.03 < 0.05)

Found 25 outliers among 100 measurements (25.00%)
25 (25.00%) high mild

1000 streams of 1000 bytes/multistream: No change in performance detected.
       time:   [145.36 ms 145.44 ms 145.52 ms]
       change: [-0.0322% +0.0479% +0.1242%] (p = 0.24 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [94.627 ns 94.955 ns 95.292 ns]
       change: [-0.3754% +0.1986% +0.7914%] (p = 0.50 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [112.66 ns 112.96 ns 113.29 ns]
       change: [-0.5928% -0.1222% +0.3217%] (p = 0.61 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
1 (1.00%) high mild
8 (8.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [112.44 ns 113.01 ns 113.67 ns]
       change: [-2.2240% -0.4405% +0.7023%] (p = 0.68 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
11 (11.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [92.645 ns 93.074 ns 93.561 ns]
       change: [-2.1394% -0.7792% +0.6364%] (p = 0.29 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [116.12 ms 116.18 ms 116.25 ms]
       change: [-0.1525% -0.0810% -0.0107%] (p = 0.03 < 0.05)

Found 19 outliers among 100 measurements (19.00%)
5 (5.00%) low severe
2 (2.00%) low mild
12 (12.00%) high severe

SentPackets::take_ranges: No change in performance detected.
       time:   [8.2465 µs 8.4724 µs 8.6749 µs]
       change: [-3.8707% +0.0989% +3.9861%] (p = 0.96 > 0.05)

Found 18 outliers among 100 measurements (18.00%)
10 (10.00%) low severe
6 (6.00%) low mild
2 (2.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [35.722 ms 35.798 ms 35.875 ms]
       change: [+0.6341% +0.9371% +1.2451%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [35.930 ms 35.993 ms 36.058 ms]
       change: [+1.2892% +1.5334% +1.7996%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [35.748 ms 35.802 ms 35.857 ms]
       change: [+0.5413% +0.7303% +0.9397%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [36.128 ms 36.184 ms 36.240 ms]
       change: [+0.4921% +0.7012% +0.9022%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

Client/server transfer results

Performance differences relative to 3b8fac0.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

| Client | Server | CC | Pacing | Mean ± σ | Min | Max | MiB/s ± σ | Δ main | Δ main |
|---|---|---|---|---|---|---|---|---|---|
| neqo | neqo | reno | on | 426.4 ± 37.9 | 394.4 | 588.7 | 75.1 ± 0.8 | -8.6 | -2.0% |
| neqo | neqo | reno | | 463.7 ± 169.5 | 395.9 | 1105.9 | 69.0 ± 0.2 | -17.1 | -3.6% |
| neqo | neqo | cubic | on | 420.3 ± 32.4 | 392.8 | 563.8 | 76.1 ± 1.0 | 1.7 | 0.4% |
| neqo | neqo | cubic | | 420.4 ± 38.9 | 391.3 | 590.6 | 76.1 ± 0.8 | 3.8 | 0.9% |
| google | neqo | reno | on | 748.1 ± 99.8 | 518.0 | 996.1 | 42.8 ± 0.3 | -5.3 | -0.7% |
| google | neqo | reno | | 748.8 ± 91.2 | 542.4 | 875.3 | 42.7 ± 0.4 | -2.6 | -0.4% |
| google | neqo | cubic | on | 759.1 ± 96.8 | 542.0 | 1019.8 | 42.2 ± 0.3 | 4.4 | 0.6% |
| google | neqo | cubic | | 749.2 ± 98.1 | 542.9 | 973.1 | 42.7 ± 0.3 | -5.1 | -0.7% |
| google | google | | | 581.9 ± 19.3 | 559.7 | 633.8 | 55.0 ± 1.7 | -2.6 | -0.4% |
| neqo | msquic | reno | on | 272.2 ± 38.0 | 242.0 | 440.4 | 117.6 ± 0.8 | 1.2 | 0.5% |
| neqo | msquic | reno | | 272.9 ± 36.4 | 247.1 | 431.2 | 117.3 ± 0.9 | 4.9 | 1.8% |
| neqo | msquic | cubic | on | 273.8 ± 46.5 | 244.4 | 449.1 | 116.9 ± 0.7 | 12.8 | 4.9% |
| neqo | msquic | cubic | | 275.9 ± 33.1 | 243.4 | 405.3 | 116.0 ± 1.0 | 2.5 | 0.9% |
| msquic | msquic | | | 179.0 ± 48.6 | 149.5 | 367.0 | 178.8 ± 0.7 | -2.5 | -1.4% |


mxinden (Collaborator) left a comment:

Instead of implementing this in neqo-bin, how about proposing it upstream in quinn-udp:

https://github.com/quinn-rs/quinn/blob/8d6e48c20b71f7e915c13c33b66e2a09b6d59888/quinn-udp/src/unix.rs#L85-L86

I assume close to all QUIC implementations will want a large send and receive buffer.

larseggert (Collaborator, Author)

> Instead of implementing this in neqo-bin, how about proposing it upstream in quinn-udp:

Sounds good. Or maybe quinn-udp can offer functions to set those values.

I don't understand why the benches show a performance improvement, but the runs over loopback show a regression.

larseggert (Collaborator, Author)

@mxinden where would you suggest adding this in quinn-udp? As part of UdpSocketState or somewhere else?

larseggert (Collaborator, Author)

@mxinden alternatively, I'd suggest merging this into neqo now and switching over to a quinn-udp mechanism in a future PR.


mxinden commented Mar 19, 2025

@larseggert how about something along the lines of:

diff --git a/quinn-udp/src/unix.rs b/quinn-udp/src/unix.rs
index c39941d5..0b5bbd21 100644
--- a/quinn-udp/src/unix.rs
+++ b/quinn-udp/src/unix.rs
@@ -257,6 +257,14 @@ impl UdpSocketState {
         self.may_fragment
     }
 
+    pub fn set_send_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
+    pub fn set_recv_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
     /// Returns true if we previously got an EINVAL error from `sendmsg` syscall.
     fn sendmsg_einval(&self) -> bool {
         self.sendmsg_einval.load(Ordering::Relaxed)
@@ -543,7 +551,7 @@ fn recv(io: SockRef<'_>, bufs: &mut [IoSliceMut<'_>], meta: &mut [RecvMeta]) ->
     Ok(1)
 }
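For a sense of what those stub bodies would boil down to, here is a minimal standalone sketch against the socket2 crate (the free-function names here are assumptions for illustration, not quinn-udp's actual API). Both calls map to setsockopt(SOL_SOCKET, SO_SNDBUF / SO_RCVBUF); note that on Linux the kernel doubles the requested value and clamps it to wmem_max / rmem_max.

```rust
use std::io;

use socket2::Socket;

// Illustrative only: what the stubbed quinn-udp methods would ultimately do.
fn set_send_buffer_size(socket: &Socket, bytes: usize) -> io::Result<()> {
    socket.set_send_buffer_size(bytes) // setsockopt(SOL_SOCKET, SO_SNDBUF, ...)
}

fn set_recv_buffer_size(socket: &Socket, bytes: usize) -> io::Result<()> {
    socket.set_recv_buffer_size(bytes) // setsockopt(SOL_SOCKET, SO_RCVBUF, ...)
}
```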

larseggert (Collaborator, Author)

> @larseggert how about something along the lines of

Do we also need the equivalent for Windows?


mxinden commented Mar 19, 2025

Yes. I would say the more platforms, the better.

larseggert (Collaborator, Author)

@mxinden let me know if quinn-rs/quinn#2179 is what you had in mind.

larseggert marked this pull request as draft March 24, 2025 15:36

larseggert commented Mar 25, 2025

On my Mac, I now see:

0.003 DEBUG Increasing send buffer size from 9216 to 1048576
0.003 DEBUG Increasing receive buffer size from 786896 to 1048576

Note that the send buffer size is apparently based on the net.inet.udp.maxdgram sysctl? Weird. Need to check this on other platforms.

Linux (from a QNS run):

0.002 DEBUG Increasing send buffer size from 212992 to 1048576
0.002 DEBUG Default receive buffer size is 1048576, not changing

Windows (from CI):

0.000 WARN Increasing send buffer size from 131072 to 1048576
0.000 WARN Increasing receive buffer size from 131072 to 1048576

https://lists.freebsd.org/pipermail/freebsd-questions/2005-February/075624.html says:

UDP is not buffered in the kernel like TCP is, so a sendspace sysctl
is not possible. When the interface queue becomes full (i.e. you are
sending data to the interface at a greater rate than it can put it on
the wire), you will see this error message, and your application needs
to be able to handle it.

This seems to indicate that we should not do this increase on macOS and the BSDs?
@mxinden do we see ENOBUFS, and are we handling it similarly to EAGAIN?

larseggert marked this pull request as ready for review March 25, 2025 12:32
larseggert (Collaborator, Author)

I also don't understand why loopback benchmark performance is going down.


mxinden commented Mar 25, 2025

I assume more buffer space is not always better. This reminds me of the Source Buffer Management draft.

https://stuartcheshire.github.io/draft-cheshire-sbm/draft-cheshire-sbm.html


mxinden commented Mar 25, 2025

> @mxinden do we see ENOBUFS, and are we handling it similarly to EAGAIN?

We currently don't. Something along the following lines should work. Want to try running a benchmark on your local Mac to see whether either of them triggers?

diff --git a/Cargo.lock b/Cargo.lock
index 64bfcc1f..d6b8b6a7 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1028,6 +1028,7 @@ name = "neqo-udp"
 version = "0.12.2"
 dependencies = [
  "cfg_aliases",
+ "libc",
  "log",
  "neqo-common",
  "quinn-udp",
diff --git a/neqo-udp/Cargo.toml b/neqo-udp/Cargo.toml
index 4abd402a..8ebc2e83 100644
--- a/neqo-udp/Cargo.toml
+++ b/neqo-udp/Cargo.toml
@@ -19,6 +19,7 @@ workspace = true
 log = { workspace = true }
 neqo-common = { path = "./../neqo-common" }
 quinn-udp = { workspace = true }
+libc = "0.2"
 
 [build-dependencies]
 cfg_aliases = "0.2"
diff --git a/neqo-udp/src/lib.rs b/neqo-udp/src/lib.rs
index d498f5aa..0cfe0e74 100644
--- a/neqo-udp/src/lib.rs
+++ b/neqo-udp/src/lib.rs
@@ -74,7 +74,19 @@ pub fn send_inner(
         src_ip: None,
     };
 
-    state.try_send(socket, &transmit)?;
+    if let Err(e) = state.try_send(socket, &transmit) {
+        match e.raw_os_error() {
+            Some(libc::ENOBUFS) => {
+                todo!();
+            }
+            Some(libc::EAGAIN) => {
+                todo!();
+            }
+            Some(_) | None => {}
+        }
+
+        return Err(e);
+    }
 
     qtrace!(
         "sent {} bytes from {} to {}",

Related: neither of them is currently exposed through std::io::ErrorKind. See rust-lang/rust#79965 for some discussion on adding these to std.
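For illustration, one hypothetical way to fill in those todo!()s is to fold ENOBUFS into the same back-off path as EAGAIN, i.e. surface it as io::ErrorKind::WouldBlock so the caller drops back to the event loop and retries, rather than treating a full socket or interface queue as fatal. A sketch, not the patch's actual behavior:

```rust
use std::io;

// Hypothetical helper: map ENOBUFS (and EAGAIN) from a failed sendmsg into
// WouldBlock so the caller backs off and retries the datagram later.
fn map_send_error(e: io::Error) -> io::Error {
    match e.raw_os_error() {
        Some(code) if code == libc::ENOBUFS || code == libc::EAGAIN => {
            io::ErrorKind::WouldBlock.into()
        }
        _ => e,
    }
}
```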

larseggert (Collaborator, Author)

> I assume more buffer space is not always better. This reminds me of the Source Buffer Management draft.
>
> stuartcheshire.github.io/draft-cheshire-sbm/draft-cheshire-sbm.html

It's different, in that I doubt that we actually fill that larger buffer, whereas they do. (If we filled it, we should IMO also see a throughput increase.)

larseggert (Collaborator, Author)

OK, with the fixed bencher the erratic performance seems to be gone 🎉

I guess we'd want this one merged to see the full impact of #1868?

let send_buf = state.send_buffer_size((&socket).into())?;
if send_buf < ONE_MB {
qdebug!("Increasing send buffer size from {send_buf} to {ONE_MB}");
state.set_send_buffer_size((&socket).into(), ONE_MB)?;
larseggert (Collaborator, Author):

If this fails, should we retry with a smaller value? (Also for recv_buf.)
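One possible shape for such a retry, reusing the names from the hunk above (state, socket, send_buf, ONE_MB); a sketch only:

```rust
// Sketch: if the kernel rejects 1 MB, halve the request until it is accepted
// or no larger than the current default.
let mut size = ONE_MB;
while let Err(e) = state.set_send_buffer_size((&socket).into(), size) {
    size /= 2;
    if size <= send_buf {
        qdebug!("Could not increase send buffer size beyond {send_buf}: {e}");
        break;
    }
}
```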

state.set_recv_buffer_size((&socket).into(), ONE_MB)?;
} else {
qdebug!("Default receive buffer size is {recv_buf}, not changing");
}

larseggert (Collaborator, Author):

Should we add telemetry/stats about which sizes were actually in use for a connection?
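One hypothetical shape for that, as a small addition to per-connection stats (the type and field names here are made up for illustration):

```rust
/// Hypothetical stats entry recording the socket buffer sizes actually in
/// effect (after any kernel clamping) for a connection's UDP socket.
#[derive(Debug, Default, Clone, Copy)]
pub struct SocketBufferStats {
    /// Effective SO_SNDBUF in bytes.
    pub send_buffer_size: usize,
    /// Effective SO_RCVBUF in bytes.
    pub recv_buffer_size: usize,
}
```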


Successfully merging this pull request may close these issues.

Consider increasing OS socket buffer (SO_SNDBUF, SO_RCVBUF)