fix: Mark all packets TX'ed before PTO as lost #2129
base: main
We previously marked only one or two packets as lost when a PTO fired. That meant we potentially didn't retransmit (RTX) all the data we could have, i.e., data in lost packets that we didn't mark as lost. This also changes the probing code to suppress redundant keep-alives, i.e., PINGs that we sent for other reasons, which could double as keep-alives but did not. Broken out of mozilla#1998.
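To make the behavioural difference concrete, here is a minimal sketch in Rust. It is not neqo's actual code: `SentPacket`, its fields, and both functions are hypothetical, simplified stand-ins that assume a packet is "outstanding" when it is ack-eliciting and has not yet been declared lost by a PTO.

```rust
/// Hypothetical, simplified record of a sent packet (not neqo's real type).
#[derive(Clone, Debug, PartialEq)]
pub struct SentPacket {
    pub pn: u64,
    pub ack_eliciting: bool,
    /// Set once a PTO has already declared this packet lost.
    pub pto_fired: bool,
}

/// Sketch of the old behaviour: mark at most `count` outstanding packets as lost.
pub fn lost_on_pto_limited(sent: &mut [SentPacket], count: usize) -> Vec<SentPacket> {
    sent.iter_mut()
        .filter(|p| p.ack_eliciting && !p.pto_fired)
        .take(count)
        .map(|p| {
            p.pto_fired = true;
            p.clone()
        })
        .collect()
}

/// Sketch of the new behaviour: mark every outstanding packet as lost,
/// so that all of its data becomes eligible for retransmission.
pub fn lost_on_pto_all(sent: &mut [SentPacket]) -> Vec<SentPacket> {
    sent.iter_mut()
        .filter(|p| p.ack_eliciting && !p.pto_fired)
        .map(|p| {
            p.pto_fired = true;
            p.clone()
        })
        .collect()
}
```

With five outstanding packets and a limit of two, the first function reports two packets as lost and leaves the data in the other three unavailable for retransmission until a later event; the second reports all five.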
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2129   +/-   ##
=======================================
  Coverage   95.38%   95.38%
=======================================
  Files         112      112
  Lines       36593    36589    -4
=======================================
- Hits        34903    34901    -2
+ Misses       1690     1688    -2
Benchmark results
Performance differences relative to c6d5502.

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
  time: [99.837 ns 100.16 ns 100.49 ns]
  change: [+0.1224% +0.7628% +1.2932%] (p = 0.01 < 0.05)
coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
  time: [118.62 ns 118.85 ns 119.11 ns]
  change: [+0.8694% +1.2845% +1.6720%] (p = 0.00 < 0.05)
coalesce_acked_from_zero 10+1 entries: 💔 Performance has regressed.
  time: [118.47 ns 118.99 ns 119.60 ns]
  change: [+1.0188% +1.5767% +2.1513%] (p = 0.00 < 0.05)
coalesce_acked_from_zero 1000+1 entries: Change within noise threshold.
  time: [98.033 ns 98.181 ns 98.351 ns]
  change: [+0.3972% +1.2430% +2.1775%] (p = 0.00 < 0.05)
RxStreamOrderer::inbound_frame(): Change within noise threshold.
  time: [111.77 ms 111.83 ms 111.88 ms]
  change: [+0.2804% +0.3515% +0.4198%] (p = 0.00 < 0.05)
SentPackets::take_ranges: No change in performance detected.
  time: [5.5314 µs 5.6195 µs 5.7116 µs]
  change: [-1.7100% +1.2485% +4.2196%] (p = 0.42 > 0.05)
transfer/pacing-false/varying-seeds: No change in performance detected.
  time: [26.592 ms 27.745 ms 28.884 ms]
  change: [-0.5101% +5.5311% +11.892%] (p = 0.08 > 0.05)
transfer/pacing-true/varying-seeds: Change within noise threshold.
  time: [35.669 ms 37.496 ms 39.347 ms]
  change: [+2.2693% +9.1382% +16.971%] (p = 0.01 < 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
  time: [26.328 ms 27.237 ms 28.158 ms]
  change: [+0.4598% +4.7572% +9.3134%] (p = 0.04 < 0.05)
transfer/pacing-true/same-seed: 💔 Performance has regressed.
  time: [43.374 ms 45.990 ms 48.663 ms]
  change: [+3.6413% +10.894% +18.646%] (p = 0.01 < 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
  time: [886.24 ms 895.56 ms 905.00 ms]
  thrpt: [110.50 MiB/s 111.66 MiB/s 112.84 MiB/s]
  change:
    time: [-2.5595% -1.0431% +0.5240%] (p = 0.18 > 0.05)
    thrpt: [-0.5213% +1.0540% +2.6267%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
  time: [315.46 ms 318.70 ms 322.06 ms]
  thrpt: [31.050 Kelem/s 31.377 Kelem/s 31.700 Kelem/s]
  change:
    time: [-3.2876% -1.9212% -0.4317%] (p = 0.01 < 0.05)
    thrpt: [+0.4336% +1.9588% +3.3994%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
  time: [33.620 ms 33.764 ms 33.917 ms]
  thrpt: [29.484 elem/s 29.618 elem/s 29.744 elem/s]
  change:
    time: [-0.8370% -0.0531% +0.7303%] (p = 0.89 > 0.05)
    thrpt: [-0.7250% +0.0531% +0.8441%]
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
  time: [1.6418 s 1.6622 s 1.6832 s]
  thrpt: [59.411 MiB/s 60.161 MiB/s 60.911 MiB/s]
  change:
    time: [-1.9410% -0.0386% +1.7919%] (p = 0.97 > 0.05)
    thrpt: [-1.7604% +0.0386% +1.9794%]

Client/server transfer results
Transfer of 33554432 bytes over loopback.
@martinthomson I'd appreciate a review, since the code I am touching is pretty complex.
This makes sense to me. Thanks for extracting it into a smaller pull request.
I am in favor of waiting for Martin's review.
Do we not have tests for this? Should we?
.pto_packets(PtoState::pto_packet_count(*pn_space))
.cloned(),
);
lost.extend(space.pto_packets().cloned());
Do we still need pto_packet_count if this is the decision?
The other question I have is whether this is necessary. We're cloning all of the information so that we can process the loss, which means more work on a PTO. Maybe PTO is rare enough that this doesn't matter, but one of the reasons for the limit on number was to avoid the extra work.
> Do we still need pto_packet_count if this is the decision?
We do still need it to limit the number of packets we send on PTO.
> The other question I have is whether this is necessary. We're cloning all of the information so that we can process the loss, which means more work on a PTO. Maybe PTO is rare enough that this doesn't matter, but one of the reasons for the limit on number was to avoid the extra work.
I've been wondering if it would be sufficient to mark n packets per space as lost, instead of all.
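The middle ground floated here, marking at most n packets per packet number space as lost, could look roughly like the sketch below. It only illustrates the idea: `Space`, `Sent`, and `lost_capped_per_space` are hypothetical names and do not reflect neqo's real types or API.

```rust
/// Hypothetical packet number spaces, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Space {
    Initial,
    Handshake,
    ApplicationData,
}

/// Hypothetical, simplified record of an outstanding packet.
#[derive(Clone, Debug, PartialEq)]
pub struct Sent {
    pub space: Space,
    pub pn: u64,
}

/// Clone at most `n` outstanding packets from each space for loss processing.
/// The cap bounds the cloning work done on a PTO to 3 * n packets.
pub fn lost_capped_per_space(outstanding: &[Sent], n: usize) -> Vec<Sent> {
    let mut lost = Vec::new();
    for space in [Space::Initial, Space::Handshake, Space::ApplicationData] {
        lost.extend(
            outstanding
                .iter()
                .filter(|p| p.space == space)
                .take(n)
                .cloned(),
        );
    }
    lost
}
```

The trade-off in this sketch is the one raised in the review: a cap keeps the work per PTO bounded, while data in packets beyond the cap is not marked lost and so is not retransmitted on that PTO.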
There are tests in #2128, but this PR alone doesn't make them succeed yet.
Signed-off-by: Lars Eggert <[email protected]>
We previously marked only one or two packets as lost when a PTO fired. That meant we potentially didn't retransmit (RTX) all the data we could have, i.e., data in lost packets that we didn't mark as lost.
This also changes the probing code to suppress redundant keep-alives, i.e., PINGs that we sent for other reasons, which could double as keep-alives but did not.
Broken out of #1998