fix: exclude first and last window from timing stats, not the first and last element in each window#531
Conversation
Walkthrough: Small change in the windowed mean aggregation.
🚥 Pre-merge checks: 3 passed, 1 failed (warning).
The following will be added to the changelog: [0.6.1] - 2026-02-13, Features.
ThetaSinner left a comment:
I agree that the existing code was a strange heuristic and should probably have been commented to explain it but there was reasoning behind it.
The first version of this code was more like this change. However, the reason I moved away from that is that throwing away whole windows sets a lower bound on how long scenarios must run. With a 10s window, you need to run for at least 40s to see a trend. Sure, that only really affects the summariser testing. For a real run though, with a 60s window, discarding 2 minutes of a 15-minute run is quite significant.
What I was trying to achieve was improved accuracy without discarding too much data. Perhaps the best option here is to just let the first and last windows be less accurate? That is, remove the `slice(1, u32::MAX)` calls as you have here, but also don't discard whole windows?
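The tradeoff described in the comment above can be checked with some back-of-envelope arithmetic. This is an illustrative sketch only; `discarded_fraction` and `min_run_for_trend` are hypothetical helpers, not part of the summariser.

```rust
// Hypothetical helpers quantifying the cost of discarding whole windows.
fn discarded_fraction(window_secs: u32, run_secs: u32) -> f64 {
    // Dropping the first and last window discards 2 windows' worth of data.
    (2 * window_secs) as f64 / run_secs as f64
}

// Minimum run length needed to keep `trend_points` full windows after
// dropping the first and last window.
fn min_run_for_trend(window_secs: u32, trend_points: u32) -> u32 {
    (trend_points + 2) * window_secs
}

fn main() {
    // 10s windows: at least 40s of run time to keep a 2-point trend.
    println!("{}s", min_run_for_trend(10, 2)); // → 40s
    // 60s windows over a 15-minute run: about 13% of the data discarded.
    println!("{:.1}%", 100.0 * discarded_fraction(60, 15 * 60)); // → 13.3%
}
```

This matches the figures in the comment: with a 10s window, 40s is the minimum run length to see any trend, and dropping 2 of 15 minutes is a significant fraction of a real run.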
Force-pushed 3cfe187 to c4dedac (compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
summariser/src/analyze.rs (1)
69-95: ⚠️ Potential issue | 🟠 Major: Timing trend does not exclude first and last windows, contrary to the PR objective.
The PR description states: "The change makes the code drop the first and last entire window instead." However, the new code computes the trend over all windows without any exclusion. Compare with `standard_rate` (lines 242-251), which explicitly slices off the first and last windows with `slice(1, rate.height() - 2)` and a comment explaining the rationale. The test data supports this concern: in the zero_arc fixture, the first trend value (30.14) is a clear outlier compared to the rest (~40-43), which is consistent with an incomplete first window that should be dropped.
Suggested fix: apply the same first/last window exclusion as `standard_rate`

```diff
     .agg([col(column).mean().alias("mean")])
     .collect()
     .context("Windowed mean")?
-    .column("mean")?
-    .f64()?
-    .iter()
-    .filter_map(|v| v.map(|v| round_to_n_dp(v, 6)))
-    .collect_vec();
+    ;
+
+// Slice to drop the first and last because they're likely to be partially filled windows.
+let trend: Vec<f64> = if trend.height() <= 2 {
+    Vec::with_capacity(0)
+} else {
+    trend
+        .column("mean")?
+        .slice(1, trend.height() - 2)
+        .f64()?
+        .iter()
+        .filter_map(|v| v.map(|v| round_to_n_dp(v, 6)))
+        .collect_vec()
+};
```
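Outside Polars, the intent of the suggested slice can be sketched on a plain vector. `drop_partial_windows` is a hypothetical stand-in, not the real `analyze.rs` code, which operates on a DataFrame column; the edge-case guard mirrors the `height() <= 2` check above.

```rust
// Given per-window means, drop the first and last window because they may be
// partially filled. Hypothetical helper illustrating the suggested slice.
fn drop_partial_windows(means: &[f64]) -> Vec<f64> {
    if means.len() <= 2 {
        // Not enough windows to drop both ends and keep anything.
        Vec::new()
    } else {
        means[1..means.len() - 1].to_vec()
    }
}

fn main() {
    // Mirrors the zero_arc fixture: the first value is a partial-window outlier.
    let trend = [30.140288, 43.715731, 43.443675, 40.569253, 40.128692, 41.117841];
    println!("{:?}", drop_partial_windows(&trend));
}
```

With the fixture values above, the outlier 30.14 (and the final window) would be excluded, leaving only fully-filled windows in the trend.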
🤖 Fix all issues with AI agents
In
`@summariser/test_data/3_summary_outputs/two_party_countersigning-3cdc5a29d42fbe93e971508e7cd8856367465e047c870b60283ed86a5bf5687c.json`:
- Around line 55-65: The timing trend fixtures in this JSON (accepted_timing and
initiated_timing arrays) are inconsistent with the updated timing trend logic in
analyze.rs that excludes the first and last windows; update the fixtures by
re-running the timing analysis to regenerate both accepted_timing and
initiated_timing trend arrays so they match the new behavior in analyze.rs, then
replace the old arrays in this file with the newly generated values.
In
`@summariser/test_data/3_summary_outputs/zero_arc_create_and_read-94d299900c09e3ae81800bfce0f497a9d12f98a7a4fd6fd4083793335bc18c88.json`:
- Around line 79-86: The JSON test fixture's "trend" array contains a low first
value (30.14) indicating a partial initial window; since analyze.rs was changed
to exclude first/last partial windows from timing trends, regenerate this
fixture so the "trend" array is computed after dropping partial windows (i.e.,
omit the first and/or last window entries as analyze.rs now does) and replace
the values in
zero_arc_create_and_read-94d299900c09e3ae81800bfce0f497a9d12f98a7a4fd6fd4083793335bc18c88.json
accordingly so the trend reflects only fully-filled windows.
...o_party_countersigning-3cdc5a29d42fbe93e971508e7cd8856367465e047c870b60283ed86a5bf5687c.json
```diff
 "trend": [
-  43.874391,
-  43.586477,
-  40.590589,
-  40.197726,
-  41.167061
+  30.140288,
+  43.715731,
+  43.443675,
+  40.569253,
+  40.128692,
+  41.117841
 ],
```
Test data includes suspected partial windows.
The first trend value (30.14) is significantly lower than the remaining values (~40–43), which is characteristic of a partially-filled first window. If the timing trend is updated to exclude first/last windows (as flagged in analyze.rs), this fixture will need to be regenerated.
Force-pushed c4dedac to 81a07aa (compare)
✔️ 81a07aa - Conventional commits check succeeded.
Updated to keep all results and not discard windows. We're also discarding the first and last windows in the standard rate stats -- should that be changed as well? @ThetaSinner
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
summariser/src/analyze.rs (1)
88-95: ⚠️ Potential issue | 🟠 Major: Missing first/last window exclusion from the timing trend.
The PR objective states the behavior should change from dropping the first and last element within each window to dropping the first and last entire window. The old per-element slicing was removed (good), but no window-level exclusion was added to the trend.
Compare with `standard_rate` (lines 242-261), which explicitly slices off the first and last windows:

```rust
// standard_rate, lines 242-251
let trend: Vec<u32> = if rate.height() <= 2 {
    Vec::with_capacity(0)
} else {
    rate.column("count")?
        .slice(1, rate.height() - 2)
        ...
};
```

The timing trend should likely apply the same pattern. Without it, the trend still includes data from potentially incomplete first/last windows.
Suggested fix (mirroring the `standard_rate` approach)

```diff
     .agg([col(column).mean().alias("mean")])
     .collect()
     .context("Windowed mean")?
-    .column("mean")?
+    ;
+
+let trend_frame = trend_frame.column("mean")?;
+
+// Slice to drop the first and last because they're likely to be partially filled windows.
+let trend_col = if trend_frame.len() <= 2 {
+    // Not enough windows to drop first and last
+    &trend_frame.slice(0, 0)
+} else {
+    &trend_frame.slice(1, trend_frame.len() - 2)
+};
+
+let trend = trend_col
     .f64()?
     .iter()
     .filter_map(|v| v.map(|v| round_to_n_dp(v, 6)))
     .collect_vec();
```
🧹 Nitpick comments (3)
summariser/test_data/3_summary_outputs/zero_arc_create_data_validated-52a0ce7a16521345d301ed0750d079b8513ad48d3f49524ceb725ae0770cd9bc.json (1)
319-343: Unrelated `holochain_p2p_metrics` restructuring bundled with the timing-window fix.
The structural changes here (adding empty `request_roundtrip_duration`/`handle_incoming_request_duration` objects, new fields like `remote_signal`, `publish_counter_sign`, `countersigning_session_negotiation`) appear unrelated to the PR's stated goal of excluding first/last windows from timing stats. Bundling unrelated changes in the same PR makes it harder to bisect regressions. Consider whether these belong in a separate commit or PR, or at minimum document them in the PR description.

summariser/test_data/3_summary_outputs/mixed_arc_must_get_agent_activity-c965934f5262aa6a6442a7d3a23306c8497ee785d416d8269650be33db67911c.json (1)
378-401: `holochain_p2p_metrics` schema restructuring appears unrelated to the windowing fix.
The addition of `request_roundtrip_duration`, `handle_incoming_request_duration`, new fields in `request_count` (e.g., `count_links`, `send_validation_receipts`), and the new `handle_incoming_request_count` block look like a separate schema change. If this came from regenerating test fixtures against a newer version of the input data or dependencies, it's fine, but it might be worth noting in the PR description to avoid confusion for reviewers.

summariser/test_data/3_summary_outputs/mixed_arc_get_agent_activity-d874c654d75fdc83f5f7c0d7eb2d6be8cfcaf2cc0f5abbc281ad939397351727.json (1)
592-617: `holochain_p2p_metrics` schema restructuring appears unrelated to the PR's stated objective.
The PR title and description focus on excluding first/last windows from timing stats, but this section introduces structural changes to the `holochain_p2p_metrics` shape: new top-level keys (`request_roundtrip_duration`, `handle_incoming_request_duration`), new fields in `handle_incoming_request_count` (`response`, `remote_signal`, `publish_counter_sign`, `countersigning_session_negotiation`), and field reordering. If this is an intentional bundled change (e.g., from an upstream Holochain update), it may be worth noting in the PR description for reviewer clarity.
Summary
Previously, the standard timing stats dropped the first and last element within each window. Now the first and last window are dropped instead. This is because we assume they contain data from an incomplete range within the window, which would add noise to the metrics.
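The two behaviors can be contrasted on plain slices standing in for the real Polars windows. This is an illustrative sketch under stated assumptions; `old_trend` and `new_trend` are hypothetical names, and the real code computes windowed means in a DataFrame.

```rust
// Old behavior: average each window after dropping its first and last element.
fn old_trend(windows: &[Vec<f64>]) -> Vec<f64> {
    windows
        .iter()
        .filter(|w| w.len() > 2) // need at least 3 elements to keep anything
        .map(|w| {
            let inner = &w[1..w.len() - 1];
            inner.iter().sum::<f64>() / inner.len() as f64
        })
        .collect()
}

// New behavior (per this summary): average every element, then drop the
// first and last window entirely.
fn new_trend(windows: &[Vec<f64>]) -> Vec<f64> {
    if windows.len() <= 2 {
        return Vec::new();
    }
    windows[1..windows.len() - 1]
        .iter()
        .map(|w| w.iter().sum::<f64>() / w.len() as f64)
        .collect()
}

fn main() {
    let windows = vec![
        vec![5.0, 38.0, 40.0],  // partial first window with an outlier
        vec![40.0, 41.0, 42.0],
        vec![43.0, 42.0, 41.0],
        vec![44.0, 43.0, 10.0], // partial last window with an outlier
    ];
    println!("{:?}", old_trend(&windows)); // → [38.0, 41.0, 42.0, 43.0]
    println!("{:?}", new_trend(&windows)); // → [41.0, 42.0]
}
```

As the example shows, per-element trimming still lets a partially filled first or last window contribute a skewed mean, while dropping whole windows keeps only fully-filled ones.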
TODO:
Note that all commits in a PR must follow Conventional Commits before it can be merged, as these are used to generate the changelog