Skip to content

fix: exclude first and last window from timing stats, not the first and last element in each window#531

Open
mattyg wants to merge 1 commit intomainfrom
fix/exclude-trailing-windows
Open

fix: exclude first and last window from timing stats, not the first and last element in each window#531
mattyg wants to merge 1 commit intomainfrom
fix/exclude-trailing-windows

Conversation

@mattyg
Copy link
Member

@mattyg mattyg commented Feb 10, 2026

Summary

Previously in the standard timing stats, the first and last element within a window was dropped. Now the first and last window is dropped. This is because we assume they have data from a incomplete range within the window, so will add noise to metrics.

TODO:

  • All code changes are reflected in docs, including module-level docs
  • All new/edited/removed scenarios are reflected in summary visualiser tool (see checklist)
  • I ran the Nomad CI workflow successfully on my branch

Note that all commits in a PR must follow Conventional Commits before it can be merged, as these are used to generate the changelog

Summary by CodeRabbit

  • Refactor
    • Simplified windowed mean computation, producing updated trend values and occasional trend-length changes in performance summaries.
  • Bug Fixes
    • Previously empty trend series filled and many timing trend points renormalized or slightly adjusted.
  • Notes
    • Most public interfaces unchanged, though several summary JSON metric groupings were reorganized, altering some exported metric shapes.

@coderabbitai
Copy link

coderabbitai bot commented Feb 10, 2026

Walkthrough

Small change in windowed mean aggregation in summariser/src/analyze.rs (compute mean directly). No public API changes. Many JSON test outputs under summariser/test_data/3_summary_outputs/ were updated — numeric trend values, some trend lengths, and in several files the holochain_p2p_metrics JSON shape was reorganized.

Changes

Cohort / File(s) Summary
Core logic
summariser/src/analyze.rs
Replaced prior slice/reverse/slice partial-window handling with a direct window mean: col(column).mean().alias("mean"). No exported signatures changed.
Numerical trend updates
summariser/test_data/3_summary_outputs/app_install-52bd66...ec.json, .../app_install-8468d6...55a.json, .../dht_sync_lag-3a1e33...e51.json, .../first_call-608026...30.json, .../local_signals-1fd6a6...a7060.json, .../mixed_arc_get_agent_activity-...1727.json, .../remote_call_rate-...cd74.json, .../remote_signals-...06a.json, .../single_write_many_read-...397.json, .../two_party_countersigning-...68c.json, .../validation_receipts-...70.json, .../write_get_agent_activity-...ee7.json, .../write_query-...c7233.json, .../write_read-...ef1c.json, .../write_validated-...2b7.json, .../write_validated_must_get_agent_activity-...792c.json, .../zome_call_single_value-...e99b.json
Pointwise numeric changes across many trend arrays: small value adjustments, occasional single-value corrections, some trends extended by one element. Window metadata (e.g., window_duration, mean/std) generally preserved.
Files with JSON schema/structure changes
summariser/test_data/3_summary_outputs/full_arc_create_validated_zero_arc_read-...c83.json, .../mixed_arc_must_get_agent_activity-...911c.json, .../zero_arc_create_and_read-...88.json, .../zero_arc_create_data-...37d.json, .../zero_arc_create_data_validated-...9bc.json, .../mixed_arc_get_agent_activity-...1727.json
Reorganization of holochain_p2p_metrics (introduced request_roundtrip_duration, handle_incoming_request_duration, reorganized/expanded request_count and handle_incoming_request_count fields). Multiple trend arrays expanded to multi-line formatting and several previously-empty trends populated. These changes alter the public JSON shape in affected files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • ThetaSinner
  • jost-s
  • veeso
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly and specifically describes the main change: excluding the first and last window instead of the first and last element within windows.
Description check ✅ Passed The description provides a clear summary of the behavior change and rationale, and includes all required TODO items from the template.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/exclude-trailing-windows

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mattyg mattyg requested a review from a team February 10, 2026 21:52
@github-actions
Copy link
Contributor

github-actions bot commented Feb 10, 2026

The following will be added to the changelog


[0.6.1] - 2026-02-13

Features

  • Do not exclude first and last element from each window in standard_timing_stats

jost-s
jost-s previously approved these changes Feb 11, 2026
Copy link
Member

@ThetaSinner ThetaSinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the existing code was a strange heuristic and should probably have been commented to explain it but there was reasoning behind it.

The first version of this code was more like this change. However, the reason I changed away from that was throwing away whole windows sets a lower bound on how long scenarios must run for. With a 10s window, you need to run for at least 40s to see a trend. Sure that only affects the summariser testing really. For a real run though, with a 60s window - discarding 2 minutes of a 15 minute run is quite significant.

What I was trying to achieve was improved accuracy without discarding too much data. Perhaps the best option here is to just let the first and last windows be less accurate? So removing the slice(1, u32::MAX) calls as you have here, but also not discarding whole windows?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
summariser/src/analyze.rs (1)

69-95: ⚠️ Potential issue | 🟠 Major

Timing trend does not exclude first and last windows, contrary to PR objective.

The PR description states: "The change makes the code drop the first and last entire window instead." However, the new code computes the trend over all windows without any exclusion. Compare with standard_rate (lines 242–251), which explicitly slices off the first and last windows with slice(1, rate.height() - 2) and a comment explaining the rationale.

The test data supports this concern—in the zero_arc fixture, the first trend value (30.14) is a clear outlier compared to the rest (~40–43), which is consistent with an incomplete first window that should be dropped.

Suggested fix: apply the same first/last window exclusion as `standard_rate`
         .agg([col(column).mean().alias("mean")])
         .collect()
         .context("Windowed mean")?
-        .column("mean")?
-        .f64()?
-        .iter()
-        .filter_map(|v| v.map(|v| round_to_n_dp(v, 6)))
-        .collect_vec();
+        ;
+
+    // Slice to drop the first and last because they're likely to be partially filled windows.
+    let trend: Vec<f64> = if trend.height() <= 2 {
+        Vec::with_capacity(0)
+    } else {
+        trend
+            .column("mean")?
+            .slice(1, trend.height() - 2)
+            .f64()?
+            .iter()
+            .filter_map(|v| v.map(|v| round_to_n_dp(v, 6)))
+            .collect_vec()
+    };
🤖 Fix all issues with AI agents
In
`@summariser/test_data/3_summary_outputs/two_party_countersigning-3cdc5a29d42fbe93e971508e7cd8856367465e047c870b60283ed86a5bf5687c.json`:
- Around line 55-65: The timing trend fixtures in this JSON (accepted_timing and
initiated_timing arrays) are inconsistent with the updated timing trend logic in
analyze.rs that excludes the first and last windows; update the fixtures by
re-running the timing analysis to regenerate both accepted_timing and
initiated_timing trend arrays so they match the new behavior in analyze.rs, then
replace the old arrays in this file with the newly generated values.

In
`@summariser/test_data/3_summary_outputs/zero_arc_create_and_read-94d299900c09e3ae81800bfce0f497a9d12f98a7a4fd6fd4083793335bc18c88.json`:
- Around line 79-86: The JSON test fixture's "trend" array contains a low first
value (30.14) indicating a partial initial window; since analyze.rs was changed
to exclude first/last partial windows from timing trends, regenerate this
fixture so the "trend" array is computed after dropping partial windows (i.e.,
omit the first and/or last window entries as analyze.rs now does) and replace
the values in
zero_arc_create_and_read-94d299900c09e3ae81800bfce0f497a9d12f98a7a4fd6fd4083793335bc18c88.json
accordingly so the trend reflects only fully-filled windows.

Comment on lines 79 to 86
"trend": [
43.874391,
43.586477,
40.590589,
40.197726,
41.167061
30.140288,
43.715731,
43.443675,
40.569253,
40.128692,
41.117841
],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Test data includes suspected partial windows.

The first trend value (30.14) is significantly lower than the remaining values (~40–43), which is characteristic of a partially-filled first window. If the timing trend is updated to exclude first/last windows (as flagged in analyze.rs), this fixture will need to be regenerated.

🤖 Prompt for AI Agents
In
`@summariser/test_data/3_summary_outputs/zero_arc_create_and_read-94d299900c09e3ae81800bfce0f497a9d12f98a7a4fd6fd4083793335bc18c88.json`
around lines 79 - 86, The JSON test fixture's "trend" array contains a low first
value (30.14) indicating a partial initial window; since analyze.rs was changed
to exclude first/last partial windows from timing trends, regenerate this
fixture so the "trend" array is computed after dropping partial windows (i.e.,
omit the first and/or last window entries as analyze.rs now does) and replace
the values in
zero_arc_create_and_read-94d299900c09e3ae81800bfce0f497a9d12f98a7a4fd6fd4083793335bc18c88.json
accordingly so the trend reflects only fully-filled windows.

@mattyg mattyg force-pushed the fix/exclude-trailing-windows branch from c4dedac to 81a07aa Compare February 13, 2026 17:24
@cocogitto-bot
Copy link

cocogitto-bot bot commented Feb 13, 2026

✔️ 81a07aa - Conventional commits check succeeded.

@mattyg
Copy link
Member Author

mattyg commented Feb 13, 2026

What I was trying to achieve was improved accuracy without discarding too much data. Perhaps the best option here is to just let the first and last windows be less accurate? So removing the slice(1, u32::MAX) calls as you have here, but also not discarding whole windows?

Updated to keep all results and not discard windows. We're also discarding the first and last windows in standard rate stats -- should that be changed as well? @ThetaSinner

@mattyg mattyg requested a review from ThetaSinner February 13, 2026 17:26
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
summariser/src/analyze.rs (1)

88-95: ⚠️ Potential issue | 🟠 Major

Missing first/last window exclusion from the timing trend.

The PR objective states the behavior should change from dropping the first and last element within each window to dropping the first and last entire window. The old per-element slicing was removed (good), but no window-level exclusion was added to the trend.

Compare with standard_rate (lines 242–261), which explicitly slices off the first and last windows:

// standard_rate, lines 242-251
let trend: Vec<u32> = if rate.height() <= 2 {
    Vec::with_capacity(0)
} else {
    rate.column("count")?
        .slice(1, rate.height() - 2)
        ...
};

The timing trend should likely apply the same pattern. Without it, the trend still includes data from potentially incomplete first/last windows.

Suggested fix (mirroring the `standard_rate` approach)
         .agg([col(column).mean().alias("mean")])
         .collect()
         .context("Windowed mean")?
-        .column("mean")?
+        ;
+
+    let trend_frame = trend_frame.column("mean")?;
+
+    // Slice to drop the first and last because they're likely to be partially filled windows.
+    let trend_col = if trend_frame.len() <= 2 {
+        // Not enough windows to drop first and last
+        &trend_frame.slice(0, 0)
+    } else {
+        &trend_frame.slice(1, trend_frame.len() - 2)
+    };
+
+    let trend = trend_col
         .f64()?
         .iter()
         .filter_map(|v| v.map(|v| round_to_n_dp(v, 6)))
         .collect_vec();
🧹 Nitpick comments (3)
summariser/test_data/3_summary_outputs/zero_arc_create_data_validated-52a0ce7a16521345d301ed0750d079b8513ad48d3f49524ceb725ae0770cd9bc.json (1)

319-343: Unrelated holochain_p2p_metrics restructuring bundled with the timing-window fix.

The structural changes here (adding empty request_roundtrip_duration/handle_incoming_request_duration objects, new fields like remote_signal, publish_counter_sign, countersigning_session_negotiation) appear unrelated to the PR's stated goal of excluding first/last windows from timing stats. Bundling unrelated changes in the same PR makes it harder to bisect regressions. Consider whether these belong in a separate commit or PR, or at minimum document them in the PR description.

summariser/test_data/3_summary_outputs/mixed_arc_must_get_agent_activity-c965934f5262aa6a6442a7d3a23306c8497ee785d416d8269650be33db67911c.json (1)

378-401: holochain_p2p_metrics schema restructuring appears unrelated to the windowing fix.

The addition of request_roundtrip_duration, handle_incoming_request_duration, new fields in request_count (e.g., count_links, send_validation_receipts), and the new handle_incoming_request_count block look like a separate schema change. If this came from regenerating test fixtures against a newer version of the input data or dependencies, it's fine — but it might be worth noting in the PR description to avoid confusion for reviewers.

summariser/test_data/3_summary_outputs/mixed_arc_get_agent_activity-d874c654d75fdc83f5f7c0d7eb2d6be8cfcaf2cc0f5abbc281ad939397351727.json (1)

592-617: holochain_p2p_metrics schema restructuring appears unrelated to the PR's stated objective.

The PR title and description focus on excluding first/last windows from timing stats, but this section introduces structural changes to the holochain_p2p_metrics shape: new top-level keys (request_roundtrip_duration, handle_incoming_request_duration), new fields in handle_incoming_request_count (response, remote_signal, publish_counter_sign, countersigning_session_negotiation), and field reordering. If this is an intentional bundled change (e.g., from an upstream Holochain update), it may be worth noting in the PR description for reviewer clarity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants