Skip to content

Feat/dht sync with node volitility#535

Open
mattyg wants to merge 5 commits intomainfrom
feat/dht-sync-with-node-volitility
Open

Feat/dht sync with node volitility#535
mattyg wants to merge 5 commits intomainfrom
feat/dht-sync-with-node-volitility

Conversation

@mattyg
Copy link
Member

@mattyg mattyg commented Feb 14, 2026

Summary

Resolves #443

This copies the write_get_agent_activity scenario, but adds conductor volatility to the get_agent_activity behavior.

TODO:

  • All code changes are reflected in docs, including module-level docs
  • All new/edited/removed scenarios are reflected in summary visualiser tool (see checklist)
  • I ran the Nomad CI workflow successfully on my branch

Note that all commits in a PR must follow Conventional Commits before it can be merged, as these are used to generate the changelog

Summary by CodeRabbit

  • New Features

    • Added new write_get_agent_activity_volatile scenario for load testing with randomized conductor restart cycles
    • Enhanced conductor lifecycle management with start, stop, and restart capabilities
    • Added comprehensive metrics tracking including zome call timings, conductor restart counts, and activity sequence monitoring
  • Tests

    • Added test case for new scenario with test data fixtures

@coderabbitai
Copy link

coderabbitai bot commented Feb 14, 2026

Walkthrough

This PR adds a new write_get_agent_activity_volatile scenario that tests agent activity behavior with conductor volatility. It includes refactored conductor lifecycle APIs (start/stop/restart) in the Holochain bindings, a new scenario crate with volatile behavior, metrics summarisation, and supporting configuration across Nomad workflows and test infrastructure.

Changes

Cohort / File(s) Summary
Workflow & Configuration
.github/workflows/nomad-common.yaml, .github/workflows/test.yaml, Cargo.toml
Added new Nomad job variant write_get_agent_activity_volatile to matrix, new smoke test step with 2 agents and volatile behavior, and registered scenario as workspace member.
Holochain Runner Lifecycle Refactoring
bindings/runner/src/common.rs, bindings/runner/src/context.rs, bindings/runner/src/holochain_runner.rs
Major refactoring: introduced create_and_start_holochain_conductor(), stop_holochain_conductor(), and start_holochain_conductor() helpers; added cell_role_name field to HolochainAgentContext; restructured HolochainRunner to use explicit config object with metrics_path field and separated creation from execution via create(config) and run(&mut self) methods.
Nomad Scenario Configuration
nomad/job-variants/canonical/vars/write_get_agent_activity_volatile.json, nomad/job-variants/demo/vars/write_get_agent_activity_volatile.json
Added scenario configuration files defining 2 write agents and 8 volatile get_agent_activity agents with 900-second duration for both canonical and demo variants.
New Scenario Implementation
scenarios/write_get_agent_activity_volatile/Cargo.toml, scenarios/write_get_agent_activity_volatile/README.md, scenarios/write_get_agent_activity_volatile/src/main.rs
New scenario crate with ScenarioValues tracking state, agent_behaviour_write for entry creation, and agent_behaviour_get_agent_activity_volatile for volatile conductor on/off cycling with random timings (10–30s online, 2–10s offline).
Summariser Module & Metrics
summariser/src/lib.rs, summariser/src/scenario.rs, summariser/src/scenario/write_get_agent_activity_volatile.rs
Added new scenario summariser pipeline collecting highest_observed_action_seq, zome call timings, startup/shutdown counts, error counts, and Holochain P2P metrics; integrated into scenario reporter dispatcher.
Test Data & Fixtures
summariser/test_data/1_run_summaries/..., summariser/test_data/2_query_results/..., summariser/test_data/3_summary_outputs/..., summariser/tests/snapshot.rs
Added comprehensive test fixtures: run summary JSON, 30+ query result JSON files with network/CPU/memory/metrics data, summary output snapshot, and new snapshot test function.
Summary Visualiser Template
summary-visualiser/templates/scenarios/write_get_agent_activity_volatile.html.tmpl
New HTML template for scenario visualization displaying metrics (highest action_seq, zome call timings, startup/shutdown counts, errors, P2P metrics) with conditional rendering blocks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #333: Modifies bindings/runner startup and Holochain config plumbing for metrics path and conductor lifecycle management, sharing code-level changes with conductor/runner refactoring.
  • PR #499: Adjusts Holochain runner metrics path propagation and related conductor configuration APIs, directly overlapping with metrics plumbing changes.
  • PR #395: Adds scenario-specific summariser and template infrastructure for write/get-agent-activity variants, extending the same patterns applied here for the volatile variant.

Suggested reviewers

  • ThetaSinner
  • jost-s
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Feat/dht sync with node volitility' describes the main feature but contains a misspelling ('volitility' should be 'volatility') and uses a generic format that could be more descriptive of the actual change.
Description check ✅ Passed The PR description includes a summary, references issue #443, explains the changes (copying write_get_agent_activity with conductor volatility), and includes the required TODO checklist with all expected sections.
Linked Issues check ✅ Passed The PR substantially addresses issue #443 by adding the write_get_agent_activity_volatile scenario with conductor restart/stop functionality and summariser support, though the volatile behavior implementation appears simplified compared to the detailed requirements in the issue.
Out of Scope Changes check ✅ Passed All changes align with the scope of adding conductor volatility support: new scenario crate, holochain binding updates, workflow configuration, summariser module, and visualiser template. No extraneous changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/dht-sync-with-node-volitility

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

The following will be added to the changelog


[0.6.1] - 2026-02-14

Features

  • Basic summary visualiser for write_get_agent_activity_volatile scenario
  • Summariser for write_get_agent_activity_volatile scenario
  • Write_get_agent_activity_volatile scenario
  • Shutdown and restart conductor functions for holochain binding

CI

  • Run write_get_agent_activity_volatile scenario in CI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split up creating and starting the HolochainRunner into separate functions. Add a function to shutdown the conductor without deleting its root directory.

@mattyg mattyg force-pushed the feat/dht-sync-with-node-volitility branch from 0a0d767 to 75fed31 Compare February 14, 2026 00:57
}

#[cfg(test)]
mod tests {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test that conductor data is deleted on drop, not shutdown.

/// Ok(())
/// }
/// ```
pub fn stop_holochain_conductor<SV: UserValuesConstraint>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function to stop holochain conductor, to use from scenarios.

@mattyg mattyg force-pushed the feat/dht-sync-with-node-volitility branch from 75fed31 to f3ad2a5 Compare February 14, 2026 01:03
@cocogitto-bot
Copy link

cocogitto-bot bot commented Feb 14, 2026

✔️ 77fda38...f3ad2a5 - Conventional commits check succeeded.

/// If starting the conductor fails, attempts to clean up the conductor root directory.
/// Returns [`HolochainRunner`] on success.
async fn start_holochain_conductor(
async fn create_and_start_holochain_conductor(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed for clarity.

/// If using an external conductor via connection-string, this function does nothing.
pub fn start_holochain_conductor<SV: UserValuesConstraint>(
ctx: &mut AgentContext<HolochainRunnerContext, HolochainAgentContext<SV>>,
) -> WindTunnelResult<()> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function to start a previously created conductor, to use from scenarios.

pub struct HolochainAgentContext<T: UserValuesConstraint = DefaultScenarioValues> {
pub(crate) installed_app_id: Option<String>,
/// The RoleName for the cell in the `cell_id` field
pub(crate) cell_role_name: Option<RoleName>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the "primary" cell's role name to context, so that we can access to call use_installed_app.

This will need to be if we want to support multiple cells, but I think its fine for now.


/// The path where influxive metrics will be written, by setting the
/// env variable HOLOCHAIN_INFLUXIVE_FILE for the holochain process.
metrics_path: Option<PathBuf>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move HOLOCHAIN_INFLUXIVE_FILE config into HolochainRunner config, so it can be applied when the conductor is started.


ctx.get_mut().scenario_values.shutdown_conductor_at = Some(Timestamp::from_micros(
Timestamp::now().as_micros() + conductor_on_duration().as_micros() as i64,
));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the conductor is started by a get_agent_activity_volatile agent, schedule its next shutdown time.

let reporter = ctx.runner_context().reporter();

if let Some(shutdown_at) = ctx.get().scenario_values.shutdown_conductor_at
&& Timestamp::now() >= shutdown_at
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the scheduled shutdown time has passed, shutdown the conductor and sleep for a randomly selected time. Then start the conductor and schedule its next shutdown time.

{{ template "scenario_metric" (dict
"name" "Startups"
"description" "The number of conductor startups by <code>get_agent_activity_volatile</code> peers."
"content" (tmpl.Exec "scalar" (dict "value" .startups.end "label" "Total" "unit" "startups"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not very meaningful information. It just shows the total number of startups (and shutdown below). Is there some preferences on how this should be summarised and displayed?

Do we want to graph when individual agents are running? Or the total number of running agents in a single graph?

@mattyg mattyg marked this pull request as ready for review February 14, 2026 01:11
@mattyg mattyg requested a review from a team February 14, 2026 01:11
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bindings/runner/src/common.rs (1)

327-329: ⚠️ Potential issue | 🟠 Major

use_installed_app doesn't set cell_role_name, unlike install_app.

install_app (line 239) sets ctx.get_mut().cell_role_name = Some(role_name.clone()), but use_installed_app does not, despite receiving role_name as a parameter. This means calling use_installed_app standalone (outside of start_holochain_conductor) leaves cell_role_name as None if install_app was never called, causing a panic on cell_role_name().

Proposed fix
     ctx.get_mut().installed_app_id = Some(installed_app_id);
+    ctx.get_mut().cell_role_name = Some(role_name.clone());
     ctx.get_mut().cell_id = Some(cell_id);
     ctx.get_mut().app_client = Some(app_client);
🤖 Fix all issues with AI agents
In @.github/workflows/test.yaml:
- Around line 238-244: The workflow step named "Smoke test -
write_get_agent_activity_volatile" uses the wrong behaviour flag; replace the
argument "--behaviour get_agent_activity:1" with "--behaviour
get_agent_activity_volatile:1" so the smoke test runs the volatile behaviour
registered by the write_get_agent_activity_volatile scenario (refer to the
scenario name write_get_agent_activity_volatile and the behaviour identifier
get_agent_activity_volatile in the scenario code).

In `@bindings/runner/src/common.rs`:
- Around line 889-920: The doc comment for stop_holochain_conductor is
inaccurate: it does not drop the HolochainRunner or clean up directories but
instead takes the runner, calls shutdown() (which only drops the Child handle)
and then reassigns the runner back into ctx.get_mut().holochain_runner; update
the documentation on stop_holochain_conductor to state that it invokes
HolochainRunner::shutdown(), leaves the HolochainRunner instance in the context,
and does not remove on-disk directories (only Drop for HolochainRunner would do
that), or alternatively, if the intended behavior is to fully stop and remove
the runner, change the implementation to not reinsert the runner into
ctx.get_mut().holochain_runner so the HolochainRunner is dropped and its Drop
runs. Ensure references to HolochainRunner, shutdown(), and
ctx.get_mut().holochain_runner are clear in the updated comment or code change.

In `@bindings/runner/src/holochain_runner.rs`:
- Around line 174-198: In the create function, fix the typo in the error message
used in with_context for fs::create_dir_all: replace the string "Failed to
create cChonductor root directory '{}'" with the correct "Failed to create
conductor root directory '{}'" (reference symbols: create,
config.conductor_root_path).

In `@nomad/job-variants/canonical/vars/write_get_agent_activity_volatile.json`:
- Around line 1-14: The workflow currently reads .behaviours which doesn't exist
in these vars files, causing REQUIRED_NODES to default to 1; change the jq
expression in nomad-common.yaml (where it uses ".behaviours // 1 | length") to
instead read .assignments and sum each assignment's nodes (e.g., map(.nodes) |
add or equivalent) so REQUIRED_NODES reflects the total nodes (2 + 8 = 10 for
this file); update the expression that computes REQUIRED_NODES to reference
.assignments and aggregate the nodes field rather than using length on
behaviours.

In `@scenarios/write_get_agent_activity_volatile/README.md`:
- Line 5: Update the README sentence in the write_get_agent_activity_volatile
scenario to correct the typos: change "restsart" to "restart" and "semi random"
to "semi-random" so the line reads something like "A scenario where 'write'
peers create entries, while 'get_agent_activity_volatile' peers each query a
single 'write' agent's activity with `get_agent_activity` but shutdown and
restart their conductors at semi-random intervals." Reference the scenario
identifier write_get_agent_activity_volatile and the sentence containing
"get_agent_activity" to locate the text to edit.

In `@summariser/src/scenario/write_get_agent_activity_volatile.rs`:
- Line 81: The context string for the shutdowns metric is incorrect: replace the
misleading "Startup stats" message used on the shutdowns field so the error
context matches the metric; locate the shutdowns entry where
counter_stats(shutdowns, "value").context("Startup stats")? is called and change
the context to a correct message like "Shutdown stats" (or similarly
descriptive) so counter_stats and the shutdowns field report the proper context
on error.
🧹 Nitpick comments (9)
summariser/test_data/2_query_results/1ef6939655a0e1f28aaf75210edc117329ea642d5a3c63d01cd920209e15bbf8.json (1)

1-1: LGTM! Test data looks valid.

The JSON fixture correctly represents time-series network interface statistics with consistent timestamps and monotonically increasing packet counts. The single-line format likely matches actual query results, which is appropriate for test data.

Minor note: Consider adding a trailing newline at the end of the file per text file conventions, though this doesn't affect functionality.

summariser/test_data/2_query_results/b69f220c005ebe7196845718f6f8ad954d1c42d9c8202fc831af4e680e153c5d.json (1)

1-1: Consider adding a trailing newline and pretty-printing the JSON.

The JSON is valid and the data looks reasonable for testing the volatile scenario. Two optional improvements for maintainability:

  1. Missing trailing newline: Adding a newline at the end of the file follows Unix conventions and prevents git warnings.
  2. Single-line formatting: Pretty-printing the JSON would make diffs more readable when this test data changes in the future.

Since this is test data, these are minor quality-of-life improvements rather than functional issues.

📝 Example with formatting improvements
-[{"time":"2026-02-13 22:40:30","available_percent":24.682858245561675},{"time":"2026-02-13 22:40:40","available_percent":24.18100374370668},{"time":"2026-02-13 22:40:50","available_percent":25.30368196018231},{"time":"2026-02-13 22:41:00","available_percent":25.372369050207016},{"time":"2026-02-13 22:41:10","available_percent":25.222390256199624},{"time":"2026-02-13 22:41:20","available_percent":24.729261072168047}]
+[
+  {"time":"2026-02-13 22:40:30","available_percent":24.682858245561675},
+  {"time":"2026-02-13 22:40:40","available_percent":24.18100374370668},
+  {"time":"2026-02-13 22:40:50","available_percent":25.30368196018231},
+  {"time":"2026-02-13 22:41:00","available_percent":25.372369050207016},
+  {"time":"2026-02-13 22:41:10","available_percent":25.222390256199624},
+  {"time":"2026-02-13 22:41:20","available_percent":24.729261072168047}
+]
scenarios/write_get_agent_activity_volatile/src/main.rs (3)

106-114: Silently discarding the result of execute_in_place.

If the executor fails, the error is swallowed by let _ = .... Consider at least logging the error so conductor lifecycle issues during the off-period are diagnosable.

♻️ Proposed fix
-        let _ = ctx
+        if let Err(e) = ctx
             .runner_context()
             .executor()
             .execute_in_place(async move {
                 tokio::time::sleep(off_duration).await;
                 Ok(())
-            });
+            }) {
+            log::warn!("Error during conductor off-period sleep: {e:?}");
+        }

159-161: Unnecessary .clone() on an already owned value.

write_peer is owned via the if let Some(write_peer) binding; the clone is redundant.

♻️ Proposed fix
-            if let Some(write_peer) = maybe_write_peer {
-                ctx.get_mut().scenario_values.write_peer = Some(write_peer.clone());
+            if let Some(write_peer) = maybe_write_peer {
+                ctx.get_mut().scenario_values.write_peer = Some(write_peer);
             }

14-19: Simplify random selection: use random_range() instead of collecting into a Vec.

In rand 0.9+, rng.random_range(range) samples directly from a range without materializing it. The current implementation unnecessarily collects the entire range into a vector.

♻️ Suggested change
fn choose_random_duration_in_range(range: Range<u64>) -> Duration {
    let mut rng = rand::rng();
-   let options: Vec<u64> = range.collect();
-   let choice = options.choose(&mut rng).unwrap();
-   Duration::from_secs(*choice)
+   Duration::from_secs(rng.random_range(range))
}
summary-visualiser/templates/scenarios/write_get_agent_activity_volatile.html.tmpl (1)

13-46: Minor: the <div class="scenario-metrics"> wrapper renders even when .scenario_metrics is absent.

The {{ with }} guard is inside the div, so an empty <div class="scenario-metrics"></div> is emitted when there are no metrics. Consider moving the div inside the {{ with }} block for cleanliness, or leave as-is if this is the established pattern.

summariser/src/scenario/write_get_agent_activity_volatile.rs (1)

55-62: Unnecessary .clone() before .lazy().

The DataFrame returned from the await? is already owned. The .clone() on Line 59 is redundant.

♻️ Proposed fix
     let get_agent_activity_full_zome_calls =
         query::query_zome_call_instrument_data(client.clone(), &summary)
             .await
             .context("Load get_agent_activity_full zome call data")?
-            .clone()
             .lazy()
             .filter(col("fn_name").eq(lit("get_agent_activity_full")))
             .collect()?;
bindings/runner/src/common.rs (1)

926-961: Log messages say "Restarting" but the function is start_holochain_conductor.

Lines 930 and 934 log "Restarting Holochain conductor" but the public function is named start_holochain_conductor. Consider aligning the log messages with the function name for clarity when debugging, or renaming the function to restart_holochain_conductor since it also reconfigures URLs and reconnects the app (lines 951–953).

bindings/runner/src/holochain_runner.rs (1)

270-275: shutdown() relies on kill_on_drop(true) — the conductor is killed, not gracefully shut down.

shutdown() sets holochain_handle = None, dropping the Child process. Since the process is spawned with .kill_on_drop(true) (line 215), this sends a SIGKILL (on Unix). The doc comment says "gracefully shut down" in stop_holochain_conductor, but SIGKILL is not graceful — the conductor gets no chance to flush state.

For a volatility scenario simulating node crashes, this may be intentional. But if you intend a graceful shutdown in normal teardown, consider sending SIGTERM or writing a shutdown command to stdin before dropping.

Comment on lines +238 to +244
- name: Smoke test - write_get_agent_activity_volatile
run: |
MIN_AGENTS=2 nix run .#rust-smoke-test -- --package write_get_agent_activity_volatile -- --agents 2 --behaviour write:1 --behaviour get_agent_activity:1 --duration 5 --no-progress

echo "==> Available space after step"
echo
df -h
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the scenario directories
echo "=== Looking for write_get_agent_activity_volatile scenario ==="
fd --type d "write_get_agent_activity_volatile" --max-depth 3

echo ""
echo "=== Looking for write_get_agent_activity scenario (non-volatile) ==="
fd --type d "write_get_agent_activity" --max-depth 3 | grep -v volatile

Repository: holochain/wind-tunnel

Length of output: 278


🏁 Script executed:

#!/bin/bash
# Check the main.rs files in both scenarios
echo "=== Behaviours in write_get_agent_activity_volatile/main.rs ==="
if [ -f "scenarios/write_get_agent_activity_volatile/main.rs" ]; then
  cat scenarios/write_get_agent_activity_volatile/main.rs
else
  echo "File not found, searching..."
  find . -path "*/write_get_agent_activity_volatile/main.rs" -exec cat {} \;
fi

Repository: holochain/wind-tunnel

Length of output: 157


🏁 Script executed:

#!/bin/bash
# Check the non-volatile version for comparison
echo "=== Behaviours in write_get_agent_activity/main.rs (non-volatile) ==="
if [ -f "scenarios/write_get_agent_activity/main.rs" ]; then
  cat scenarios/write_get_agent_activity/main.rs
else
  echo "File not found, searching..."
  find . -path "*/write_get_agent_activity/main.rs" -exec cat {} \;
fi

Repository: holochain/wind-tunnel

Length of output: 163


🏁 Script executed:

#!/bin/bash
# Also check the workflow file context around those lines
echo "=== Workflow file lines 230-250 ==="
sed -n '230,250p' .github/workflows/test.yaml

Repository: holochain/wind-tunnel

Length of output: 1090


🏁 Script executed:

#!/bin/bash
# List the structure of both scenarios
echo "=== write_get_agent_activity_volatile directory structure ==="
find scenarios/write_get_agent_activity_volatile -type f | head -20

echo ""
echo "=== write_get_agent_activity directory structure ==="
find scenarios/write_get_agent_activity -type f | head -20

Repository: holochain/wind-tunnel

Length of output: 483


🏁 Script executed:

#!/bin/bash
# Search for where behaviour names are defined/registered
echo "=== Search for 'get_agent_activity_volatile' string in codebase ==="
rg "get_agent_activity_volatile" --type rust --type yaml

echo ""
echo "=== Search for behaviour registration patterns in scenarios ==="
rg "behaviour.*write" scenarios/ --type rust -A 2 -B 2 | head -50

Repository: holochain/wind-tunnel

Length of output: 7724


🏁 Script executed:

#!/bin/bash
# Look for the actual behaviour implementation names in rust files
echo "=== Looking for behaviour struct/impl definitions ==="
fd --type f "\.(rs|yaml)$" scenarios/ --exec grep -l "get_agent_activity" {} \;

Repository: holochain/wind-tunnel

Length of output: 390


🏁 Script executed:

#!/bin/bash
# View the volatile scenario main.rs to see behaviour registration
echo "=== write_get_agent_activity_volatile/src/main.rs (behaviour registrations) ==="
grep -n "use_named_agent_behaviour\|assigned_behaviour" scenarios/write_get_agent_activity_volatile/src/main.rs

echo ""
echo "=== write_get_agent_activity/src/main.rs (behaviour registrations) ==="
grep -n "use_named_agent_behaviour\|assigned_behaviour" scenarios/write_get_agent_activity/src/main.rs

Repository: holochain/wind-tunnel

Length of output: 643


Behaviour name incorrect — should be get_agent_activity_volatile:1.

The smoke test uses --behaviour get_agent_activity:1, but the write_get_agent_activity_volatile scenario registers the behaviour as get_agent_activity_volatile (see scenarios/write_get_agent_activity_volatile/src/main.rs line 177). This is a copy-paste from the write_get_agent_activity smoke test (line 232) without updating the behaviour name. The test will fail to run the intended volatile behaviour.

Proposed fix
      - name: Smoke test - write_get_agent_activity_volatile
        run: |
-          MIN_AGENTS=2 nix run .#rust-smoke-test -- --package write_get_agent_activity_volatile -- --agents 2 --behaviour write:1 --behaviour get_agent_activity:1 --duration 5 --no-progress
+          MIN_AGENTS=2 nix run .#rust-smoke-test -- --package write_get_agent_activity_volatile -- --agents 2 --behaviour write:1 --behaviour get_agent_activity_volatile:1 --duration 5 --no-progress
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Smoke test - write_get_agent_activity_volatile
run: |
MIN_AGENTS=2 nix run .#rust-smoke-test -- --package write_get_agent_activity_volatile -- --agents 2 --behaviour write:1 --behaviour get_agent_activity:1 --duration 5 --no-progress
echo "==> Available space after step"
echo
df -h
- name: Smoke test - write_get_agent_activity_volatile
run: |
MIN_AGENTS=2 nix run .#rust-smoke-test -- --package write_get_agent_activity_volatile -- --agents 2 --behaviour write:1 --behaviour get_agent_activity_volatile:1 --duration 5 --no-progress
echo "==> Available space after step"
echo
df -h
🤖 Prompt for AI Agents
In @.github/workflows/test.yaml around lines 238 - 244, The workflow step named
"Smoke test - write_get_agent_activity_volatile" uses the wrong behaviour flag;
replace the argument "--behaviour get_agent_activity:1" with "--behaviour
get_agent_activity_volatile:1" so the smoke test runs the volatile behaviour
registered by the write_get_agent_activity_volatile scenario (refer to the
scenario name write_get_agent_activity_volatile and the behaviour identifier
get_agent_activity_volatile in the scenario code).

Comment on lines +889 to +920
/// Stops the Holochain conductor if one is running.
///
/// This function will take the `holochain_runner` from the context and drop it, which will
/// gracefully shut down the conductor process and clean up its directories.
///
/// If no conductor is running this function does nothing.
/// If using an external conductor via connection-string, this function does nothing.
///
/// Call this function as follows:
/// ```rust
/// use holochain_wind_tunnel_runner::prelude::{HolochainAgentContext, HolochainRunnerContext, stop_holochain_conductor, AgentContext, HookResult};
///
/// fn agent_behaviour(ctx: &mut AgentContext<HolochainRunnerContext, HolochainAgentContext>) -> HookResult {
/// stop_holochain_conductor(ctx)?;
/// Ok(())
/// }
/// ```
pub fn stop_holochain_conductor<SV: UserValuesConstraint>(
ctx: &mut AgentContext<HolochainRunnerContext, HolochainAgentContext<SV>>,
) -> WindTunnelResult<()> {
if let Some(mut runner) = ctx.get_mut().holochain_runner.take() {
log::info!("Stopping Holochain conductor");
runner.shutdown();
log::info!("Holochain conductor stopped");

ctx.get_mut().holochain_runner = Some(runner);
} else {
log::debug!("No Holochain conductor is running, so nothing to stop");
}

Ok(())
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Documentation is misleading: runner is not dropped, and directories are not cleaned up on shutdown.

The doc comment says the function will "take the holochain_runner from the context and drop it, which will gracefully shut down the conductor process and clean up its directories." In practice:

  1. The runner is taken, shutdown() is called (which only drops the Child process handle), and then the runner is put back into the context (line 914).
  2. shutdown() does not clean up directories — only Drop::drop on HolochainRunner does.

This mismatch could confuse callers about post-call state (is the runner gone? are directories deleted?).

Suggested doc fix
-/// This function will take the `holochain_runner` from the context and drop it, which will
-/// gracefully shut down the conductor process and clean up its directories.
+/// This function will shut down the running conductor process while retaining the
+/// [`HolochainRunner`] in the context so it can be restarted later with
+/// [`start_holochain_conductor`]. Conductor directories are preserved until the runner is dropped.
 ///
-/// If no conductor is running this function does nothing.
-/// If using an external conductor via connection-string, this function does nothing.
+/// If no conductor is running, this function does nothing.
+/// If using an external conductor via connection-string, this function does nothing.
🤖 Prompt for AI Agents
In `@bindings/runner/src/common.rs` around lines 889 - 920, The doc comment for
stop_holochain_conductor is inaccurate: it does not drop the HolochainRunner or
clean up directories but instead takes the runner, calls shutdown() (which only
drops the Child handle) and then reassigns the runner back into
ctx.get_mut().holochain_runner; update the documentation on
stop_holochain_conductor to state that it invokes HolochainRunner::shutdown(),
leaves the HolochainRunner instance in the context, and does not remove on-disk
directories (only Drop for HolochainRunner would do that), or alternatively, if
the intended behavior is to fully stop and remove the runner, change the
implementation to not reinsert the runner into ctx.get_mut().holochain_runner so
the HolochainRunner is dropped and its Drop runs. Ensure references to
HolochainRunner, shutdown(), and ctx.get_mut().holochain_runner are clear in the
updated comment or code change.

Comment on lines +174 to +198
/// Create a holochain runner with the provided config, but do not start it.
pub fn create(config: &HolochainConfig) -> WindTunnelResult<Self> {
fs::create_dir_all(&config.conductor_root_path).with_context(|| {
format!(
"Failed to create conductor root directory '{}'",
conductor_root_path.display()
"Failed to create cChonductor root directory '{}'",
config.conductor_root_path.display()
)
})?;
let conductor_root_path = conductor_root_path.canonicalize()?;
let conductor_root_path = config.conductor_root_path.canonicalize()?;

log::trace!(
"Writing conductor config file to '{}'",
conductor_root_path.display()
);
let conductor_config_path = write_config(
config.conductor_config_root_path.clone(),
write_config(
config.conductor_root_path.clone().into(),
&config.conductor_config,
)
.context("Failed to write conductor config file")?
.to_path_buf();
.context("Failed to write conductor config file")?;

Ok(Self {
config: config.clone(),
holochain_handle: None,
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo in error message: "cChonductor" → "conductor".

Line 178: "Failed to create cChonductor root directory".

Fix
-            "Failed to create cChonductor root directory '{}'",
+            "Failed to create conductor root directory '{}'",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Create a holochain runner with the provided config, but do not start it.
pub fn create(config: &HolochainConfig) -> WindTunnelResult<Self> {
fs::create_dir_all(&config.conductor_root_path).with_context(|| {
format!(
"Failed to create conductor root directory '{}'",
conductor_root_path.display()
"Failed to create cChonductor root directory '{}'",
config.conductor_root_path.display()
)
})?;
let conductor_root_path = conductor_root_path.canonicalize()?;
let conductor_root_path = config.conductor_root_path.canonicalize()?;
log::trace!(
"Writing conductor config file to '{}'",
conductor_root_path.display()
);
let conductor_config_path = write_config(
config.conductor_config_root_path.clone(),
write_config(
config.conductor_root_path.clone().into(),
&config.conductor_config,
)
.context("Failed to write conductor config file")?
.to_path_buf();
.context("Failed to write conductor config file")?;
Ok(Self {
config: config.clone(),
holochain_handle: None,
})
}
/// Create a holochain runner with the provided config, but do not start it.
pub fn create(config: &HolochainConfig) -> WindTunnelResult<Self> {
fs::create_dir_all(&config.conductor_root_path).with_context(|| {
format!(
"Failed to create conductor root directory '{}'",
config.conductor_root_path.display()
)
})?;
let conductor_root_path = config.conductor_root_path.canonicalize()?;
log::trace!(
"Writing conductor config file to '{}'",
conductor_root_path.display()
);
write_config(
config.conductor_root_path.clone().into(),
&config.conductor_config,
)
.context("Failed to write conductor config file")?;
Ok(Self {
config: config.clone(),
holochain_handle: None,
})
}
🤖 Prompt for AI Agents
In `@bindings/runner/src/holochain_runner.rs` around lines 174 - 198, In the
create function, fix the typo in the error message used in with_context for
fs::create_dir_all: replace the string "Failed to create cChonductor root
directory '{}'" with the correct "Failed to create conductor root directory
'{}'" (reference symbols: create, config.conductor_root_path).

Comment on lines +1 to +14
{
"scenario_name": "write_get_agent_activity_volatile",
"duration": 900,
"assignments": [
{
"behaviour": "write",
"nodes": 2
},
{
"behaviour": "get_agent_activity_volatile",
"nodes": 8
}
]
} No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check field usage in vars files and workflow parsing logic

echo "=== Checking JSON structure in canonical vars files ==="
fd -e json . nomad/job-variants/canonical/vars/ 2>/dev/null | head -20 | while read f; do
  echo "File: $f"
  jq 'keys' "$f" 2>/dev/null | head -3
  echo ""
done

echo "=== Searching for 'behaviours' field in all vars files ==="
rg '"behaviours"' --glob 'nomad/job-variants/*/vars/*.json' 2>/dev/null | head -10

echo "=== Searching for 'assignments' field in all vars files ==="
rg '"assignments"' --glob 'nomad/job-variants/*/vars/*.json' 2>/dev/null | head -10

echo "=== Examining nomad-common.yaml around line 86 ==="
if [ -f nomad-common.yaml ]; then
  sed -n '80,95p' nomad-common.yaml
elif [ -f .github/workflows/nomad-common.yaml ]; then
  sed -n '80,95p' .github/workflows/nomad-common.yaml
fi

Repository: holochain/wind-tunnel

Length of output: 3927


Fix the workflow to read assignments instead of behaviours field.

The workflow in nomad-common.yaml (line 86) reads .behaviours // 1 | length, but this file uses assignments (along with all other vars files using node assignments). Since .behaviours doesn't exist in any vars file, it defaults to 1, making 1 | length evaluate to 1. This causes REQUIRED_NODES to be 1 instead of the correct sum (e.g., 10 for 2 + 8 nodes in this file), potentially allowing the job to submit before sufficient nodes are available.

Update the jq expression to read .assignments and sum the nodes values appropriately, or align the field naming across all configs.

🤖 Prompt for AI Agents
In `@nomad/job-variants/canonical/vars/write_get_agent_activity_volatile.json`
around lines 1 - 14, The workflow currently reads .behaviours which doesn't
exist in these vars files, causing REQUIRED_NODES to default to 1; change the jq
expression in nomad-common.yaml (where it uses ".behaviours // 1 | length") to
instead read .assignments and sum each assignment's nodes (e.g., map(.nodes) |
add or equivalent) so REQUIRED_NODES reflects the total nodes (2 + 8 = 10 for
this file); update the expression that computes REQUIRED_NODES to reference
.assignments and aggregate the nodes field rather than using length on
behaviours.


### Description

A scenario where 'write' peers create entries, while 'get_agent_activity_volatile' peers each query a single 'write' agent's activity with `get_agent_activity` but shutdown and restsart their conductors at semi random intervals.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typos: "restsart" → "restart" and "semi random" → "semi-random".

📝 Proposed fix
-A scenario where 'write' peers create entries, while 'get_agent_activity_volatile' peers each query a single 'write' agent's activity with `get_agent_activity` but shutdown and restsart their conductors at semi random intervals.
+A scenario where 'write' peers create entries, while 'get_agent_activity_volatile' peers each query a single 'write' agent's activity with `get_agent_activity` but shutdown and restart their conductors at semi-random intervals.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
A scenario where 'write' peers create entries, while 'get_agent_activity_volatile' peers each query a single 'write' agent's activity with `get_agent_activity` but shutdown and restsart their conductors at semi random intervals.
A scenario where 'write' peers create entries, while 'get_agent_activity_volatile' peers each query a single 'write' agent's activity with `get_agent_activity` but shutdown and restart their conductors at semi-random intervals.
🧰 Tools
🪛 LanguageTool

[grammar] ~5-~5: Ensure spelling is correct
Context: ...h get_agent_activity but shutdown and restsart their conductors at semi random interva...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~5-~5: Use a hyphen to join words.
Context: ...wn and restsart their conductors at semi random intervals. Before a target 'writ...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
In `@scenarios/write_get_agent_activity_volatile/README.md` at line 5, Update the
README sentence in the write_get_agent_activity_volatile scenario to correct the
typos: change "restsart" to "restart" and "semi random" to "semi-random" so the
line reads something like "A scenario where 'write' peers create entries, while
'get_agent_activity_volatile' peers each query a single 'write' agent's activity
with `get_agent_activity` but shutdown and restart their conductors at
semi-random intervals." Reference the scenario identifier
write_get_agent_activity_volatile and the sentence containing
"get_agent_activity" to locate the text to edit.

)
.context("Timing stats for zome call get_agent_activity_full")?,
startups: counter_stats(startups, "value").context("Startup stats")?,
shutdowns: counter_stats(shutdowns, "value").context("Startup stats")?,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Copy-paste bug: error context says "Startup stats" but this is for shutdowns.

🐛 Proposed fix
-            shutdowns: counter_stats(shutdowns, "value").context("Startup stats")?,
+            shutdowns: counter_stats(shutdowns, "value").context("Shutdown stats")?,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
shutdowns: counter_stats(shutdowns, "value").context("Startup stats")?,
shutdowns: counter_stats(shutdowns, "value").context("Shutdown stats")?,
🤖 Prompt for AI Agents
In `@summariser/src/scenario/write_get_agent_activity_volatile.rs` at line 81, The
context string for the shutdowns metric is incorrect: replace the misleading
"Startup stats" message used on the shutdowns field so the error context matches
the metric; locate the shutdowns entry where counter_stats(shutdowns,
"value").context("Startup stats")? is called and change the context to a correct
message like "Shutdown stats" (or similarly descriptive) so counter_stats and
the shutdowns field report the proper context on error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DHT sync performance test with node volatility

1 participant