chore: Debug exit code 2 by ThetaSinner · Pull Request #429 · holochain/wind-tunnel

ThetaSinner · 2025-12-19T18:12:26Z

Summary

In aid of debugging #417

TODO:

All code changes are reflected in docs, including module-level docs
I ran the Nomad CI workflow successfully on my branch

Note that all commits in a PR must follow Conventional Commits before it can be merged, as these are used to generate the changelog

Summary by CodeRabbit

Chores
- Enhanced error handling and logging in job execution with improved output capture
- Improved allocation ID extraction robustness with detailed error messaging
- Added validation for extracted allocation IDs with debug output
- Introduced error handling for duration retrieval from configuration
- Expanded workflow output logging with job metadata

_{✏️ Tip: You can customize this high-level summary in your review settings.}

cocogitto-bot · 2025-12-19T18:12:32Z

✔️ e53fc52 - Conventional commits check succeeded.

coderabbitai · 2025-12-19T18:12:39Z

Walkthrough

The .github/workflows/nomad.yaml workflow file is enhanced with comprehensive error handling and validation. Changes include explicit start messaging, stdout/stderr capture on Nomad invocation failures, robust allocation ID extraction with granular error handling, allocation ID validation, duration retrieval from JSON with error handling, and expanded GITHUB_OUTPUT configuration with logging.

Changes

Cohort / File(s)	Summary
Nomad workflow error handling and validation `.github/workflows/nomad.yaml`	Enhanced error handling around Nomad job invocation with start messaging, stdout/stderr capture, and exit codes. Added separate error handling for allocation ID extraction via grep (no matches vs. regex/grep errors). Introduced validation for extracted allocation IDs with debug output. Added duration retrieval from nomad/vars/${JOB_NAME}.json with error handling. Expanded GITHUB_OUTPUT writes to include allocated IDs, run started timestamp, job name, and duration. Added final success acknowledgment logging.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Focus areas:
- Error handling branching logic for allocation ID extraction (grep no matches vs. regex errors)
- Allocation ID validation and paste command error handling
- JSON file read logic and duration handling
- GITHUB_OUTPUT variable formatting and configuration

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	Title is partially related to the changeset—it mentions 'Debug exit code 2' but the actual changes focus on enhancing error handling, logging, and validation across multiple aspects of the Nomad workflow, not just debugging a specific exit code.	Consider revising the title to better reflect the primary changes, such as 'Enhanced error handling and logging in Nomad workflow' or 'Improve robustness of Nomad job invocation and allocation ID extraction'.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description check	✅ Passed	Description includes a summary and the required TODO checklist matching the template, but the summary is minimal and lacks details about the actual changes made to the Nomad workflow.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch chore/debug-exit-code-2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-12-19T18:12:43Z

The following will be added to the changelog

[0.5.0-dev.1] - 2025-12-19

Miscellaneous Tasks

Debug exit code 2

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6faf0f6 and e53fc52.

📒 Files selected for processing (1)

.github/workflows/nomad.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: veeso
Repo: holochain/wind-tunnel PR: 245
File: .github/workflows/nomad.yaml:101-110
Timestamp: 2025-08-12T12:34:18.851Z
Learning: When extracting multiple allocation IDs from Nomad job output in wind-tunnel, use `grep -oE` with extended regex instead of `grep -oP` with PCRE for better portability. The pattern should capture full hex IDs including hyphens: `[0-9a-f-]+`. AWK-based solutions may not properly handle multiple allocation IDs across multiple lines.

📚 Learning: 2025-08-12T12:34:18.851Z

Learnt from: veeso
Repo: holochain/wind-tunnel PR: 245
File: .github/workflows/nomad.yaml:101-110
Timestamp: 2025-08-12T12:34:18.851Z
Learning: When extracting multiple allocation IDs from Nomad job output in wind-tunnel, use `grep -oE` with extended regex instead of `grep -oP` with PCRE for better portability. The pattern should capture full hex IDs including hyphens: `[0-9a-f-]+`. AWK-based solutions may not properly handle multiple allocation IDs across multiple lines.

Applied to files:

.github/workflows/nomad.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: test (ubuntu-latest)
GitHub Check: Run local_signals

🔇 Additional comments (4)

.github/workflows/nomad.yaml (4)

171-177: LGTM! Good error handling.

The error handling for the paste command is well-structured, with clear error messages and diagnostic output.

179-187: LGTM! Comprehensive validation.

The validation check for empty allocation IDs is appropriate and includes helpful debug output for troubleshooting.

190-195: LGTM! Proper JSON parsing with error handling.

The duration retrieval using jq -e is correct, and the error handling appropriately captures and reports any issues.

143-143: LGTM! Helpful logging additions.

The additional logging messages throughout the script improve observability and help with debugging, which aligns well with the PR's debugging objective.

Also applies to: 155-155, 188-188, 197-197, 204-205

coderabbitai · 2025-12-19T18:17:13Z

.github/workflows/nomad.yaml

+          if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then
+            echo "ERROR: Failed to run Nomad job"
+            echo "Nomad command exit code: $?"
+            echo "Output:"
+            echo "$nomad_output"
+            exit 1


⚠️ Potential issue | 🔴 Critical

Critical: Exit code capture is incorrect.

Line 146 attempts to capture the Nomad command's exit code with $?, but this occurs after running echo "ERROR: Failed to run Nomad job" on line 145. The $? variable always holds the exit status of the last executed command, so it will be 0 (from the successful echo), not the actual Nomad failure code. This defeats the debugging purpose stated in the PR title "Debug exit code 2".

🔎 Proposed fix

echo "Running Nomad job: ${JOB_NAME}" if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then + nomad_exit_code=$? echo "ERROR: Failed to run Nomad job" - echo "Nomad command exit code: $?" + echo "Nomad command exit code: $nomad_exit_code" echo "Output:" echo "$nomad_output" exit 1 fi

🤖 Prompt for AI Agents

In .github/workflows/nomad.yaml around lines 144 to 149 the script logs the Nomad command exit code using `$?` after running echo, so it always reports 0; capture the exit status immediately after the failing command and before any other command runs (e.g., assign rc=$? right after the `nix run ...` failure branch is entered), then use that saved variable when printing "Nomad command exit code" and exit with that code (or a clear non-zero). Ensure no commands between the failing command and the rc capture.

coderabbitai · 2025-12-19T18:17:14Z

.github/workflows/nomad.yaml

-          alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' | paste -sd ' ' -)
+
+          echo "Extracting allocation IDs from Nomad output..."
+          if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then


⚠️ Potential issue | 🟠 Major

Use grep -oE with extended regex instead of PCRE for portability.

The current implementation uses grep -oP with PCRE-specific features (\K and (?=...)). According to the project learning, extended regex with grep -oE should be used for better portability, and the pattern should capture allocation IDs including hyphens: [0-9a-f-]+.

Based on learnings, grep -oP should be replaced with grep -oE for better portability when extracting allocation IDs in wind-tunnel workflows.

🔎 Proposed fix for portability

echo "Extracting allocation IDs from Nomad output..." - if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then + if ! alloc_ids=$(echo "$nomad_output" | grep -oE --color=never 'Allocation "[0-9a-f-]+" created' | grep -oE '[0-9a-f-]+' | grep -v 'Allocation\|created' 2>&1); then grep_exit_code=$?

Alternatively, if the Nomad output format is predictable, you could use a simpler pattern:

echo "Extracting allocation IDs from Nomad output..." - if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then + if ! alloc_ids=$(echo "$nomad_output" | grep -oE 'Allocation "[0-9a-f-]+"' | cut -d'"' -f2 2>&1); then grep_exit_code=$?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then

if ! alloc_ids=$(echo "$nomad_output" | grep -oE 'Allocation "[0-9a-f-]+"' | cut -d'"' -f2 2>&1); then

grep_exit_code=$?

🤖 Prompt for AI Agents

In .github/workflows/nomad.yaml around line 156, replace the PCRE grep -oP usage with a portable grep -oE solution: use an extended regex that allows hyphens in IDs (e.g. match "Allocation \"[0-9a-f-]+\"" or just the quoted ID with '"[0-9a-f-]+"'), then strip the surrounding quotes (via sed or tr) to produce the raw allocation IDs; this removes PCRE-only constructs (\K and lookahead) and ensures portability across grep implementations.

mattyg · 2025-12-20T00:03:48Z

.github/workflows/nomad.yaml

-          nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl)
+
+          echo "Running Nomad job: ${JOB_NAME}"
+          if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then


I thought set -euo pipefail should already cause the command within $() to send its stderr to the shell's stderr which I thought github actions would print?

chore: Debug exit code 2

e53fc52

coderabbitai bot reviewed Dec 19, 2025

View reviewed changes

mattyg reviewed Dec 20, 2025

View reviewed changes

mattyg mentioned this pull request Dec 20, 2025

Nomad workflow step "Run Nomad Job" inconsistently fails with exit code 2 #417

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: Debug exit code 2#429

chore: Debug exit code 2#429
ThetaSinner wants to merge 1 commit intomainfrom
chore/debug-exit-code-2

ThetaSinner commented Dec 19, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

cocogitto-bot bot commented Dec 19, 2025

Uh oh!

coderabbitai bot commented Dec 19, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 19, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 19, 2025

Uh oh!

coderabbitai bot Dec 19, 2025

Uh oh!

mattyg Dec 20, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	if ! alloc_ids=$(echo "$nomad_output" \| grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
	if ! alloc_ids=$(echo "$nomad_output" \| grep -oE 'Allocation "[0-9a-f-]+"' \| cut -d'"' -f2 2>&1); then
	grep_exit_code=$?

Conversation

ThetaSinner commented Dec 19, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

TODO:

Summary by CodeRabbit

Uh oh!

cocogitto-bot bot commented Dec 19, 2025

Uh oh!

coderabbitai bot commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Dec 19, 2025

[0.5.0-dev.1] - 2025-12-19

Miscellaneous Tasks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

mattyg Dec 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ThetaSinner commented Dec 19, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 19, 2025 •

edited

Loading

mattyg Dec 20, 2025 •

edited

Loading