Skip to content

chore: Debug exit code 2#429

Open
ThetaSinner wants to merge 1 commit intomainfrom
chore/debug-exit-code-2
Open

chore: Debug exit code 2#429
ThetaSinner wants to merge 1 commit intomainfrom
chore/debug-exit-code-2

Conversation

@ThetaSinner
Copy link
Member

@ThetaSinner ThetaSinner commented Dec 19, 2025

Summary

In aid of debugging #417

TODO:

  • All code changes are reflected in docs, including module-level docs
  • I ran the Nomad CI workflow successfully on my branch

Note that all commits in a PR must follow Conventional Commits before it can be merged, as these are used to generate the changelog

Summary by CodeRabbit

  • Chores
    • Enhanced error handling and logging in job execution with improved output capture
    • Improved allocation ID extraction robustness with detailed error messaging
    • Added validation for extracted allocation IDs with debug output
    • Introduced error handling for duration retrieval from configuration
    • Expanded workflow output logging with job metadata

✏️ Tip: You can customize this high-level summary in your review settings.

@cocogitto-bot
Copy link

cocogitto-bot bot commented Dec 19, 2025

✔️ e53fc52 - Conventional commits check succeeded.

@coderabbitai
Copy link

coderabbitai bot commented Dec 19, 2025

Walkthrough

The .github/workflows/nomad.yaml workflow file is enhanced with comprehensive error handling and validation. Changes include explicit start messaging, stdout/stderr capture on Nomad invocation failures, robust allocation ID extraction with granular error handling, allocation ID validation, duration retrieval from JSON with error handling, and expanded GITHUB_OUTPUT configuration with logging.

Changes

Cohort / File(s) Summary
Nomad workflow error handling and validation
.github/workflows/nomad.yaml
Enhanced error handling around Nomad job invocation with start messaging, stdout/stderr capture, and exit codes. Added separate error handling for allocation ID extraction via grep (no matches vs. regex/grep errors). Introduced validation for extracted allocation IDs with debug output. Added duration retrieval from nomad/vars/${JOB_NAME}.json with error handling. Expanded GITHUB_OUTPUT writes to include allocated IDs, run started timestamp, job name, and duration. Added final success acknowledgment logging.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus areas:
    • Error handling branching logic for allocation ID extraction (grep no matches vs. regex errors)
    • Allocation ID validation and paste command error handling
    • JSON file read logic and duration handling
    • GITHUB_OUTPUT variable formatting and configuration

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive Title is partially related to the changeset—it mentions 'Debug exit code 2' but the actual changes focus on enhancing error handling, logging, and validation across multiple aspects of the Nomad workflow, not just debugging a specific exit code. Consider revising the title to better reflect the primary changes, such as 'Enhanced error handling and logging in Nomad workflow' or 'Improve robustness of Nomad job invocation and allocation ID extraction'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed Description includes a summary and the required TODO checklist matching the template, but the summary is minimal and lacks details about the actual changes made to the Nomad workflow.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/debug-exit-code-2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

The following will be added to the changelog


[0.5.0-dev.1] - 2025-12-19

Miscellaneous Tasks

  • Debug exit code 2

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6faf0f6 and e53fc52.

📒 Files selected for processing (1)
  • .github/workflows/nomad.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: veeso
Repo: holochain/wind-tunnel PR: 245
File: .github/workflows/nomad.yaml:101-110
Timestamp: 2025-08-12T12:34:18.851Z
Learning: When extracting multiple allocation IDs from Nomad job output in wind-tunnel, use `grep -oE` with extended regex instead of `grep -oP` with PCRE for better portability. The pattern should capture full hex IDs including hyphens: `[0-9a-f-]+`. AWK-based solutions may not properly handle multiple allocation IDs across multiple lines.
📚 Learning: 2025-08-12T12:34:18.851Z
Learnt from: veeso
Repo: holochain/wind-tunnel PR: 245
File: .github/workflows/nomad.yaml:101-110
Timestamp: 2025-08-12T12:34:18.851Z
Learning: When extracting multiple allocation IDs from Nomad job output in wind-tunnel, use `grep -oE` with extended regex instead of `grep -oP` with PCRE for better portability. The pattern should capture full hex IDs including hyphens: `[0-9a-f-]+`. AWK-based solutions may not properly handle multiple allocation IDs across multiple lines.

Applied to files:

  • .github/workflows/nomad.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: Run local_signals
🔇 Additional comments (4)
.github/workflows/nomad.yaml (4)

171-177: LGTM! Good error handling.

The error handling for the paste command is well-structured, with clear error messages and diagnostic output.


179-187: LGTM! Comprehensive validation.

The validation check for empty allocation IDs is appropriate and includes helpful debug output for troubleshooting.


190-195: LGTM! Proper JSON parsing with error handling.

The duration retrieval using jq -e is correct, and the error handling appropriately captures and reports any issues.


143-143: LGTM! Helpful logging additions.

The additional logging messages throughout the script improve observability and help with debugging, which aligns well with the PR's debugging objective.

Also applies to: 155-155, 188-188, 197-197, 204-205

Comment on lines +144 to +149
if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then
echo "ERROR: Failed to run Nomad job"
echo "Nomad command exit code: $?"
echo "Output:"
echo "$nomad_output"
exit 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Exit code capture is incorrect.

Line 146 attempts to capture the Nomad command's exit code with $?, but this occurs after running echo "ERROR: Failed to run Nomad job" on line 145. The $? variable always holds the exit status of the last executed command, so it will be 0 (from the successful echo), not the actual Nomad failure code. This defeats the debugging purpose stated in the PR title "Debug exit code 2".

🔎 Proposed fix
           echo "Running Nomad job: ${JOB_NAME}"
           if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then
+            nomad_exit_code=$?
             echo "ERROR: Failed to run Nomad job"
-            echo "Nomad command exit code: $?"
+            echo "Nomad command exit code: $nomad_exit_code"
             echo "Output:"
             echo "$nomad_output"
             exit 1
           fi
🤖 Prompt for AI Agents
In .github/workflows/nomad.yaml around lines 144 to 149 the script logs the
Nomad command exit code using `$?` after running echo, so it always reports 0;
capture the exit status immediately after the failing command and before any
other command runs (e.g., assign rc=$? right after the `nix run ...` failure
branch is entered), then use that saved variable when printing "Nomad command
exit code" and exit with that code (or a clear non-zero). Ensure no commands
between the failing command and the rc capture.

alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' | paste -sd ' ' -)

echo "Extracting allocation IDs from Nomad output..."
if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use grep -oE with extended regex instead of PCRE for portability.

The current implementation uses grep -oP with PCRE-specific features (\K and (?=...)). According to the project learning, extended regex with grep -oE should be used for better portability, and the pattern should capture allocation IDs including hyphens: [0-9a-f-]+.

Based on learnings, grep -oP should be replaced with grep -oE for better portability when extracting allocation IDs in wind-tunnel workflows.

🔎 Proposed fix for portability
           echo "Extracting allocation IDs from Nomad output..."
-          if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
+          if ! alloc_ids=$(echo "$nomad_output" | grep -oE --color=never 'Allocation "[0-9a-f-]+" created' | grep -oE '[0-9a-f-]+' | grep -v 'Allocation\|created' 2>&1); then
             grep_exit_code=$?

Alternatively, if the Nomad output format is predictable, you could use a simpler pattern:

           echo "Extracting allocation IDs from Nomad output..."
-          if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
+          if ! alloc_ids=$(echo "$nomad_output" | grep -oE 'Allocation "[0-9a-f-]+"' | cut -d'"' -f2 2>&1); then
             grep_exit_code=$?
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if ! alloc_ids=$(echo "$nomad_output" | grep -oP --color=never 'Allocation "\K[0-9a-f]+(?=" created)' 2>&1); then
if ! alloc_ids=$(echo "$nomad_output" | grep -oE 'Allocation "[0-9a-f-]+"' | cut -d'"' -f2 2>&1); then
grep_exit_code=$?
🤖 Prompt for AI Agents
In .github/workflows/nomad.yaml around line 156, replace the PCRE grep -oP usage
with a portable grep -oE solution: use an extended regex that allows hyphens in
IDs (e.g. match "Allocation \"[0-9a-f-]+\"" or just the quoted ID with
'"[0-9a-f-]+"'), then strip the surrounding quotes (via sed or tr) to produce
the raw allocation IDs; this removes PCRE-only constructs (\K and lookahead) and
ensures portability across grep implementations.

nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl)

echo "Running Nomad job: ${JOB_NAME}"
if ! nomad_output=$(nix run --impure --inputs-from . nixpkgs#nomad -- job run nomad/jobs/${JOB_NAME}.nomad.hcl 2>&1); then
Copy link
Member

@mattyg mattyg Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought set -euo pipefail should already cause the command within $() to send its stderr to the shell's stderr which I thought github actions would print?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants