Skip to content

Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting#5059

Open
myurasov-nv wants to merge 68 commits intoisaac-sim:developfrom
myurasov-nv:my-gated-tests-1
Open

Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting#5059
myurasov-nv wants to merge 68 commits intoisaac-sim:developfrom
myurasov-nv:my-gated-tests-1

Conversation

@myurasov-nv
Copy link
Collaborator

@myurasov-nv myurasov-nv commented Mar 18, 2026

Description

Until now every PR ran all GPU test jobs regardless of what changed. That is no-good and needs to change. :)

This PR:

  • Splits tests more evenly and according to functionality
  • Only runs relevant ones; manually all can still be executed
  • Adds reporting of the passed/failed tests on the Workflow page
  • Improves job cancellation so resources are freed faster

Detailed changes

  • Added a detect-changes job that runs git diff --name-only to produce per-package true/false outputs; each test job skips itself when its package is untouched. Core/infra changes still trigger everything. Manual "run all" option available via GH UI.
  • Split test-general into per-package jobs: test-isaaclab (2 shards), test-isaaclab-rl, -mimic, -contrib, -teleop, -visualizers, -assets. Each job is independently gated and sharded.
  • Replaced hardcoded include-files whitelists with deterministic shard-index % shard-count sharding, fixing 7 silently skipped isaaclab_tasks tests.
  • Broke out cuRobo Planner and SkillGen into separate test jobs with their own conditional execution gates.
  • Fixed conftest.py exclude pattern that was matching a comma-separated string as one substring (never matched).
  • Container is now killed on cancellation signals so jobs stop quickly via GH UI.
  • JUnit results are converted to Markdown and displayed on the GH Actions job page, including skip reasons.
  • Faster code checkout and updated actions/checkout@v4, actions/upload-artifact@v4, actions/download-artifact@v4 to latest SHA-pinned versions.
  • Extracted reusable composite action for repeating Docker test-run boilerplate.
  • Added Docker image pull timing for diagnostics.
  • Suppressed Node.js deprecation warnings in CI output.

How test_train_environments.py is split

The test function test_train_environments is parametrized by task_spec (all registered Isaac environments) and workflow (rl_games, rsl_rl, sb3, skrl), so pytest expands it into N_tasks x 4_workflows individual test items. The pytest_collection_modifyitems hook in conftest.py runs after pytest builds this full list and keeps only every Nth item based on TEST_SHARD_INDEX / TEST_SHARD_COUNT env vars (e.g. shard 0/3 gets items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized combinations, cutting wall time from ~1.2h to ~24min per shard.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@myurasov-nv myurasov-nv changed the title Splits Tests in More Even Portions, Only Runs Relevant Ones [POC] Splits Tests in More Even Portions, Only Runs Relevant Ones Mar 18, 2026
@myurasov-nv myurasov-nv changed the title [POC] Splits Tests in More Even Portions, Only Runs Relevant Ones [POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation Mar 18, 2026
@myurasov-nv myurasov-nv changed the title [POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting Mar 19, 2026
@myurasov-nv myurasov-nv marked this pull request as ready for review March 19, 2026 21:42
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR is a significant CI/CD overhaul for IsaacLab that replaces a single monolithic "run everything" test workflow with a smart, selective system: a detect-changes job computes per-package change flags, and each downstream test job gates itself on the relevant flag. Tests are parallelised across 3 shards per package, cuRobo/SkillGen tests get their own dedicated jobs, and JUnit XML results are surfaced as Markdown summaries on the GitHub Actions job page. The changes also extract a reusable run-package-tests composite action, fix a long-standing conftest.py bug where comma-separated exclude patterns were matched as a single substring, and improve job cancellation by explicitly killing the Docker container.

Key findings:

  • Timeout exit-code bug (tools/conftest.py:493)pytest.exit only checks num_failing for its exit code; timed-out tests have result == "TIMEOUT" (not "FAILED"), so an all-timeout run still exits 0, making the individual test job appear green. The XML error count is correct and is caught by downstream checks for fork PRs, but internal-PR jobs will not surface the failure directly.
  • Pytest exit-code swallowed in Docker (run-tests/action.yml:184) — The || echo 'Pytest completed...' clause always succeeds, meaning DOCKER_EXIT is always 0. For paths where tools/conftest.py is not the test runner, real pytest failures will not propagate to the job exit code; only the XML report catches them.
  • actions/checkout version skewbuild.yaml uses actions/checkout@v6 throughout while daily-compatibility.yml stays on @v4, creating an inconsistency that may cause confusion during future maintenance.

Confidence Score: 3/5

  • Safe to merge with awareness of two exit-code issues that may mask failures in certain scenarios for internal PRs.
  • The core logic (change detection, sharding, composite actions) is sound and well-structured. However, two exit-code propagation bugs mean that timeout-only failures and any non-conftest pytest failures may silently appear green on individual test jobs. Downstream XML-based checks (combine-results + dorny/test-reporter) serve as a safety net, but the per-job status shown to developers will be misleading.
  • tools/conftest.py (timeout exit-code logic) and .github/actions/run-tests/action.yml (pytest exit-code swallowed by || echo).

Important Files Changed

Filename Overview
.github/workflows/build.yaml Core workflow file: adds detect-changes job for selective test gating, splits general tests into per-package jobs with 3-shard parallelism, and introduces combine-results aggregation. Minor issue: actions/checkout@v6 is used here but @v4 is used in daily-compatibility.yml, creating a version inconsistency.
.github/actions/run-package-tests/action.yml New reusable composite action wrapping ECR pull, docker-based test run, artifact upload, and fork-PR failure check. Uses upload-artifact@v7 while the consuming combine-results job uses download-artifact@v8.
.github/actions/run-tests/action.yml Composite action running pytest inside Docker with support for sharding, filtering, curobo/quarantine modes, and container cleanup on cancellation. The `
.github/actions/run-tests/junit_summary.py New script parsing a JUnit XML file and emitting a Markdown step summary with collapsible sections for passed/failed/skipped tests. Logic is correct; f-strings are properly prefixed.
tools/conftest.py Adds file-level sharding via TEST_SHARD_INDEX/TEST_SHARD_COUNT env vars and fixes the exclude_pattern comma-split bug. Contains a logic bug: pytest.exit uses only num_failing to decide the exit code, ignoring num_timeout, so timeout-only failures exit with code 0.
tools/test_settings.py Splits CUROBO_TESTS into CUROBO_PLANNER_TESTS and SKILLGEN_TESTS, adds QUARANTINED_TESTS list, and adds several new per-test timeout entries. Clean and straightforward.
source/isaaclab_tasks/test/benchmarking/conftest.py Adds pytest_collection_modifyitems hook for item-level sharding in the benchmarking conftest, mirroring the file-level sharding in tools/conftest.py. Clean implementation.
.github/actions/ecr-build-push-pull/action.yml Cosmetic-only change: swaps several diagnostic log emojis from 🟢 to 🔵 to disambiguate informational messages from actual success signals.
.github/workflows/daily-compatibility.yml Pins dorny/test-reporter to a specific commit SHA and adds a combine-compat-results aggregation job plus a Slack-style compatibility report. Still uses actions/checkout@v4 while build.yaml now uses @v6.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    PR([Pull Request / workflow_dispatch]) --> DC

    DC[detect-changes\nproduces per-package true/false outputs]

    DC --> BUILD[build\nBase Docker Image]
    DC -->|mimic/tasks/core/infra| BUILDCUROBO[build-curobo\ncuRobo Docker Image]

    BUILD -->|always + core/infra| CORE1[test-isaaclab-core 1/3]
    BUILD -->|always + core/infra| CORE2[test-isaaclab-core 2/3]
    BUILD -->|always + core/infra| CORE3[test-isaaclab-core 3/3]

    BUILD -->|always + tasks/core/infra| TASKS1[test-isaaclab-tasks 1/3]
    BUILD -->|always + tasks/core/infra| TASKS2[test-isaaclab-tasks 2/3]
    BUILD -->|always + tasks/core/infra| TASKS3[test-isaaclab-tasks 3/3]

    BUILD -->|always + rl/core/infra| RL[test-isaaclab-rl]
    BUILD -->|always + mimic/core/infra| MIMIC[test-isaaclab-mimic]
    BUILD -->|always + contrib/core/infra| CONTRIB[test-isaaclab-contrib]
    BUILD -->|always + teleop/core/infra| TELEOP[test-isaaclab-teleop]
    BUILD -->|always + viz/core/infra| VIZ[test-isaaclab-visualizers]
    BUILD -->|always + assets/core/infra| ASSETS[test-isaaclab-assets]
    BUILD -->|always + newton/core/infra| NEWTON[test-isaaclab-newton]
    BUILD -->|always + physx/core/infra| PHYSX[test-isaaclab-physx]
    BUILD -->|always + tasks/core/infra| TRAIN[test-environments-training]

    BUILDCUROBO -->|always + mimic/core/infra| CUROBO[test-curobo]
    BUILDCUROBO -->|always + mimic/tasks/core/infra| SKILLGEN[test-skillgen]

    CORE1 & CORE2 & CORE3 & TASKS1 & TASKS2 & TASKS3 & RL & MIMIC & CONTRIB & TELEOP & VIZ & ASSETS & NEWTON & PHYSX & TRAIN & CUROBO & SKILLGEN --> COMBINE

    COMBINE[combine-results\ndownload artifacts\npublish unit-test report]
Loading

Comments Outside Diff (2)

  1. tools/conftest.py, line 493 (link)

    Timeout exits silently succeed

    num_failing only counts tests whose result == "FAILED", but timed-out tests are assigned result == "TIMEOUT" (see line 187). If every test in a shard times out, num_failing is 0 and pytest.exit returns returncode=0, telling the Docker runner that all tests passed. While the XML reports do encode the timeouts as errors="1" (picked up later by the fork-PR grep check and dorny/test-reporter), the per-job Docker exit code is wrong, so internal-PR test jobs show green when they should show failure.

  2. .github/actions/run-tests/action.yml, line 184-185 (link)

    Pytest exit-code swallowed by || echo

    /isaac-sim/python.sh -m pytest ... || echo 'Pytest completed with exit code: $?'

    The || echo ... clause always succeeds, so the entire command in the -c "..." heredoc always exits 0 regardless of whether pytest found failures. DOCKER_EXIT (line 185) will therefore always be 0. For test jobs that do not use the outer tools/conftest.py runner (i.e. when pytest is invoked directly rather than via the conftest session hook), any real pytest failure would be silently swallowed here.

    Consider removing the || echo clause so that a failing pytest propagates a non-zero exit code through docker to DOCKER_EXIT:

    If you need the exit code for logging purposes, capture it explicitly:

    /isaac-sim/python.sh -m pytest ... --junitxml=tests/$result_file; EXIT=$?; echo "Pytest completed with exit code: $EXIT"; exit $EXIT

Last reviewed commit: "Try one more cancell..."

@github-actions github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 19, 2026
@myurasov-nv
Copy link
Collaborator Author

@greptile-apps

@myurasov-nv
Copy link
Collaborator Author

@greptile-apps

2 similar comments
@myurasov-nv
Copy link
Collaborator Author

@greptile-apps

@myurasov-nv
Copy link
Collaborator Author

@greptile-apps

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 20, 2026
myurasov-nv and others added 27 commits March 19, 2026 19:19
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: myurasov-nv <168484206+myurasov-nv@users.noreply.github.com>
@myurasov-nv myurasov-nv requested a review from ooctipus March 20, 2026 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant