Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting by myurasov-nv · Pull Request #5059 · isaac-sim/IsaacLab

myurasov-nv · 2026-03-18T19:07:31Z

Description

Until now every PR ran all GPU test jobs regardless of what changed. That is no-good and needs to change. :)

This PR:

Splits tests more evenly and according to functionality
Only runs relevant ones; manually all can still be executed
Adds reporting of the passed/failed tests on the Workflow page
Improves job cancellation so resources are freed faster

Detailed changes

Added a detect-changes job that runs git diff --name-only to produce per-package true/false outputs; each test job skips itself when its package is untouched. Core/infra changes still trigger everything. Manual "run all" option available via GH UI.
Split test-general into per-package jobs: test-isaaclab (2 shards), test-isaaclab-rl, -mimic, -contrib, -teleop, -visualizers, -assets. Each job is independently gated and sharded.
Replaced hardcoded include-files whitelists with deterministic shard-index % shard-count sharding, fixing 7 silently skipped isaaclab_tasks tests.
Broke out cuRobo Planner and SkillGen into separate test jobs with their own conditional execution gates.
Fixed conftest.py exclude pattern that was matching a comma-separated string as one substring (never matched).
Container is now killed on cancellation signals so jobs stop quickly via GH UI.
JUnit results are converted to Markdown and displayed on the GH Actions job page, including skip reasons.
Faster code checkout and updated actions/checkout@v4, actions/upload-artifact@v4, actions/download-artifact@v4 to latest SHA-pinned versions.
Extracted reusable composite action for repeating Docker test-run boilerplate.
Added Docker image pull timing for diagnostics.
Suppressed Node.js deprecation warnings in CI output.

How test_train_environments.py is split

The test function test_train_environments is parametrized by task_spec (all registered Isaac environments) and workflow (rl_games, rsl_rl, sb3, skrl), so pytest expands it into N_tasks x 4_workflows individual test items. The pytest_collection_modifyitems hook in conftest.py runs after pytest builds this full list and keeps only every Nth item based on TEST_SHARD_INDEX / TEST_SHARD_COUNT env vars (e.g. shard 0/3 gets items 0, 3, 6, ...). Each CI runner sees ~1/3 of the parametrized combinations, cutting wall time from ~1.2h to ~24min per shard.

Type of change

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

greptile-apps · 2026-03-19T21:48:01Z

Greptile Summary

This PR is a significant CI/CD overhaul for IsaacLab that replaces a single monolithic "run everything" test workflow with a smart, selective system: a detect-changes job computes per-package change flags, and each downstream test job gates itself on the relevant flag. Tests are parallelised across 3 shards per package, cuRobo/SkillGen tests get their own dedicated jobs, and JUnit XML results are surfaced as Markdown summaries on the GitHub Actions job page. The changes also extract a reusable run-package-tests composite action, fix a long-standing conftest.py bug where comma-separated exclude patterns were matched as a single substring, and improve job cancellation by explicitly killing the Docker container.

Key findings:

Timeout exit-code bug (tools/conftest.py:493) — pytest.exit only checks num_failing for its exit code; timed-out tests have result == "TIMEOUT" (not "FAILED"), so an all-timeout run still exits 0, making the individual test job appear green. The XML error count is correct and is caught by downstream checks for fork PRs, but internal-PR jobs will not surface the failure directly.
Pytest exit-code swallowed in Docker (run-tests/action.yml:184) — The || echo 'Pytest completed...' clause always succeeds, meaning DOCKER_EXIT is always 0. For paths where tools/conftest.py is not the test runner, real pytest failures will not propagate to the job exit code; only the XML report catches them.
actions/checkout version skew — build.yaml uses actions/checkout@v6 throughout while daily-compatibility.yml stays on @v4, creating an inconsistency that may cause confusion during future maintenance.

Confidence Score: 3/5

Safe to merge with awareness of two exit-code issues that may mask failures in certain scenarios for internal PRs.
The core logic (change detection, sharding, composite actions) is sound and well-structured. However, two exit-code propagation bugs mean that timeout-only failures and any non-conftest pytest failures may silently appear green on individual test jobs. Downstream XML-based checks (combine-results + dorny/test-reporter) serve as a safety net, but the per-job status shown to developers will be misleading.
tools/conftest.py (timeout exit-code logic) and .github/actions/run-tests/action.yml (pytest exit-code swallowed by || echo).

Important Files Changed

Filename	Overview
.github/workflows/build.yaml	Core workflow file: adds `detect-changes` job for selective test gating, splits general tests into per-package jobs with 3-shard parallelism, and introduces `combine-results` aggregation. Minor issue: `actions/checkout@v6` is used here but `@v4` is used in `daily-compatibility.yml`, creating a version inconsistency.
.github/actions/run-package-tests/action.yml	New reusable composite action wrapping ECR pull, docker-based test run, artifact upload, and fork-PR failure check. Uses `upload-artifact@v7` while the consuming `combine-results` job uses `download-artifact@v8`.
.github/actions/run-tests/action.yml	Composite action running pytest inside Docker with support for sharding, filtering, curobo/quarantine modes, and container cleanup on cancellation. The `
.github/actions/run-tests/junit_summary.py	New script parsing a JUnit XML file and emitting a Markdown step summary with collapsible sections for passed/failed/skipped tests. Logic is correct; f-strings are properly prefixed.
tools/conftest.py	Adds file-level sharding via `TEST_SHARD_INDEX`/`TEST_SHARD_COUNT` env vars and fixes the `exclude_pattern` comma-split bug. Contains a logic bug: `pytest.exit` uses only `num_failing` to decide the exit code, ignoring `num_timeout`, so timeout-only failures exit with code 0.
tools/test_settings.py	Splits `CUROBO_TESTS` into `CUROBO_PLANNER_TESTS` and `SKILLGEN_TESTS`, adds `QUARANTINED_TESTS` list, and adds several new per-test timeout entries. Clean and straightforward.
source/isaaclab_tasks/test/benchmarking/conftest.py	Adds `pytest_collection_modifyitems` hook for item-level sharding in the benchmarking conftest, mirroring the file-level sharding in `tools/conftest.py`. Clean implementation.
.github/actions/ecr-build-push-pull/action.yml	Cosmetic-only change: swaps several diagnostic log emojis from 🟢 to 🔵 to disambiguate informational messages from actual success signals.
.github/workflows/daily-compatibility.yml	Pins `dorny/test-reporter` to a specific commit SHA and adds a `combine-compat-results` aggregation job plus a Slack-style compatibility report. Still uses `actions/checkout@v4` while `build.yaml` now uses `@v6`.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    PR([Pull Request / workflow_dispatch]) --> DC

    DC[detect-changes\nproduces per-package true/false outputs]

    DC --> BUILD[build\nBase Docker Image]
    DC -->|mimic/tasks/core/infra| BUILDCUROBO[build-curobo\ncuRobo Docker Image]

    BUILD -->|always + core/infra| CORE1[test-isaaclab-core 1/3]
    BUILD -->|always + core/infra| CORE2[test-isaaclab-core 2/3]
    BUILD -->|always + core/infra| CORE3[test-isaaclab-core 3/3]

    BUILD -->|always + tasks/core/infra| TASKS1[test-isaaclab-tasks 1/3]
    BUILD -->|always + tasks/core/infra| TASKS2[test-isaaclab-tasks 2/3]
    BUILD -->|always + tasks/core/infra| TASKS3[test-isaaclab-tasks 3/3]

    BUILD -->|always + rl/core/infra| RL[test-isaaclab-rl]
    BUILD -->|always + mimic/core/infra| MIMIC[test-isaaclab-mimic]
    BUILD -->|always + contrib/core/infra| CONTRIB[test-isaaclab-contrib]
    BUILD -->|always + teleop/core/infra| TELEOP[test-isaaclab-teleop]
    BUILD -->|always + viz/core/infra| VIZ[test-isaaclab-visualizers]
    BUILD -->|always + assets/core/infra| ASSETS[test-isaaclab-assets]
    BUILD -->|always + newton/core/infra| NEWTON[test-isaaclab-newton]
    BUILD -->|always + physx/core/infra| PHYSX[test-isaaclab-physx]
    BUILD -->|always + tasks/core/infra| TRAIN[test-environments-training]

    BUILDCUROBO -->|always + mimic/core/infra| CUROBO[test-curobo]
    BUILDCUROBO -->|always + mimic/tasks/core/infra| SKILLGEN[test-skillgen]

    CORE1 & CORE2 & CORE3 & TASKS1 & TASKS2 & TASKS3 & RL & MIMIC & CONTRIB & TELEOP & VIZ & ASSETS & NEWTON & PHYSX & TRAIN & CUROBO & SKILLGEN --> COMBINE

    COMBINE[combine-results\ndownload artifacts\npublish unit-test report]

Comments Outside Diff (2)

tools/conftest.py, line 493 (link)

Timeout exits silently succeed

num_failing only counts tests whose result == "FAILED", but timed-out tests are assigned result == "TIMEOUT" (see line 187). If every test in a shard times out, num_failing is 0 and pytest.exit returns returncode=0, telling the Docker runner that all tests passed. While the XML reports do encode the timeouts as errors="1" (picked up later by the fork-PR grep check and dorny/test-reporter), the per-job Docker exit code is wrong, so internal-PR test jobs show green when they should show failure.
.github/actions/run-tests/action.yml, line 184-185 (link)

Pytest exit-code swallowed by || echo
```
/isaac-sim/python.sh -m pytest ... || echo 'Pytest completed with exit code: $?'
```
The || echo ... clause always succeeds, so the entire command in the -c "..." heredoc always exits 0 regardless of whether pytest found failures. DOCKER_EXIT (line 185) will therefore always be 0. For test jobs that do not use the outer tools/conftest.py runner (i.e. when pytest is invoked directly rather than via the conftest session hook), any real pytest failure would be silently swallowed here.

Consider removing the || echo clause so that a failing pytest propagates a non-zero exit code through docker to DOCKER_EXIT:

If you need the exit code for logging purposes, capture it explicitly:
```
/isaac-sim/python.sh -m pytest ... --junitxml=tests/$result_file; EXIT=$?; echo "Pytest completed with exit code: $EXIT"; exit $EXIT
```

_{Last reviewed commit: "Try one more cancell..."}

.github/actions/run-package-tests/action.yml

tools/conftest.py

.github/workflows/build.yaml

myurasov-nv · 2026-03-19T23:15:29Z

@greptile-apps

.github/actions/run-tests/junit_summary.py

.github/workflows/build.yaml

myurasov-nv · 2026-03-19T23:43:06Z

@greptile-apps

myurasov-nv · 2026-03-19T23:58:20Z

@greptile-apps

myurasov-nv · 2026-03-20T00:53:24Z

@greptile-apps

…d update related test jobs

…ier debugging

…kflow

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: myurasov-nv <168484206+myurasov-nv@users.noreply.github.com>

github-actions bot added the infrastructure label Mar 18, 2026

myurasov-nv changed the title ~~Splits Tests in More Even Portions, Only Runs Relevant Ones~~ [POC] Splits Tests in More Even Portions, Only Runs Relevant Ones Mar 18, 2026

myurasov-nv changed the title ~~[POC] Splits Tests in More Even Portions, Only Runs Relevant Ones~~ [POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation Mar 18, 2026

myurasov-nv force-pushed the my-gated-tests-1 branch from 5ff7ea8 to 349d522 Compare March 19, 2026 18:35

myurasov-nv changed the title ~~[POC] Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation~~ Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting Mar 19, 2026

myurasov-nv marked this pull request as ready for review March 19, 2026 21:42

myurasov-nv requested review from hhansen-bdai and kellyguo11 as code owners March 19, 2026 21:42

greptile-apps bot reviewed Mar 19, 2026

View reviewed changes

.github/actions/run-package-tests/action.yml Show resolved Hide resolved

.github/actions/run-package-tests/action.yml Outdated Show resolved Hide resolved

tools/conftest.py Outdated Show resolved Hide resolved

.github/workflows/build.yaml Show resolved Hide resolved

github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 19, 2026

myurasov-nv force-pushed the my-gated-tests-1 branch from 3f69cf3 to b83016e Compare March 19, 2026 23:13

myurasov-nv force-pushed the my-gated-tests-1 branch from 7d2c6ec to fdec5fc Compare March 19, 2026 23:22

greptile-apps bot reviewed Mar 19, 2026

View reviewed changes

.github/actions/run-tests/junit_summary.py Outdated Show resolved Hide resolved

.github/workflows/build.yaml Show resolved Hide resolved

myurasov-nv force-pushed the my-gated-tests-1 branch from daaf42a to 6b0903b Compare March 20, 2026 02:18

myurasov-nv requested review from Mayankm96 and jtigue-bdai as code owners March 20, 2026 02:18

github-actions bot added the documentation Improvements or additions to documentation label Mar 20, 2026

myurasov-nv added 8 commits March 19, 2026 19:19

Trigger post-merge CI

6534508

Add detect-changes job for path-based selective test gating in CI

0f5b3fc

Split test-general into per-package jobs with sharding and exclude fix

fb23107

Replace tasks include-files whitelists with sharding to fix coverage gap

85d5638

Faster code checkout

5ea18ab

Fix git diff command to correctly identify changed files in PR

b6c7ce3

Show changed files in detect-changes job

48c5ed5

Rename 'isaaclab' outputs to 'isaaclab-core' in detect-changes job an…

e997596

…d update related test jobs

myurasov-nv and others added 27 commits March 19, 2026 19:19

Try splitting test_environments_training.py

24dae95

Add watchdog process to kill tests more reliably

a6b66c6

Skip combine results job on cancellation

1d64987

Update test job gating summary

0f08abe

Comment out the quarantined tests job until needed

4ca1b39

Display package to which every source file changed belongs to for eas…

a1c88d6

…ier debugging

Enhance output formatting for changed files in build workflow

d12952f

Refactor changed files output in build workflow for improved readability

50f8eb9

Improve output formatting for changed files and packages in build wor…

d78cf34

…kflow

UPPERCASE list headers :)

9c13a8e

Show detected changes on GH UI

4d309b8

Remove unnecessary output for triggered packages in build workflow

031ec18

Update job summary header for changed files and packages in build wor…

d6d7e4c

…kflow

Add forgottten f""

947ecf0

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: myurasov-nv <168484206+myurasov-nv@users.noreply.github.com>

Update job names for environments training in build workflow

54a86b7

Update job names for environments training to use snake_case format

74c95e6

Update job names in build workflow to use brackets for shard notation

43f2938

Make build dependent on change detection

d5d5233

Make combine results dependent only on tests

23c1f18

Pin test reporter action to existing tag

0700e1c

Pin test reporter action to existing tag

912eb4a

Fix results collection for envs training

0677310

Cleanup err messages

8c2d20b

Update Docker pull summary message for clarity

c91d178

Un-shard test_environments_training.py

4d75017

Try one more cancellation method

16a4042

Fix tables, again

6209aaa

myurasov-nv force-pushed the my-gated-tests-1 branch from 6b0903b to 6209aaa Compare March 20, 2026 02:19

Increase max length for sanitized messages in JUnit summary

31947bd

myurasov-nv requested a review from ooctipus March 20, 2026 03:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting#5059

Splits Tests More Evenly, Only Runs Relevant Ones, Improves Job Cancellation, Adds Reporting#5059
myurasov-nv wants to merge 68 commits intoisaac-sim:developfrom
myurasov-nv:my-gated-tests-1

myurasov-nv commented Mar 18, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 19, 2026 •

edited

Loading

Comments Outside Diff (2)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

Uh oh!

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

myurasov-nv commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

myurasov-nv commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Detailed changes

How test_train_environments.py is split

Type of change

Checklist

Uh oh!

greptile-apps bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (2)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

Uh oh!

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

myurasov-nv commented Mar 19, 2026

Uh oh!

myurasov-nv commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

myurasov-nv commented Mar 18, 2026 •

edited

Loading

greptile-apps bot commented Mar 19, 2026 •

edited

Loading