[Bugfix] Fix regression test to use installed package instead of source directory #1550

xwhzz · 2025-12-28T03:49:07Z

Summary by CodeRabbit

Chores
- Reorganized CI regression workflow to add a dedicated permission-check step and streamline subsequent job flow.
- Renamed report messaging to "Performance Regression Test Report" for clarity.
- Improved CI cleanup to remove build/artifact directories during baseline setup.
- Simplified and consolidated local performance/regression tooling, replacing older scripts with a unified regression runner and tightening test invocation.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…ce directory

github-actions · 2025-12-28T03:49:17Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2025-12-28T03:49:17Z

📝 Walkthrough

Walkthrough

Refactors perf-regression tooling: moves multi-driver runner into maint/scripts, removes a legacy perf script, simplifies tilelang.testing perf_regression API/exports, and updates GitHub Actions to add a separate permissions-check job that pr-regression depends on.

Changes

Cohort / File(s)	Summary
GitHub Actions workflow `\.github/workflows/pr-regression-test-bot.yml`	Added `permissions-check` job that determines commenter permission via GitHub Script; made `pr-regression` depend on it (`needs: [permissions-check]`); removed inline permission-check steps; updated PR comment title to "Performance Regression Test Report"; added artifact cleanup (`rm -rf tilelang build`); switched environment/path usage for regression tests.
Removed legacy perf script `maint/scripts/performance.py`	Deleted file containing autotuned matmul benchmark, kernel definition, tuner config, and CLI entry point.
New multi-driver runner (maintenance) `maint/scripts/regression_all.py`	New runner: `PerfResult` dataclass, discovery of `regression_*.py` drivers, subprocess-based execution, parsing of JSON/text latency markers, aggregated results, optional JSON/tabular output, and CLI entry point.
Test harness update `maint/scripts/test_perf_regression.py`	Switched from inline `python -c` invocations to using a deterministic `Path` to `regression_all.py`; added `pathlib.Path` usage; minor callsite formatting changes.
Testing public API changes `tilelang/testing/__init__.py`	Removed `regression_all` export; added exports for compute-version decorators: `requires_cuda_compute_version_ge/gt/eq/lt/le`.
Simplified perf regression helper `tilelang/testing/perf_regression.py`	Removed multi-driver orchestration (deleted `_parse_table`, `_examples_root`, `_discover_bench_files`, `regression_all`); eliminated external deps (tabulate, tqdm) and subprocess scaffolding; retained core latency collection and single-result JSON/text emission (`_emit_results`).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Commenter as PR Commenter
  participant GitHub as GitHub PR
  participant Actions as GitHub Actions Workflow
  participant PermCheck as permissions-check job
  participant PRReg as pr-regression job

  Note over Commenter,GitHub: A comment triggers the workflow
  Commenter->>GitHub: Add comment / trigger
  GitHub->>Actions: Start pr-regression workflow
  Actions->>PermCheck: Run permissions-check (GitHub Script)
  PermCheck-->>Actions: Emit permission result (output)
  Note right of Actions: permission output stored in job outputs
  Actions->>PRReg: pr-regression starts with needs:[permissions-check]
  PRReg->>PRReg: Run regression steps (uses emitted permission)
  PRReg-->>GitHub: Post "Performance Regression Test Report" comment/artifact

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

[Enhancement] Improve GitHub Actions permissions check and refine performance regression testing #1519: Overlaps on GitHub Actions workflow changes and performance-regression tooling refactor (same files affected).

Suggested reviewers

LeiWang1999

Poem

🐰 I hopped through scripts and moved a few,
Removed old kernels, made runners new,
Permissions checked before the test,
Clean reports now stand the rest,
A rabbit's cheer — the infra's best!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: refactoring regression tests to use the installed package (via regression_all.py in maint/scripts/) instead of relying on source directory imports, as evidenced by the removal of regression_all from tilelang/testing, deletion of performance.py, and updates to test infrastructure.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ebd70b and a19329b.

📒 Files selected for processing (1)

.github/workflows/pr-regression-test-bot.yml

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-10-10T13:29:29.347Z

Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

.github/workflows/pr-regression-test-bot.yml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)

🔇 Additional comments (4)

.github/workflows/pr-regression-test-bot.yml (4)

34-53: LGTM! Good security pattern for gating expensive operations.

The dedicated permissions-check job properly validates that only users with admin/maintain/write permissions can trigger regression tests. Moving this check into a separate job that gates the expensive pr-regression job is a clean optimization.

60-60: LGTM! Proper job dependency for authorization gate.

The needs: [permissions-check] dependency ensures the expensive regression test only runs after authorization is confirmed.

225-230: LGTM! Activation path correctly updated.

The activation of test_regression environment (line 227) aligns with the new venv created at line 194. The test script correctly uses OLD_PYTHON and NEW_PYTHON to reference the separate installed environments for comparison.

256-266: LGTM! Terminology updated for consistency.

The message text now uses "Performance Regression Test Report" instead of "Performance Benchmark Report", which aligns better with the workflow's name and purpose.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d70cf36 and 7ebd70b.

📒 Files selected for processing (6)

.github/workflows/pr-regression-test-bot.yml
maint/scripts/performance.py
maint/scripts/regression_all.py
maint/scripts/test_perf_regression.py
tilelang/testing/__init__.py
tilelang/testing/perf_regression.py

💤 Files with no reviewable changes (2)

tilelang/testing/perf_regression.py
maint/scripts/performance.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-10-10T13:29:29.347Z

Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

.github/workflows/pr-regression-test-bot.yml

🧬 Code graph analysis (2)

maint/scripts/regression_all.py (1)

tilelang/testing/perf_regression.py (1)

PerfResult (13-15)

tilelang/testing/__init__.py (1)

tilelang/testing/perf_regression.py (2)

process_func (49-66)

regression (69-88)

🪛 Ruff (0.14.10)

maint/scripts/regression_all.py

12-12: Do not catch blind exception: Exception

(BLE001)

19-19: Unused function argument: kwargs

(ARG001)

88-88: Avoid specifying long messages outside the exception class

(TRY003)

92-92: Avoid specifying long messages outside the exception class

(TRY003)

98-98: subprocess call: check for execution of untrusted input

(S603)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
GitHub Check: Test for Python 3.12 with Metal (on macos-latest)

🔇 Additional comments (14)

maint/scripts/test_perf_regression.py (2)

10-10: LGTM!

The Path import is correctly added to support the file path construction for the regression script.

184-187: LGTM!

The change correctly executes regression_all.py as a script file rather than importing from the installed package. This ensures consistent behavior between OLD and NEW Python interpreters by using the same script path from the repository.

.github/workflows/pr-regression-test-bot.yml (4)

34-52: LGTM! Authorization gate properly implemented.

The permissions-check job correctly verifies collaborator permissions before allowing the expensive regression test to run. The check for admin, maintain, or write permissions is appropriate for protecting self-hosted runner resources.

60-60: LGTM!

The needs: [permissions-check] dependency correctly ensures authorization is verified before the performance regression job runs.

192-192: Critical fix for ensuring installed package usage.

Removing tilelang source directory and build artifacts after installing the baseline version is essential. This prevents Python from inadvertently importing from the source directory instead of the installed package, which is the core issue this PR addresses.

253-254: LGTM!

The title accurately reflects the purpose of the workflow as a regression test rather than a general benchmark.

tilelang/testing/__init__.py (1)

11-11: LGTM!

The import correctly removes regression_all as it has been relocated to maint/scripts/regression_all.py as a standalone script, reducing the public API surface of the testing module.

maint/scripts/regression_all.py (7)

10-20: LGTM on optional dependency handling.

The try/except pattern for optional dependencies (tabulate, tqdm) is appropriate for a standalone script. The tqdm fallback correctly returns the iterable unchanged.

23-31: Intentional duplication for standalone script.

The PerfResult dataclass duplicates the one in tilelang/testing/perf_regression.py, which is appropriate since this script must be self-contained and cannot import from the installed package (that's the whole point of the fix). The frozen=True addition is a nice improvement for immutability.

34-63: LGTM!

The parsing logic is well-designed: it prioritizes the structured JSON marker (searching from the end for efficiency) and falls back to best-effort text parsing. Error handling for malformed values is appropriate.

71-77: LGTM!

File discovery is correctly implemented with proper filtering for actual files and exclusion of __init__.py.

97-120: LGTM on subprocess execution logic.

The subprocess calls execute benchmark files discovered from the repository's examples directory using sys.executable. The S603 static analysis warning about untrusted input is a false positive here since the paths come from rglob on a known repository directory, not user input.

The "first writer wins" deduplication strategy and partial failure handling are well-designed for robustness.

132-145: LGTM!

The output formatting is flexible with JSON and text modes, and gracefully falls back to a simple ASCII table when tabulate is unavailable.

148-149: LGTM!

Standard entry point pattern for script execution.

coderabbitai · 2025-12-28T03:53:25Z

maint/scripts/regression_all.py

+def _examples_root() -> Path:
+    # repo_root/tilelang/testing/perf_regression.py -> repo_root
+    return Path(__file__).resolve().parents[2] / "examples"


⚠️ Potential issue | 🟡 Minor

Stale comment referencing old file location.

The comment mentions repo_root/tilelang/testing/perf_regression.py but this file is at maint/scripts/regression_all.py. The path calculation itself is correct (parents[2] navigates from maint/scripts/regression_all.py → maint/scripts → maint → repo_root).

🔎 Proposed fix

def _examples_root() -> Path: - # repo_root/tilelang/testing/perf_regression.py -> repo_root + # maint/scripts/regression_all.py -> repo_root return Path(__file__).resolve().parents[2] / "examples"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _examples_root() -> Path:

# repo_root/tilelang/testing/perf_regression.py -> repo_root

return Path(__file__).resolve().parents[2] / "examples"

def _examples_root() -> Path:

# maint/scripts/regression_all.py -> repo_root

return Path(__file__).resolve().parents[2] / "examples"

🤖 Prompt for AI Agents

In maint/scripts/regression_all.py around lines 66 to 68, the inline comment incorrectly references repo_root/tilelang/testing/perf_regression.py even though the code is in maint/scripts/regression_all.py; update the comment to reflect the actual file path used for the parents traversal (e.g., "repo_root/maint/scripts/regression_all.py -> repo_root") or remove the stale comment entirely so it matches the code's current location and intent.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

.github/workflows/pr-regression-test-bot.yml (2)
181-196: Workflow must checkout back to PR branch before running tests.

The workflow checks out main at line 185 but never switches back to the PR branch. At line 230, the test script ./maint/scripts/test_perf_regression.py runs from the main branch context.

This is a critical bug: The PR refactors the test infrastructure by introducing a new maint/scripts/regression_all.py script and modifying test_perf_regression.py to use it. However, regression_all.py does not exist in main, and main's version of test_perf_regression.py still uses the old inline approach. The refactored test code in the PR is never actually tested.

Add git checkout - after line 191 to return to the PR branch before running tests at line 230.

8-11: Add actions: write permission to save caches from setup-uv.

The workflow uses astral-sh/setup-uv@v7 with caching enabled (lines 154–169), but the permissions block (lines 8–11) lacks actions: write. This permission is required for setup-uv to save caches; without it, cache upload operations will fail and caches will not persist, causing slower CI runs over time as they remain cold.

Add actions: write to the permissions block:
permissions:
  actions: write
  contents: read
  issues: write
  pull-requests: write

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ebd70b and a19329b.

📒 Files selected for processing (1)

.github/workflows/pr-regression-test-bot.yml

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-10-10T13:29:29.347Z

Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

.github/workflows/pr-regression-test-bot.yml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)

🔇 Additional comments (4)

.github/workflows/pr-regression-test-bot.yml (4)

34-53: LGTM! Good security pattern for gating expensive operations.

The dedicated permissions-check job properly validates that only users with admin/maintain/write permissions can trigger regression tests. Moving this check into a separate job that gates the expensive pr-regression job is a clean optimization.

60-60: LGTM! Proper job dependency for authorization gate.

The needs: [permissions-check] dependency ensures the expensive regression test only runs after authorization is confirmed.

225-230: LGTM! Activation path correctly updated.

The activation of test_regression environment (line 227) aligns with the new venv created at line 194. The test script correctly uses OLD_PYTHON and NEW_PYTHON to reference the separate installed environments for comparison.

256-266: LGTM! Terminology updated for consistency.

The message text now uses "Performance Regression Test Report" instead of "Performance Benchmark Report", which aligns better with the workflow's name and purpose.

LeiWang1999 · 2025-12-28T12:00:50Z

Would be better for us to enable ccache as we should build from source twice?

LeiWang1999 · 2025-12-28T12:52:12Z

likely we will automatically enable ccache, lgtm.

xwhzz added 5 commits December 28, 2025 02:37

[Bugfix] Fix regression test to use installed package instead of sour…

42d47e7

…ce directory

fix

332b172

fix

48de691

lint

93633bd

upd

7ebd70b

coderabbitai bot reviewed Dec 28, 2025

View reviewed changes

fix

a19329b

coderabbitai bot reviewed Dec 28, 2025

View reviewed changes

LeiWang1999 merged commit 23ede42 into tile-ai:main Dec 28, 2025
6 checks passed

coderabbitai bot mentioned this pull request Dec 28, 2025

[Script] Provide regression test script to help benchmark regression in local env #1551

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bugfix] Fix regression test to use installed package instead of source directory #1550

[Bugfix] Fix regression test to use installed package instead of source directory #1550

xwhzz commented Dec 28, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Dec 28, 2025

Uh oh!

coderabbitai bot commented Dec 28, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 28, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

LeiWang1999 commented Dec 28, 2025

Uh oh!

LeiWang1999 commented Dec 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[Bugfix] Fix regression test to use installed package instead of source directory #1550

[Bugfix] Fix regression test to use installed package instead of source directory #1550

Conversation

xwhzz commented Dec 28, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions bot commented Dec 28, 2025

Uh oh!

coderabbitai bot commented Dec 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

LeiWang1999 commented Dec 28, 2025

Uh oh!

LeiWang1999 commented Dec 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xwhzz commented Dec 28, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 28, 2025 •

edited

Loading