-
Notifications
You must be signed in to change notification settings - Fork 359
[Bugfix] Fix regression test to use installed package instead of source directory #1550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughRefactors perf-regression tooling: moves multi-driver runner into maint/scripts, removes a legacy perf script, simplifies tilelang.testing perf_regression API/exports, and updates GitHub Actions to add a separate permissions-check job that pr-regression depends on. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Commenter as PR Commenter
participant GitHub as GitHub PR
participant Actions as GitHub Actions Workflow
participant PermCheck as permissions-check job
participant PRReg as pr-regression job
Note over Commenter,GitHub: A comment triggers the workflow
Commenter->>GitHub: Add comment / trigger
GitHub->>Actions: Start pr-regression workflow
Actions->>PermCheck: Run permissions-check (GitHub Script)
PermCheck-->>Actions: Emit permission result (output)
Note right of Actions: permission output stored in job outputs
Actions->>PRReg: pr-regression starts with needs:[permissions-check]
PRReg->>PRReg: Run regression steps (uses emitted permission)
PRReg-->>GitHub: Post "Performance Regression Test Report" comment/artifact
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-10-10T13:29:29.347ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (4)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/pr-regression-test-bot.ymlmaint/scripts/performance.pymaint/scripts/regression_all.pymaint/scripts/test_perf_regression.pytilelang/testing/__init__.pytilelang/testing/perf_regression.py
💤 Files with no reviewable changes (2)
- tilelang/testing/perf_regression.py
- maint/scripts/performance.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/pr-regression-test-bot.yml
🧬 Code graph analysis (2)
maint/scripts/regression_all.py (1)
tilelang/testing/perf_regression.py (1)
PerfResult(13-15)
tilelang/testing/__init__.py (1)
tilelang/testing/perf_regression.py (2)
process_func(49-66)regression(69-88)
🪛 Ruff (0.14.10)
maint/scripts/regression_all.py
12-12: Do not catch blind exception: Exception
(BLE001)
19-19: Unused function argument: kwargs
(ARG001)
88-88: Avoid specifying long messages outside the exception class
(TRY003)
92-92: Avoid specifying long messages outside the exception class
(TRY003)
98-98: subprocess call: check for execution of untrusted input
(S603)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (14)
maint/scripts/test_perf_regression.py (2)
10-10: LGTM!The
Pathimport is correctly added to support the file path construction for the regression script.
184-187: LGTM!The change correctly executes
regression_all.pyas a script file rather than importing from the installed package. This ensures consistent behavior between OLD and NEW Python interpreters by using the same script path from the repository..github/workflows/pr-regression-test-bot.yml (4)
34-52: LGTM! Authorization gate properly implemented.The
permissions-checkjob correctly verifies collaborator permissions before allowing the expensive regression test to run. The check foradmin,maintain, orwritepermissions is appropriate for protecting self-hosted runner resources.
60-60: LGTM!The
needs: [permissions-check]dependency correctly ensures authorization is verified before the performance regression job runs.
192-192: Critical fix for ensuring installed package usage.Removing
tilelangsource directory andbuildartifacts after installing the baseline version is essential. This prevents Python from inadvertently importing from the source directory instead of the installed package, which is the core issue this PR addresses.
253-254: LGTM!The title accurately reflects the purpose of the workflow as a regression test rather than a general benchmark.
tilelang/testing/__init__.py (1)
11-11: LGTM!The import correctly removes
regression_allas it has been relocated tomaint/scripts/regression_all.pyas a standalone script, reducing the public API surface of the testing module.maint/scripts/regression_all.py (7)
10-20: LGTM on optional dependency handling.The try/except pattern for optional dependencies (
tabulate,tqdm) is appropriate for a standalone script. Thetqdmfallback correctly returns the iterable unchanged.
23-31: Intentional duplication for standalone script.The
PerfResultdataclass duplicates the one intilelang/testing/perf_regression.py, which is appropriate since this script must be self-contained and cannot import from the installed package (that's the whole point of the fix). Thefrozen=Trueaddition is a nice improvement for immutability.
34-63: LGTM!The parsing logic is well-designed: it prioritizes the structured JSON marker (searching from the end for efficiency) and falls back to best-effort text parsing. Error handling for malformed values is appropriate.
71-77: LGTM!File discovery is correctly implemented with proper filtering for actual files and exclusion of
__init__.py.
97-120: LGTM on subprocess execution logic.The subprocess calls execute benchmark files discovered from the repository's
examplesdirectory usingsys.executable. The S603 static analysis warning about untrusted input is a false positive here since the paths come fromrglobon a known repository directory, not user input.The "first writer wins" deduplication strategy and partial failure handling are well-designed for robustness.
132-145: LGTM!The output formatting is flexible with JSON and text modes, and gracefully falls back to a simple ASCII table when
tabulateis unavailable.
148-149: LGTM!Standard entry point pattern for script execution.
| def _examples_root() -> Path: | ||
| # repo_root/tilelang/testing/perf_regression.py -> repo_root | ||
| return Path(__file__).resolve().parents[2] / "examples" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale comment referencing old file location.
The comment mentions repo_root/tilelang/testing/perf_regression.py but this file is at maint/scripts/regression_all.py. The path calculation itself is correct (parents[2] navigates from maint/scripts/regression_all.py → maint/scripts → maint → repo_root).
🔎 Proposed fix
def _examples_root() -> Path:
- # repo_root/tilelang/testing/perf_regression.py -> repo_root
+ # maint/scripts/regression_all.py -> repo_root
return Path(__file__).resolve().parents[2] / "examples"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _examples_root() -> Path: | |
| # repo_root/tilelang/testing/perf_regression.py -> repo_root | |
| return Path(__file__).resolve().parents[2] / "examples" | |
| def _examples_root() -> Path: | |
| # maint/scripts/regression_all.py -> repo_root | |
| return Path(__file__).resolve().parents[2] / "examples" |
🤖 Prompt for AI Agents
In maint/scripts/regression_all.py around lines 66 to 68, the inline comment
incorrectly references repo_root/tilelang/testing/perf_regression.py even though
the code is in maint/scripts/regression_all.py; update the comment to reflect
the actual file path used for the parents traversal (e.g.,
"repo_root/maint/scripts/regression_all.py -> repo_root") or remove the stale
comment entirely so it matches the code's current location and intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/pr-regression-test-bot.yml (2)
181-196: Workflow must checkout back to PR branch before running tests.The workflow checks out
mainat line 185 but never switches back to the PR branch. At line 230, the test script./maint/scripts/test_perf_regression.pyruns from the main branch context.This is a critical bug: The PR refactors the test infrastructure by introducing a new
maint/scripts/regression_all.pyscript and modifyingtest_perf_regression.pyto use it. However,regression_all.pydoes not exist in main, and main's version oftest_perf_regression.pystill uses the old inline approach. The refactored test code in the PR is never actually tested.Add
git checkout -after line 191 to return to the PR branch before running tests at line 230.
8-11: Addactions: writepermission to save caches fromsetup-uv.The workflow uses
astral-sh/setup-uv@v7with caching enabled (lines 154–169), but the permissions block (lines 8–11) lacksactions: write. This permission is required forsetup-uvto save caches; without it, cache upload operations will fail and caches will not persist, causing slower CI runs over time as they remain cold.Add
actions: writeto the permissions block:permissions: actions: write contents: read issues: write pull-requests: write
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/pr-regression-test-bot.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/pr-regression-test-bot.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (4)
.github/workflows/pr-regression-test-bot.yml (4)
34-53: LGTM! Good security pattern for gating expensive operations.The dedicated permissions-check job properly validates that only users with admin/maintain/write permissions can trigger regression tests. Moving this check into a separate job that gates the expensive pr-regression job is a clean optimization.
60-60: LGTM! Proper job dependency for authorization gate.The
needs: [permissions-check]dependency ensures the expensive regression test only runs after authorization is confirmed.
225-230: LGTM! Activation path correctly updated.The activation of
test_regressionenvironment (line 227) aligns with the new venv created at line 194. The test script correctly usesOLD_PYTHONandNEW_PYTHONto reference the separate installed environments for comparison.
256-266: LGTM! Terminology updated for consistency.The message text now uses "Performance Regression Test Report" instead of "Performance Benchmark Report", which aligns better with the workflow's name and purpose.
|
Would be better for us to enable ccache as we should build from source twice? |
|
likely we will automatically enable ccache, lgtm. |
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.