Skip to content

Conversation

@sgjzfzzf
Copy link
Contributor

@sgjzfzzf sgjzfzzf commented Dec 20, 2025

This PR resolves #1469. It moves most of the if-else checkers for multi-backends in tilelang/cache/kernel_cache.py, and dispatches the different implementations by inheritance in different subclasses.

We retained some specialized implementations for CuteDSL and considered removing them in the future once this backend is ready.

Summary by CodeRabbit

  • New Features

    • Per-backend on-disk kernel caching adapters added (including CuTeDSL) and a new FP32 CUDA GEMM dispatch path.
  • Changes

    • Centralized cache routing with consistent save/load behavior across backends; execution backend options updated (removed "auto", added "cutedsl"; default now tvm_ffi); improved backend resolution and clearer verbose logging.
  • Tests / CI

    • Added comprehensive disk-cache tests across backends and simplified CI/test runs by removing GEMM‑V1 isolation.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 20, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Refactors kernel caching into a dispatch map of backend-specific KernelCache subclasses, centralizes disk I/O into KernelCache helper methods, adds per-backend cache adapters (tvm_ffi, nvrtc, cutedsl, cython, torch), updates cached(...) routing and signatures, and adapts tests and CI to the new cache flow.

Changes

Cohort / File(s) Summary
Cache dispatcher & public API
\tilelang/cache/init.py``
Replaced singleton with _dispatch_map; cached(...) now resolves target/execution_backend (adds cutedsl), applies env/defaults, logs resolution when verbose, and routes to backend-specific KernelCache or raises on unsupported backend.
Base KernelCache refactor
\tilelang/cache/kernel_cache.py``
Converted module-level path constants into instance attributes; introduced helper methods (_save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, _save_so_cubin_to_disk, _get_required_files, _load_kernel_source, _set_adapter_cache_path, _build_kernel); consolidated save/load flows and adapter-path handling.
CuTeDSL adapter & cache
\tilelang/jit/adapter/cutedsl/kernel_cache.py`, `tilelang/jit/adapter/cutedsl/adapter.py`, `tilelang/jit/adapter/cutedsl/libgen.py``
Added CuTeDSLKernelCache with launcher-specific artifact paths and overrides; removed tma_* mutators and related assignments; adjusted file reads to use with-open and set kernel_global_source from device source.
TVM FFI adapter
\tilelang/jit/adapter/kernel_cache.py``
Added TVMFFIKernelCache (kernel_lib_path = "executable.so") with _save_wrapper_kernel_code_to_disk and _save_so_cubin_to_disk to persist host source and executable artifacts.
NVRTC adapter
\tilelang/jit/adapter/nvrtc/kernel_cache.py``
Added NVRTCKernelCache (kernel_lib_path="kernel.cubin", kernel_py_path="kernel.py") that writes a Python wrapper (kernel.py) from the .cubin counterpart before saving the binary.
Cython & Torch adapters
\tilelang/jit/adapter/cython/kernel_cache.py`, `tilelang/jit/adapter/torch/kernel_cache.py``
Added minimal CythonKernelCache and TorchKernelCache subclasses inheriting KernelCache (placeholders for backend-specific wiring).
Tests: kernel cache & JIT
\testing/python/cache/test_tilelang_kernel_cache.py`, `testing/python/jit/test_tilelang_jit_cutedsl.py`, `testing/python/jit/test_tilelang_jit_nvrtc.py``
Added disk-cache tests exercising miss/hit/isolation and postproc callbacks across backends; removed GEMM harness/tests for CuTeDSL and NVRTC.
CI & local test scripts
\.github/workflows/ci.yml`, `maint/scripts/run_local_ci_test.sh``
Removed CuTeDSL-specific ignore/isolated steps and unified CUDA JIT test invocations; simplified local CI test script to a single pytest invocation.
CUDA template
\src/tl_templates/cuda/gemm_mma.h``
Added DispatchInstruction specialization for float×float×float MMA (FP32×FP32×FP32) across architectures, enabling a new FP32 dispatch path.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller as Caller
  participant Dispatcher as CacheDispatcher
  participant Backend as Backend KernelCache
  participant Adapter as KernelAdapter
  participant FS as Filesystem

  Caller->>Dispatcher: cached(func, target, execution_backend, ...)
  Dispatcher->>Dispatcher: normalize/resolve target & execution_backend
  Dispatcher->>Backend: select backend-specific KernelCache
  Backend->>Backend: _set_adapter_cache_path(kernel, cache_path)
  Backend->>Backend: _get_required_files(cache_path)
  Backend->>FS: check required files exist
  alt required files missing
    Backend->>Adapter: request sources, wrapper, executable, params
    Adapter-->>Backend: provide artifacts
    Backend->>FS: _save_kernel_source_code_to_disk(...)
    Backend->>FS: _save_wrapper_kernel_code_to_disk(...)
    Backend->>FS: _save_so_cubin_to_disk(...)
  end
  Backend->>Backend: _load_kernel_source(...)
  Backend->>Backend: _build_kernel(...) -> JITKernel | None
  Backend-->>Caller: return JITKernel or error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

"I hopped through code with whiskers bright,
dispatching kernels through the night.
I cached wrappers, launchers, and more,
tucked binaries snug in a cozy store.
🥕🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.42% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Out of Scope Changes check ❓ Inconclusive Most changes support the inheritance refactor, but CI workflow modifications (removing GEMM_V1 tests, deleting test_gemm_f16f16f16_nn functions) and CUDA template changes (gemm_mma.h) are tangential to the core refactoring objective. Clarify whether test simplification and CUDA template enhancements belong in this refactoring PR or should be addressed separately to maintain focused scope.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: refactoring KernelCache to use inheritance-based design with backend-specific subclasses.
Linked Issues check ✅ Passed The PR successfully addresses #1469 objectives by eliminating if-else branching, introducing inheritance-based subclasses (TVMFFIKernelCache, CythonKernelCache, NVRTCKernelCache, TorchKernelCache, CuTeDSLKernelCache), enabling easy backend extensibility, and adding comprehensive cache tests.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d04446d and 44a1655.

📒 Files selected for processing (2)
  • testing/python/jit/test_tilelang_jit_cutedsl.py
  • tilelang/jit/adapter/cutedsl/adapter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • testing/python/jit/test_tilelang_jit_cutedsl.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-26T06:45:47.669Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1483
File: tilelang/jit/adapter/cutedsl/adapter.py:93-95
Timestamp: 2025-12-26T06:45:47.669Z
Learning: In tilelang/jit/adapter/cutedsl/adapter.py, verify that host_kernel_source and device_kernel_source are equal. If they are not, add an assertion or a guard with a clear comment explaining why they should be identical, and create a test that asserts their equality. If they are intentionally different, document the reason and implement a formal check or normalization to prevent drift.

Applied to files:

  • tilelang/jit/adapter/cutedsl/adapter.py
🧬 Code graph analysis (1)
tilelang/jit/adapter/cutedsl/adapter.py (1)
tilelang/jit/adapter/cython/adapter.py (1)
  • libpath (366-368)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (1)
tilelang/jit/adapter/cutedsl/adapter.py (1)

43-44: Verify equality of host_kernel_source and device_kernel_source.

Based on learnings, please verify that host_kernel_source and device_kernel_source are equal. If they should always be identical, add an assertion or guard with a clear comment explaining this constraint. If they are intentionally different, document the reason and implement a formal check or normalization to prevent drift.

🔎 Suggested verification
         self.params = params
         self.result_idx = self._legalize_result_idx(result_idx)
         self.host_kernel_source = host_kernel_source
         self.device_kernel_source = device_kernel_source
+        # Verify host and device sources are identical for CuTeDSL
+        # (CuTeDSL uses a unified kernel model where host and device code should match)
+        assert host_kernel_source == device_kernel_source, \
+            "CuTeDSL requires host_kernel_source == device_kernel_source"

Apply the same check in from_database (after line 118).

Additionally, add a test to verify this constraint:

def test_cutedsl_host_device_source_equality():
    # Test that host and device kernel sources are equal for CuTeDSL kernels
    ...

Based on learnings, this verification was previously requested to prevent source drift between host and device code.

Also applies to: 117-118


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@sgjzfzzf
Copy link
Contributor Author

I tested them on CUDA with tvm-ffi, cython, and ctypes, and it works very well(except for one bug that existed in the previous version, which leads to an exception on ctypes) on the environment I have. They may need more tests on other backends.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/cache/__init__.py (1)

39-39: Type hint is inconsistent with dispatch pool keys.

The Literal type is missing "cutedsl" which exists in _dispatch_pool. Also, None is allowed by the type hint but would cause a KeyError at runtime since there's no None key in the pool.

Suggested fix
-    execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = "auto",
+    execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "cutedsl", "torch"] = "auto",
🧹 Nitpick comments (6)
tilelang/cache/torch_kernel_cache.py (1)

4-6: Superfluous *args, **kwargs in __new__ signature.

The base class KernelCache.__new__(cls) does not accept additional arguments. Since these cache classes are instantiated without arguments in the dispatch pool, the *args, **kwargs parameters are unnecessary and could be removed for clarity.

This applies to all similar subclasses in this PR.

Suggested simplification
 class TorchKernelCache(KernelCache):
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
+    def __new__(cls):
+        return super().__new__(cls)
tilelang/cache/__init__.py (1)

47-57: Consider adding a guard for invalid backend keys.

If an unsupported execution_backend value is passed, _dispatch_pool[execution_backend] will raise a raw KeyError. A more descriptive error could improve debuggability.

Optional: Add validation with clearer error
+    if execution_backend not in _dispatch_pool:
+        raise ValueError(
+            f"Unsupported execution_backend '{execution_backend}'. "
+            f"Allowed: {list(_dispatch_pool.keys())}"
+        )
     return _dispatch_pool[execution_backend].cached(
tilelang/cache/tvm_ffi_kernel_cache.py (1)

10-11: Remove redundant __new__ method.

The __new__ method simply delegates to the parent without any custom logic. Since the parent KernelCache already implements singleton behavior, this override is unnecessary.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-
tilelang/cache/nvrtc_kernel_cache.py (1)

11-12: Remove redundant __new__ method.

The __new__ method simply delegates to the parent without any custom logic, making it unnecessary.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-
tilelang/cache/cutedsl_kernel_cache.py (2)

16-17: Remove redundant __new__ method.

The __new__ method simply delegates to the parent without any custom logic, making it unnecessary.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-

42-43: Consider using unpacking for list concatenation.

As suggested by the static analysis tool, unpacking provides cleaner syntax for combining lists.

🔎 Proposed refactor using unpacking
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 168aec7 and 334a13d.

📒 Files selected for processing (9)
  • tilelang/cache/__init__.py (2 hunks)
  • tilelang/cache/auto_kernel_cache.py (1 hunks)
  • tilelang/cache/ctypes_kernel_cache.py (1 hunks)
  • tilelang/cache/cutedsl_kernel_cache.py (1 hunks)
  • tilelang/cache/cython_kernel_cache.py (1 hunks)
  • tilelang/cache/kernel_cache.py (4 hunks)
  • tilelang/cache/nvrtc_kernel_cache.py (1 hunks)
  • tilelang/cache/torch_kernel_cache.py (1 hunks)
  • tilelang/cache/tvm_ffi_kernel_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
tilelang/cache/cython_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/torch_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/tvm_ffi_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (4)
  • KernelCache (25-449)
  • _save_wrapper_kernel_code_to_disk (414-418)
  • _safe_write_file (256-263)
  • _save_so_cubin_to_disk (420-425)
tilelang/cache/cutedsl_kernel_cache.py (1)
  • _save_so_cubin_to_disk (24-39)
tilelang/cache/nvrtc_kernel_cache.py (1)
  • _save_so_cubin_to_disk (23-30)
tilelang/cache/cutedsl_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (6)
  • KernelCache (25-449)
  • _save_kernel_source_code_to_disk (407-412)
  • _save_so_cubin_to_disk (420-425)
  • _safe_write_file (256-263)
  • _load_binary (250-253)
  • _get_required_files (427-430)
tilelang/cache/nvrtc_kernel_cache.py (2)
  • _save_kernel_source_code_to_disk (14-21)
  • _save_so_cubin_to_disk (23-30)
tilelang/cache/tvm_ffi_kernel_cache.py (1)
  • _save_so_cubin_to_disk (19-24)
tilelang/cache/auto_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/ctypes_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/kernel_cache.py (6)
tilelang/cache/cutedsl_kernel_cache.py (3)
  • _save_kernel_source_code_to_disk (20-21)
  • _save_so_cubin_to_disk (24-39)
  • _get_required_files (42-43)
tilelang/cache/tvm_ffi_kernel_cache.py (2)
  • _save_wrapper_kernel_code_to_disk (13-17)
  • _save_so_cubin_to_disk (19-24)
tilelang/jit/kernel.py (3)
  • JITKernel (39-794)
  • kernel_source (644-645)
  • get_kernel_source (465-476)
tilelang/jit/adapter/cutedsl/adapter.py (1)
  • get_kernel_source (206-214)
tilelang/jit/adapter/nvrtc/adapter.py (1)
  • get_kernel_source (179-190)
tilelang/jit/adapter/ctypes/adapter.py (2)
  • get_kernel_source (296-302)
  • libpath (282-284)
tilelang/cache/__init__.py (8)
tilelang/cache/auto_kernel_cache.py (1)
  • AutoKernelCache (4-6)
tilelang/cache/ctypes_kernel_cache.py (1)
  • CTypesKernelCache (4-6)
tilelang/cache/cutedsl_kernel_cache.py (1)
  • CuTeDSLKernelCache (10-49)
tilelang/cache/cython_kernel_cache.py (1)
  • CythonKernelCache (4-6)
tilelang/cache/nvrtc_kernel_cache.py (1)
  • NVRTCKernelCache (7-30)
tilelang/cache/torch_kernel_cache.py (1)
  • TorchKernelCache (4-6)
tilelang/cache/tvm_ffi_kernel_cache.py (1)
  • TVMFFIKernelCache (7-24)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/nvrtc_kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
  • _save_kernel_source_code_to_disk (407-412)
  • _safe_write_file (256-263)
  • _load_binary (250-253)
  • _save_so_cubin_to_disk (420-425)
tilelang/cache/tvm_ffi_kernel_cache.py (1)
  • _save_so_cubin_to_disk (19-24)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (7)
tilelang/cache/auto_kernel_cache.py (1)

4-6: LGTM, same note as TorchKernelCache.

The same optional simplification of removing *args, **kwargs applies here.

tilelang/cache/ctypes_kernel_cache.py (1)

4-6: LGTM, same note as TorchKernelCache.

The same optional simplification of removing *args, **kwargs applies here.

tilelang/cache/cython_kernel_cache.py (1)

4-6: LGTM, same note as TorchKernelCache.

The same optional simplification of removing *args, **kwargs applies here.

tilelang/cache/__init__.py (2)

22-30: Dispatch pool design looks good for the multi-backend refactor.

The dictionary-based dispatch cleanly replaces the if-else pattern as intended by issue #1469. Each backend gets its own singleton cache instance.


75-76: Verify this module-level behavior is intentional.

If TILELANG_CLEAR_CACHE env var is set, this will raise RuntimeError during import, preventing any use of the module. This seems intentional as a safeguard, but confirm this is the desired UX.

tilelang/cache/kernel_cache.py (2)

39-45: Good refactoring: centralized path configuration.

Moving from module-level path constants to per-class attributes enables backend-specific path customization without modifying the base class. This aligns well with the PR's goal of eliminating conditional logic.


407-449: Well-structured I/O abstraction.

The new helper methods (_save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, _save_so_cubin_to_disk, _get_required_files, _load_kernel_source) provide clean extension points for subclasses while centralizing common disk I/O patterns. This significantly improves maintainability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)

47-58: Consider defensive error handling for invalid backend keys.

The direct dictionary lookup on line 48 will raise a KeyError if an invalid backend_key is provided. While the Literal type hint provides compile-time protection, runtime validation would offer better error messages when type checking is bypassed or when backend names are dynamically constructed.

🔎 Proposed enhancement
 backend_key: str = execution_backend if execution_backend is not None else "auto"
+if backend_key not in _dispatch_pool:
+    valid_backends = ", ".join(_dispatch_pool.keys())
+    raise ValueError(
+        f"Invalid execution_backend '{backend_key}'. "
+        f"Valid options are: {valid_backends}"
+    )
 return _dispatch_pool[backend_key].cached(
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 334a13d and 676b4d0.

📒 Files selected for processing (2)
  • examples/gemm/example_gemm.py (1 hunks)
  • tilelang/cache/__init__.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • examples/gemm/example_gemm.py
📚 Learning: 2025-12-18T04:50:00.512Z
Learnt from: silentCoder-dev
Repo: tile-ai/tilelang PR: 1464
File: testing/python/language/test_tilelang_language_rand.py:14-14
Timestamp: 2025-12-18T04:50:00.512Z
Learning: In `testing/python/language/test_tilelang_language_rand.py`, the TileLang kernel uses `blk_M = M` (single block) and calls `rng_rand()` four times per element to align results with the Triton implementation, which uses `blk_M = 128` (multiple blocks) and calls the RNG once per element. These differences compensate for internal RNG behavior differences between TileLang and Triton.

Applied to files:

  • examples/gemm/example_gemm.py
🧬 Code graph analysis (1)
examples/gemm/example_gemm.py (1)
tilelang/jit/kernel.py (1)
  • out_idx (636-637)
🔇 Additional comments (2)
tilelang/cache/__init__.py (1)

5-20: Clean TYPE_CHECKING usage to avoid circular imports.

The use of TYPE_CHECKING to import KernelCache only for type hints is a good pattern that avoids potential circular import issues at runtime while maintaining type safety.

examples/gemm/example_gemm.py (1)

5-5: Good example demonstrating the new backend selection API.

The addition of execution_backend="ctypes" effectively demonstrates the new multi-backend dispatch mechanism. Based on the PR description, this has been tested successfully on CUDA.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tilelang/cache/cutedsl_kernel_cache.py (2)

16-17: Remove unnecessary __new__ override.

The __new__ method simply delegates to the parent without adding any custom logic. Python automatically uses the parent's __new__ if not overridden, making this dead code.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-

41-43: Consider using iterable unpacking for list extension.

The list concatenation works correctly, but iterable unpacking is more idiomatic and slightly more efficient.

🔎 Proposed fix
     @override
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 676b4d0 and e28dba1.

📒 Files selected for processing (1)
  • tilelang/cache/cutedsl_kernel_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/cutedsl_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (7)
  • KernelCache (25-449)
  • _save_kernel_source_code_to_disk (407-412)
  • _save_so_cubin_to_disk (420-425)
  • _safe_write_file (256-263)
  • _load_binary (250-253)
  • _get_required_files (427-430)
  • _load_kernel_source (432-449)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (3)
tilelang/cache/cutedsl_kernel_cache.py (3)

19-21: Verify no-op implementation is intentional.

The method intentionally skips saving kernel source code, deviating from the base class behavior. Confirm this is the correct design for CuTeDSL—does the backend not generate saveable kernel source, or is source code unnecessary for cache restoration?


23-39: LGTM! Launcher artifact handling is well-implemented.

The method correctly saves CuTeDSL-specific launcher artifacts (library and optional C++ source) using defensive attribute checks and the parent class's safe write utilities.


45-47: Verify empty string return values are correct.

The method returns empty strings rather than None. The base class can return None to indicate loading failure. Since _save_kernel_source_code_to_disk is a no-op and no source is saved, confirm that returning empty strings (rather than None) is the expected behavior for downstream code.

Note: The previous review issue regarding method naming (_load_kernel_from_cache vs _load_kernel_source) has been correctly resolved.

Copy link
Member

@LeiWang1999 LeiWang1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

@sgjzfzzf
Copy link
Contributor Author

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

For sure I could do that.

@sgjzfzzf
Copy link
Contributor Author

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

@LeiWang1999 Hi Lei! I tried to do that, but a horrible point is that the Adapter construction depends on the KernelCache. It'll lead to a cycle reference if we put KernelCache inside Adapter. Do you want me to refactor the whole cached part on the kernel construction, or just move KernelCache codes into the Adapter directory?

@silentCoder-dev
Copy link
Collaborator

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

@LeiWang1999 Hi Lei! I tried to do that, but a horrible point is that the Adapter construction depends on the KernelCache. It'll lead to a cycle reference if we put KernelCache inside Adapter. Do you want me to refactor the whole cached part on the kernel construction, or just move KernelCache codes into the Adapter directory?

It seems sufficient to move this into adapter to better support maintaining multiple backends.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)

14-30: Code duplication between methods.

Lines 15-20 and 24-29 contain identical logic for saving the Python wrapper file. This was previously flagged.

tilelang/jit/adapter/kernel_cache.py (1)

19-24: Fix logging to display file path instead of object.

Line 23 logs the executable object instead of kernel_lib_path. The debug message should show where the file is being saved (the path), not the object being saved.

🔎 Proposed fix
     def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
         kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path)
         executable = kernel.adapter.executable
         if verbose:
-            self.logger.debug(f"Saving kernel executable to file: {executable}")
+            self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}")
         KernelCache._safe_write_executable(executable, kernel_lib_path)
🧹 Nitpick comments (3)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)

41-43: Consider iterable unpacking for clarity.

Per static analysis (RUF005), prefer unpacking over concatenation for combining lists.

🔎 Suggested refactor
     @override
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]

45-47: Returning empty strings works but is subtly different from None.

The base class's _load_kernel_from_disk condition if ((host_kernel_source and device_kernel_source) or self.execution_backend == "cutedsl") treats empty strings as falsy, so this works due to the cutedsl backend check. However, returning (None, None) would be more explicit and consistent with the base class's error-handling pattern that returns None on failure.

🔎 Suggested alternative
     @override
     def _load_kernel_source(self, device_kernel_path: str, host_kernel_path: str, verbose: bool = False) -> tuple[str | None, str | None]:
-        return "", ""
+        return None, None
tilelang/jit/adapter/kernel_cache.py (1)

10-11: Consider whether new override is necessary.

The __new__ method simply delegates to the superclass without additional logic. While this pattern is consistent across all backend implementations in the codebase (CTypes, Cython, Torch, NVRTC), it appears redundant if no customization is needed.

If this pattern serves a specific purpose (e.g., enforcing singleton behavior or future extensibility), consider adding a comment explaining the rationale. Otherwise, the method could be removed to reduce boilerplate.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e28dba1 and 81dd6eb.

📒 Files selected for processing (7)
  • tilelang/cache/__init__.py (2 hunks)
  • tilelang/jit/adapter/ctypes/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/cutedsl/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/cython/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/nvrtc/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/torch/kernel_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
  • _save_kernel_source_code_to_disk (14-21)
  • _save_so_cubin_to_disk (23-30)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (19-24)
tilelang/jit/adapter/torch/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/__init__.py (5)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
  • CTypesKernelCache (4-6)
tilelang/jit/adapter/cython/kernel_cache.py (1)
  • CythonKernelCache (4-6)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
  • NVRTCKernelCache (7-30)
tilelang/jit/adapter/torch/kernel_cache.py (1)
  • TorchKernelCache (4-6)
tilelang/jit/adapter/kernel_cache.py (1)
  • TVMFFIKernelCache (7-24)
🪛 Ruff (0.14.8)
tilelang/jit/adapter/cutedsl/kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

tilelang/cache/__init__.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (13)
tilelang/jit/adapter/cython/kernel_cache.py (1)

4-6: Minimal adapter follows established pattern.

The subclass correctly delegates to the base KernelCache. Note that the singleton implementation in the base class stores the instance on KernelCache._instance directly, meaning all backend-specific subclasses share the same singleton—this appears intentional for a unified cache but is worth being aware of.

tilelang/jit/adapter/torch/kernel_cache.py (1)

4-6: LGTM!

The TorchKernelCache subclass correctly delegates to the base class and follows the consistent adapter pattern established across backends.

tilelang/jit/adapter/nvrtc/kernel_cache.py (1)

7-12: Class attributes and new are correctly defined.

The kernel_lib_path = "kernel.cubin" appropriately overrides the base class default for the NVRTC backend.

tilelang/jit/adapter/cutedsl/kernel_cache.py (3)

10-17: Class attributes and structure are well-defined.

The CuTeDSL-specific paths (launcher_lib.so, launcher.cpp) are appropriately declared as class attributes.


19-21: No-op override is intentional for CuTeDSL.

The CuTeDSL backend doesn't persist kernel source in the traditional manner, relying on launcher artifacts instead.


23-39: Defensive attribute access is appropriate.

Good use of getattr and hasattr to safely check for optional lib_generator and launcher_cpp_code attributes before attempting to save.

tilelang/jit/adapter/ctypes/kernel_cache.py (1)

4-6: LGTM!

The CTypesKernelCache subclass follows the consistent minimal adapter pattern.

tilelang/cache/__init__.py (4)

5-18: LGTM! Clean TYPE_CHECKING usage prevents circular imports.

The TYPE_CHECKING conditional import pattern properly isolates the KernelCache type reference for type checking while avoiding runtime circular dependencies. All backend implementations are imported and ready for the dispatch pool.


20-28: Well-structured dispatch pool with singleton instances.

The dispatch pool architecture cleanly maps backend names to their singleton KernelCache implementations. This eliminates the conditional branching that was present in the previous design.


45-59: Dispatch logic correctly routes to backend implementations.

The normalization of "auto"/None to "tvm_ffi" provides a sensible default, and all parameters are properly forwarded to the selected backend's cached() method.


60-61: Exception handling addresses previous feedback.

The ValueError for unsupported backends implements the behavior requested in past review comments. The inline error message is clear and actionable.

Note: Ruff suggests moving the message to an exception subclass (TRY003), but this inline approach is acceptable for a simple validation error.

The static analysis hint is optional; if you prefer to follow strict style guidelines, consider defining a custom exception class, but the current implementation is functionally correct.

tilelang/jit/adapter/kernel_cache.py (2)

7-8: LGTM! Backend-specific library path is properly defined.

The TVMFFIKernelCache subclass appropriately sets kernel_lib_path = "executable.so" for the TVM FFI backend, following the pattern used by other backend implementations (e.g., NVRTC uses "kernel.cubin").


13-17: LGTM! Wrapper kernel saving logic is correct.

The method properly constructs the host kernel path, logs the correct path variable when verbose, and safely writes the host source using the parent class helper method.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)

37-37: Missing "cutedsl" in execution_backend type hint.

The _dispatch_pool includes "cutedsl" (line 26), but the type hint for execution_backend on line 37 omits it. This creates a type-checking inconsistency where the backend is available at runtime but not reflected in the type signature.

-    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = None,
+    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,

Note: This issue was already flagged in a previous review comment.

🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)

58-58: Optional: Consider Ruff hint about exception message format.

Ruff suggests avoiding long messages directly in exception raises (TRY003). For this simple case, the current format is clear and maintainable, so this is purely a linting preference.

If you want to strictly follow the linter suggestion:

+_UNSUPPORTED_BACKEND_MSG = 'Cannot find support for execution backend "{}"'
+
 def cached(
     ...
     else:
-        raise ValueError(f'Cannot find support for execution backend "{execution_backend}"')
+        raise ValueError(_UNSUPPORTED_BACKEND_MSG.format(execution_backend))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81dd6eb and d256f9b.

📒 Files selected for processing (1)
  • tilelang/cache/__init__.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
tilelang/cache/__init__.py (2)

20-28: Good refactoring: dispatch pool centralizes backend selection.

The dispatch pool cleanly maps backend keys to singleton instances, eliminating the if-else branching mentioned in the PR objectives. This design makes adding new backends straightforward.


10-15: All imported KernelCache implementations verified and exist. The refactored dispatch system has all six required backend-specific classes properly defined in their respective modules. No issues found.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)

37-37: Critical: Default execution_backend=None will always fail.

The parameter defaults to None, but None is not a key in _dispatch_pool (lines 21-28). Every call to cached() without an explicit execution_backend will raise ValueError at line 58, breaking the public API.

Based on past review comments, the old implementation supported "auto" mode for backend detection. This refactor removed that logic but didn't update the default value.

🔎 Recommended fixes

Option 1: Restore auto-detection for None (preserves backward compatibility)

+    # Auto-detect backend if not specified
+    if execution_backend is None:
+        # Determine the best available backend based on target
+        if isinstance(target, str) and target == "auto":
+            execution_backend = "tvm_ffi"  # or implement detection logic
+        else:
+            # Parse target to determine appropriate backend
+            execution_backend = "tvm_ffi"  # placeholder
+    
     if execution_backend in _dispatch_pool:

Option 2: Use a valid default backend

-    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,
+    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] = "tvm_ffi",

Option 3: Make execution_backend required

-    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,
+    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"],
🧹 Nitpick comments (2)
tilelang/cache/__init__.py (2)

20-28: Consider lazy initialization of backend instances.

All backend instances are created at module import time, even if only one backend will be used. For better startup performance, consider lazy initialization:

🔎 Optional refactor for lazy initialization
-_dispatch_pool: dict[str, KernelCache] = {
-    "tvm_ffi": TVMFFIKernelCache(),
-    "ctypes": CTypesKernelCache(),
-    "cython": CythonKernelCache(),
-    "nvrtc": NVRTCKernelCache(),
-    "cutedsl": CuTeDSLKernelCache(),
-    "torch": TorchKernelCache(),
-}
+_dispatch_pool: dict[str, KernelCache] = {}
+
+def _get_backend_cache(backend: str) -> KernelCache:
+    """Get or create a backend cache instance."""
+    if backend not in _dispatch_pool:
+        backend_classes = {
+            "tvm_ffi": TVMFFIKernelCache,
+            "ctypes": CTypesKernelCache,
+            "cython": CythonKernelCache,
+            "nvrtc": NVRTCKernelCache,
+            "cutedsl": CuTeDSLKernelCache,
+            "torch": TorchKernelCache,
+        }
+        if backend in backend_classes:
+            _dispatch_pool[backend] = backend_classes[backend]()
+    return _dispatch_pool.get(backend)

Then update line 46 to use _get_backend_cache(execution_backend).


57-58: Consider extracting error message to improve maintainability.

Static analysis flags this as TRY003 (long message outside exception class). While the current message is clear and acceptable, extracting it can improve consistency if you add more validation errors later.

🔎 Optional refactor
+_UNSUPPORTED_BACKEND_ERROR = (
+    'Cannot find support for execution backend "{}". '
+    f'Supported backends: {", ".join(_dispatch_pool.keys())}'
+)
+
 def cached(
     ...
 ) -> JITKernel:
     ...
     if execution_backend in _dispatch_pool:
         ...
     else:
-        raise ValueError(f'Cannot find support for execution backend "{execution_backend}"')
+        raise ValueError(_UNSUPPORTED_BACKEND_ERROR.format(execution_backend))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d256f9b and 2e3be15.

📒 Files selected for processing (2)
  • tilelang/cache/__init__.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/__init__.py (1)
tilelang/cache/kernel_cache.py (2)
  • KernelCache (25-449)
  • cached (113-227)
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (1)
tilelang/cache/__init__.py (1)

5-18: LGTM! Clean import structure with proper TYPE_CHECKING usage.

The TYPE_CHECKING pattern correctly avoids circular import issues while maintaining type safety. The backend-specific imports are well-organized.

@SiriusNEO SiriusNEO self-requested a review December 22, 2025 04:47
Copy link
Collaborator

@SiriusNEO SiriusNEO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, just some comments to go

@SiriusNEO SiriusNEO changed the title [Enhancement] refactor kernel_cache [Enhancement] Refactor KernelCache to use inheritance-based design Dec 22, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)

1-4: Consider adding a docstring to document the placeholder implementation.

The empty class body is valid but would benefit from a docstring explaining that CTypesKernelCache currently uses all default KernelCache behavior and may be extended in the future with ctypes-specific logic.

🔎 Suggested documentation
+"""CTypes backend kernel cache implementation."""
 from tilelang.cache.kernel_cache import KernelCache


-class CTypesKernelCache(KernelCache): ...
+class CTypesKernelCache(KernelCache):
+    """
+    Kernel cache for the ctypes execution backend.
+    
+    Currently uses default KernelCache behavior. Backend-specific
+    customizations can be added here as needed.
+    """
+    ...
tilelang/jit/adapter/cython/kernel_cache.py (1)

1-4: Consider adding a docstring to document the placeholder implementation.

Similar to CTypesKernelCache, this empty class would benefit from documentation explaining it's a placeholder for Cython-specific cache behavior.

🔎 Suggested documentation
+"""Cython backend kernel cache implementation."""
 from tilelang.cache.kernel_cache import KernelCache


-class CythonKernelCache(KernelCache): ...
+class CythonKernelCache(KernelCache):
+    """
+    Kernel cache for the Cython execution backend.
+    
+    Currently uses default KernelCache behavior. Backend-specific
+    customizations can be added here as needed.
+    """
+    ...
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

41-43: Consider iterable unpacking for concatenation.

The list concatenation works correctly but could use iterable unpacking for slightly better performance.

🔎 Optional optimization
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e3be15 and 1c0c714.

📒 Files selected for processing (8)
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
  • tilelang/jit/adapter/ctypes/kernel_cache.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
  • tilelang/jit/adapter/cython/kernel_cache.py
  • tilelang/jit/adapter/kernel_cache.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
  • tilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tilelang/jit/adapter/torch/kernel_cache.py
  • tilelang/jit/adapter/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • tilelang/cache/__init__.py
🧬 Code graph analysis (4)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-475)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-475)
tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
  • KernelCache (25-475)
  • _save_so_cubin_to_disk (412-417)
  • _safe_write_file (253-260)
  • _load_binary (247-250)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (16-21)
tilelang/cache/kernel_cache.py (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (6)
  • _set_adapter_cache_path (50-52)
  • _save_kernel_source_code_to_disk (20-21)
  • _save_so_cubin_to_disk (24-39)
  • _get_required_files (42-43)
  • _load_kernel_source (46-47)
  • _build_kernel (55-83)
🪛 Ruff (0.14.10)
tilelang/cache/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/jit/adapter/cutedsl/kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

tilelang/cache/kernel_cache.py

443-443: Unused method argument: kernel

(ARG002)


443-443: Unused method argument: cache_path

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (14)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)

7-18: LGTM! Clean NVRTC-specific implementation.

The class correctly customizes paths for NVRTC's cubin output and implements the necessary logic to persist the Python wrapper alongside the binary. The previous duplicate logic concern has been addressed.

tilelang/cache/__init__.py (3)

21-28: Clean dispatch mechanism successfully replaces singleton pattern.

The backend dispatch map provides a clear, extensible way to route kernel cache operations to backend-specific implementations. This aligns with the PR objective to eliminate if-else branching.


37-37: Good choice of default execution backend.

Setting the default to "tvm_ffi" addresses previous review concerns about None causing errors. This provides a sensible fallback while still allowing explicit backend selection.

Based on past review comments, this follows the suggestion to use "tvm_ffi" as the default for internal APIs.


45-58: Dispatch logic is straightforward and correct.

The if-else routing to backend-specific cache instances is clear. The ValueError provides a helpful message for unsupported backends.

Note: The static analysis hint (TRY003) about the long error message is a stylistic suggestion. The current inline message is acceptable and clear for this use case.

tilelang/jit/adapter/cutedsl/kernel_cache.py (4)

13-21: CuTeDSL-specific paths and no-op kernel source save are appropriate.

The custom paths for launcher artifacts align with CuTeDSL's C++ launcher design. The early return in _save_kernel_source_code_to_disk indicates CuTeDSL handles kernel sources differently than other backends.


23-39: Defensive attribute checking for launcher artifacts is good practice.

The use of getattr() and hasattr() ensures the code handles cases where the adapter or lib_generator may not have the expected launcher attributes. The verbose logging aids debugging.


45-52: Appropriate overrides for CuTeDSL's kernel loading behavior.

Returning empty strings from _load_kernel_source and propagating the cache path to the adapter reflect CuTeDSL's different approach to managing kernel sources and artifacts.


54-83: Correct kernel reconstruction from cached parameters.

The _build_kernel implementation properly delegates to JITKernel.from_database when parameters are available, which is the expected behavior for loading cached kernels.

tilelang/cache/kernel_cache.py (6)

39-43: Excellent use of class attributes for path customization.

Moving path constants from module-level to class attributes allows backend-specific subclasses to override paths as needed (e.g., NVRTC using "kernel.cubin" instead of "kernel_lib.so"). This is a key enabler of the inheritance-based design.


290-303: Well-structured delegation to specialized save methods.

Breaking _save_kernel_to_disk into separate calls to _save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, and _save_so_cubin_to_disk provides clear override points for backend-specific behavior while maintaining error handling at the orchestration level.


351-380: Clean refactor of disk loading logic with extensible helper methods.

The extraction of _get_required_files, _load_kernel_source, and _build_kernel successfully eliminates conditional logic while enabling backends like CuTeDSL to customize file requirements and loading behavior.


399-441: New helper methods provide excellent override points for subclasses.

These methods establish the template method pattern, allowing each backend to customize specific aspects:

  • _save_kernel_source_code_to_disk: Device kernel persistence
  • _save_wrapper_kernel_code_to_disk: Host wrapper persistence
  • _save_so_cubin_to_disk: Library/binary persistence
  • _get_required_files: File existence checks
  • _load_kernel_source: Source code loading

This design successfully achieves the PR objective of eliminating if-else branching in favor of polymorphism.


443-444: Hook method with unused parameters is intentional design.

The _set_adapter_cache_path method serves as an extension point for subclasses (e.g., CuTeDSLKernelCache overrides this at line 50-52 of tilelang/jit/adapter/cutedsl/kernel_cache.py to propagate the cache path). The unused parameters in the base implementation are expected for hook methods.

The static analysis warnings for unused arguments (ARG002) are false positives in this context.


446-475: Flexible kernel reconstruction logic supports diverse backends.

The _build_kernel method provides a sensible default that works for most backends while allowing specialized subclasses (like CuTeDSL) to implement custom reconstruction logic based on available artifacts.

@sgjzfzzf sgjzfzzf requested a review from lucifer1004 December 25, 2025 01:12
@lucifer1004
Copy link
Collaborator

I have found a perfect solution for testing cache utilities. Wait for my push.

@lucifer1004
Copy link
Collaborator

My new test successfully detected the failure of cutedsl disk cache. I will push the fix together with the test.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
maint/scripts/run_local_ci_test.sh (1)

18-22: Critical inconsistency: Comment requires separate CuTeDSL invocation, but code runs unified test.

The comment explicitly states "Run the CuTeDSL JIT tests in a separate pytest invocation" with TILELANG_USE_GEMM_V1=1, yet Line 22 runs all tests in a single unified command without isolating CuTeDSL. Per the PR objectives, reviewer lucifer1004 specifically requested running cutedsl separately with this environment variable set.

Either:

  1. Add the separate CuTeDSL test invocation as documented, or
  2. Remove/update the comment if the kernel cache refactor eliminated this requirement.
🔎 Proposed fix: Add separate CuTeDSL test invocation

If CuTeDSL still requires TILELANG_USE_GEMM_V1=1, implement the separate invocation as indicated:

 # Run pytest in parallel (4 workers) for all tests in the testing/python directory.
-# IMPORTANT: CuTeDSL backend currently requires GEMM v1 (TILELANG_USE_GEMM_V1=1).
-# Do NOT export it globally here, or you'll silently change the default GEMM selection
-# for unrelated tests. Run the CuTeDSL JIT tests in a separate pytest invocation.
 cd testing/python
-python -m pytest -n 4 . --verbose --color=yes --durations=0 --showlocals --cache-clear
+python -m pytest -n 4 . --verbose --color=yes --durations=0 --showlocals --cache-clear --ignore=jit/test_tilelang_jit_gemm_cutedsl.py
+
+# Run CuTeDSL JIT tests separately with GEMM v1 enabled.
+# IMPORTANT: Do NOT export TILELANG_USE_GEMM_V1 globally, or you'll silently change
+# the default GEMM selection for unrelated tests.
+TILELANG_USE_GEMM_V1=1 python -m pytest -n 4 jit/test_tilelang_jit_gemm_cutedsl.py --verbose --color=yes --durations=0 --showlocals --cache-clear

Based on PR objectives, verify with lucifer1004 whether the separate invocation is still required after this refactor.

🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

42-44: LGTM - Extends required files for CuTeDSL.

Adding the launcher library to required files ensures cache validation checks for all necessary artifacts.

The static analysis suggests using iterable unpacking ([*super()._get_required_files(cache_path), ...]), but the current concatenation is equally clear and readable for this use case.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 631861c and cd9572c.

📒 Files selected for processing (8)
  • .github/workflows/ci.yml
  • maint/scripts/run_local_ci_test.sh
  • testing/python/cache/test_tilelang_kernel_cache.py
  • testing/python/jit/test_tilelang_jit_cutedsl.py
  • testing/python/jit/test_tilelang_jit_nvrtc.py
  • tilelang/jit/adapter/cutedsl/adapter.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
  • tilelang/jit/adapter/cutedsl/libgen.py
💤 Files with no reviewable changes (3)
  • testing/python/jit/test_tilelang_jit_nvrtc.py
  • .github/workflows/ci.yml
  • tilelang/jit/adapter/cutedsl/libgen.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • maint/scripts/run_local_ci_test.sh
  • testing/python/cache/test_tilelang_kernel_cache.py
  • testing/python/jit/test_tilelang_jit_cutedsl.py
🧬 Code graph analysis (2)
tilelang/jit/adapter/cutedsl/adapter.py (1)
tilelang/jit/adapter/cython/adapter.py (1)
  • libpath (366-368)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (8)
  • KernelCache (25-472)
  • _save_kernel_source_code_to_disk (395-400)
  • _save_so_cubin_to_disk (408-413)
  • _safe_write_file (248-255)
  • _get_required_files (415-418)
  • _load_kernel_source (420-437)
  • _set_adapter_cache_path (439-440)
  • _build_kernel (442-472)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (16-21)
🪛 Ruff (0.14.10)
testing/python/cache/test_tilelang_kernel_cache.py

189-189: Unused function argument: clean_cache_env

(ARG001)

tilelang/jit/adapter/cutedsl/kernel_cache.py

44-44: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (15)
testing/python/jit/test_tilelang_jit_cutedsl.py (1)

7-9: LGTM - Module-level environment configuration for CuTeDSL tests.

Setting TILELANG_USE_GEMM_V1 = "1" at module level ensures the GEMM V1 pathway is enabled for all CuTeDSL JIT kernel tests in this file. This aligns with the PR's goal of testing the refactored cache behavior.

tilelang/jit/adapter/cutedsl/adapter.py (2)

93-94: Good improvement: Using context manager for file I/O.

Using with open() ensures proper resource cleanup. This prevents potential file handle leaks that could occur with the previous direct open().read() pattern.


150-151: LGTM - Consistent context manager usage in from_database.

Matches the pattern used in __init__ for proper file handle management.

testing/python/cache/test_tilelang_kernel_cache.py (7)

1-21: Excellent test documentation.

The test plan clearly explains the strategy, CI safety considerations, and technical details. This makes the test suite maintainable and helps future contributors understand the purpose and constraints.


45-63: LGTM - Clean callback tracking implementation.

The PostProcCounter class effectively tracks postproc invocations and stores markers for verification. The closure-based callback registration is appropriate for this use case.


82-82: Consider handling missing backend parameter gracefully.

If a test using this fixture doesn't have a backend parameter in callspec.params, this will raise a KeyError. While all current tests are parametrized with backend, using .get() with a default makes this more robust:

backend = request.node.callspec.params.get("backend", "nvrtc")

Wait, I see this is already using .get() with a default. The code is correct.


108-185: Well-structured cache verification test.

The test effectively validates:

  1. Cache miss triggers postproc (counter increments)
  2. Cache hit skips postproc (counter unchanged)
  3. Marker preservation across cache load
  4. Functional correctness of cached kernels

The UUID-based cache key isolation ensures test independence across runs.


188-235: LGTM - Validates cache key differentiation.

This test confirms that different kernel implementations produce different cache keys, ensuring the test methodology is sound.

The static analysis hint about clean_cache_env being unused is a false positive—the fixture is required for its side effects (setting up isolated cache directories and environment variables), which is a standard pytest pattern.


238-273: LGTM - Meta-test for fixture isolation.

This test validates that the clean_cache_env fixture provides proper test isolation, which is essential for reliable CI runs.


276-282: Good test cleanup practice.

The autouse=True fixture ensures postproc callbacks are restored to identity functions after each test, preventing test pollution across the suite.

tilelang/jit/adapter/cutedsl/kernel_cache.py (5)

13-18: LGTM - Clear path constant definitions.

The class properly defines CuTeDSL-specific artifact paths. Using class-level constants makes the file structure explicit and maintainable.


20-22: Intentional no-op for CuTeDSL source saving.

CuTeDSL doesn't separately save kernel source code since the source is embedded in the generated Python library (kernel.py). This is consistent with the adapter's behavior in adapter.py where device_kernel_source is read from libpath.


24-40: LGTM - Robust launcher artifact saving.

The implementation:

  • Uses defensive getattr/hasattr checks for optional attributes
  • Leverages _safe_write_file for atomic writes (important for concurrent access)
  • Optionally saves C++ source for debugging purposes

50-53: LGTM - Enables delayed cubin caching.

This propagates the cache path to the adapter, enabling _save_cubin_to_cache_if_needed (adapter.py line 230) to save the cubin after first execution. This is necessary because the cubin may not exist until the kernel is actually run.


55-85: LGTM - Proper kernel construction from cached data.

The implementation correctly:

  • Delegates to JITKernel.from_database with all necessary parameters
  • Returns None when kernel_params is unavailable, signaling a failed cache load
  • Passes through all configuration options (pass_configs, compile_flags, etc.)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

59-88: Restore source validation to prevent invalid kernels from being constructed.

The base class _build_kernel validates that both host_kernel_source and device_kernel_source are non-empty before calling JITKernel.from_database (see tilelang/jit/adapter/kernel_cache.py line 454: if host_kernel_source and device_kernel_source and kernel_params:). This override only checks if kernel_params: (line 73), which allows empty or None sources to reach JITKernel.from_database without validation. This issue is directly related to the existing review comment about _load_kernel_source return values.

If _load_kernel_source is updated to return (None, None) on errors (as it should per the previous comment), this validation gap becomes critical.

🔎 Proposed fix
     @override
     def _build_kernel(
         self,
         func: Callable | None,
         host_kernel_source: str,
         device_kernel_source: str,
         kernel_lib_path: str,
         kernel_params: list[KernelParam] | None,
         target: str | Target,
         target_host: str | Target | None,
         out_idx: list[int] | None,
         execution_backend: Literal["tvm_ffi", "cython", "nvrtc", "torch", "cutedsl"],
         pass_configs: dict | None,
         compile_flags: list[str] | str | None,
     ) -> JITKernel | None:
-        if kernel_params:
+        if host_kernel_source and device_kernel_source and kernel_params:
             return JITKernel.from_database(
                 func=func,
                 host_kernel_source=host_kernel_source,
                 device_kernel_source=device_kernel_source,
                 kernel_lib_path=kernel_lib_path,
                 params=kernel_params,
                 target=target,
                 target_host=target_host,
                 out_idx=out_idx,
                 execution_backend=execution_backend,
                 pass_configs=pass_configs,
                 compile_flags=compile_flags,
             )
         else:
             return None
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

44-45: Consider using iterable unpacking for better performance.

The list concatenation can be replaced with iterable unpacking for a minor performance improvement.

🔎 Proposed refactor
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd9572c and 772ea24.

📒 Files selected for processing (2)
  • tilelang/jit/adapter/cutedsl/adapter.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tilelang/jit/adapter/cutedsl/adapter.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (16-21)
🪛 Ruff (0.14.10)
tilelang/jit/adapter/cutedsl/kernel_cache.py

45-45: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

43-45: Consider iterable unpacking per static analysis hint.

The current concatenation works correctly. If you prefer the more modern style, you could use:

return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]

However, the current form is equally clear and readable.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 772ea24 and 3913014.

📒 Files selected for processing (2)
  • src/tl_templates/cuda/gemm_mma.h
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
🧰 Additional context used
🪛 Ruff (0.14.10)
tilelang/jit/adapter/cutedsl/kernel_cache.py

45-45: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

🔇 Additional comments (5)
src/tl_templates/cuda/gemm_mma.h (1)

53-53: LGTM! Float dispatch added consistently across modern architectures.

The new TL_DISPATCH_MMA(float, float, float, SM80_16x8x8_F32TF32TF32F32_TN) specializations enable FP32 input/output GEMM operations across SM80+ architectures. The implementation correctly leverages tensor core TF32 instructions (note lines 273-277 internally convert float to tfloat32_t for computation). The omission from the SM75 branch (line 106) is appropriate since SM75 lacks TF32 support.

Also applies to: 67-67, 80-80, 93-93, 102-102

tilelang/jit/adapter/cutedsl/kernel_cache.py (4)

15-19: LGTM! Path attributes are clearly defined for CuTeDSL backend.

The unified "kernel.py" path for all kernel sources aligns with CuTeDSL's single-file architecture, and the launcher-specific paths (launcher_lib.so, launcher.cpp) properly encapsulate CuTeDSL's C++ launcher requirements.


25-41: LGTM! Launcher artifact persistence is correctly implemented.

The method properly guards attribute access, uses the base class's safe I/O helpers (_safe_write_file, _load_binary), and includes appropriate verbose logging. The separation of required artifacts (launcher library) from optional debugging artifacts (launcher C++ source) is well-designed.


47-50: LGTM! Cache path propagation is correctly implemented.

The hasattr check provides appropriate defensive programming, and setting _cache_path on the adapter enables it to locate cached launcher artifacts during kernel loading.


21-23: Remove verification suggestion—the no-op is intentional design.

CuTeDSL kernels do not require separate source code caching. The kernel source is embedded in the compiled library and reconstructed as needed (see adapter.py line 94). The adapter saves compiled artifacts (launcher library) via _save_so_cubin_to_disk, not source code files.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tilelang/cache/kernel_cache.py (2)

402-406: Add null check for get_kernel_source() return value before writing.

This issue was previously flagged: some adapters (CuTeDSLKernelAdapter, NVRTCKernelAdapter) can return None from get_kernel_source(), which would cause a TypeError when passed to file.write(). Add an explicit check:

source = kernel.adapter.get_kernel_source()
if source is None:
    if verbose:
        self.logger.debug("Skipping host kernel source save: adapter returned None")
    return
KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))
🔎 Proposed fix
 def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
     host_kernel_path = os.path.join(cache_path, self.host_kernel_path)
+    source = kernel.adapter.get_kernel_source()
+    if source is None:
+        if verbose:
+            self.logger.debug("Skipping host kernel source save: adapter returned None")
+        return
     if verbose:
         self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}")
-    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source()))
+    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))

456-467: Use explicit is None checks instead of truthiness to distinguish missing from empty values.

This issue was previously flagged: the validation uses truthiness checks (if not host_kernel_source) that incorrectly treat empty strings as missing. Empty strings are valid (indicating an empty but present file), while None indicates a load failure.

The current logic would incorrectly reject backends that legitimately return empty source strings. Use explicit None checks:

missing_components = []
if host_kernel_source is None:
    missing_components.append("host_kernel_source")
if device_kernel_source is None:
    missing_components.append("device_kernel_source")
if kernel_params is None:
    missing_components.append("kernel_params")
🔎 Proposed fix
     # Check all required components and report specific failures
     missing_components = []
-    if not host_kernel_source:
+    if host_kernel_source is None:
         missing_components.append("host_kernel_source")
-    if not device_kernel_source:
+    if device_kernel_source is None:
         missing_components.append("device_kernel_source")
-    if not kernel_params:
+    if kernel_params is None:
         missing_components.append("kernel_params")
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

40-42: Consider iterable unpacking for cleaner list concatenation.

The current implementation concatenates lists using +, which works correctly. However, Python's iterable unpacking syntax is more idiomatic and slightly cleaner:

return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]

This is purely a style improvement with no functional difference.

Based on static analysis hints (Ruff RUF005).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3913014 and 230b3c4.

📒 Files selected for processing (2)
  • tilelang/cache/kernel_cache.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (7)
  • KernelCache (25-481)
  • _save_kernel_source_code_to_disk (395-400)
  • _save_so_cubin_to_disk (408-413)
  • _safe_write_file (248-255)
  • _load_binary (242-245)
  • _get_required_files (415-418)
  • _set_adapter_cache_path (439-440)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (16-21)
🪛 Ruff (0.14.10)
tilelang/cache/kernel_cache.py

439-439: Unused method argument: kernel

(ARG002)


439-439: Unused method argument: cache_path

(ARG002)

tilelang/jit/adapter/cutedsl/kernel_cache.py

42-42: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

🔇 Additional comments (5)
tilelang/cache/kernel_cache.py (5)

39-42: Well-designed refactor supporting inheritance-based backend dispatch.

Converting path constants to instance attributes enables subclasses to override backend-specific paths (e.g., CuTeDSL using "kernel.py" instead of "device_kernel.cu"), which directly supports the PR objective of eliminating if-else backend branching.


215-215: Good use of polymorphism for backend-specific adapter cache path setup.

Replacing direct attribute manipulation with the _set_adapter_cache_path hook allows subclasses to control adapter state without base class assumptions about adapter internals.


284-301: Clean refactor delegating to overridable helpers.

Breaking disk saves into separate overridable methods (_save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, _save_so_cubin_to_disk) with per-method error handling enables fine-grained backend customization while keeping error boundaries clear.


420-437: Robust error handling for optional kernel source files.

The method correctly returns (None, None) when source files are missing or unreadable, allowing callers to distinguish between "source not available" and other failure modes. This supports backends where source files are optional.


115-120: Update docstring to match the method signature: target and verbose no longer accept None values.

The docstring for the cached() method in KernelCache (lines 131-134) incorrectly claims that target and verbose accept None to trigger environment variable reads. However, the actual method signature makes these parameters required (target: str | Target and verbose: bool with no defaults). The environment variable handling has been moved to the public cached() wrapper in tilelang/cache/__init__.py, which accepts optional values and properly resolves them before passing explicit values to the internal method. Update the docstring to clarify that target and verbose are now required parameters, and remove references to None defaults and environment variable fallbacks.

LeiWang1999
LeiWang1999 previously approved these changes Dec 26, 2025
Copy link
Member

@LeiWang1999 LeiWang1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Awesome contributions!

@LeiWang1999
Copy link
Member

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 230b3c4 and d04446d.

📒 Files selected for processing (3)
  • testing/python/cache/test_tilelang_kernel_cache.py
  • testing/python/jit/test_tilelang_jit_cutedsl.py
  • tilelang/jit/adapter/cutedsl/adapter.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • testing/python/cache/test_tilelang_kernel_cache.py
  • testing/python/jit/test_tilelang_jit_cutedsl.py
🧬 Code graph analysis (2)
tilelang/jit/adapter/cutedsl/adapter.py (1)
tilelang/jit/adapter/cython/adapter.py (1)
  • libpath (366-368)
testing/python/cache/test_tilelang_kernel_cache.py (2)
tilelang/env.py (1)
  • enable_cache (275-276)
tilelang/jit/adapter/cutedsl/adapter.py (1)
  • get_kernel_source (205-213)
🪛 Ruff (0.14.10)
testing/python/cache/test_tilelang_kernel_cache.py

202-202: Unused function argument: clean_cache_env

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (3)
tilelang/jit/adapter/cutedsl/adapter.py (1)

93-95: > Likely an incorrect or invalid review comment.

testing/python/cache/test_tilelang_kernel_cache.py (2)

1-87: Excellent test infrastructure design.

The test plan documentation (lines 1-21) clearly explains the cache verification strategy, and the fixture design properly isolates tests while ensuring correct environment restoration:

  • setup_module_env correctly saves original values before any modifications
  • clean_cache_env provides isolated cache directories per test
  • Callback cleanup ensures no cross-test contamination

This is a solid foundation for comprehensive cache testing across multiple backends.


124-287: Comprehensive cache behavior validation.

The three test scenarios effectively verify:

  1. Cache miss/hit detection via postproc callback tracking
  2. Cache key differentiation for distinct kernel implementations
  3. Cache isolation between test runs

The use of UUID-based unique identifiers in global_symbol ensures reliable cache key generation without collisions.

Note: The static analysis hint claiming clean_cache_env is unused in test_cache_miss_detection is a false positive—the fixture is properly used for its side effects (cache isolation and env setup).

@LeiWang1999 LeiWang1999 merged commit 9ff7c52 into tile-ai:main Dec 26, 2025
6 checks passed
@LeiWang1999
Copy link
Member

Merged, thanks for your awesome contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Refactor Kernel Cache System for Multiple Backends

5 participants