-
Notifications
You must be signed in to change notification settings - Fork 359
[Enhancement] Refactor KernelCache to use inheritance-based design #1483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughRefactors kernel caching into a dispatch map of backend-specific KernelCache subclasses, centralizes disk I/O into KernelCache helper methods, adds per-backend cache adapters (tvm_ffi, nvrtc, cutedsl, cython, torch), updates cached(...) routing and signatures, and adapts tests and CI to the new cache flow. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as Caller
participant Dispatcher as CacheDispatcher
participant Backend as Backend KernelCache
participant Adapter as KernelAdapter
participant FS as Filesystem
Caller->>Dispatcher: cached(func, target, execution_backend, ...)
Dispatcher->>Dispatcher: normalize/resolve target & execution_backend
Dispatcher->>Backend: select backend-specific KernelCache
Backend->>Backend: _set_adapter_cache_path(kernel, cache_path)
Backend->>Backend: _get_required_files(cache_path)
Backend->>FS: check required files exist
alt required files missing
Backend->>Adapter: request sources, wrapper, executable, params
Adapter-->>Backend: provide artifacts
Backend->>FS: _save_kernel_source_code_to_disk(...)
Backend->>FS: _save_wrapper_kernel_code_to_disk(...)
Backend->>FS: _save_so_cubin_to_disk(...)
end
Backend->>Backend: _load_kernel_source(...)
Backend->>Backend: _build_kernel(...) -> JITKernel | None
Backend-->>Caller: return JITKernel or error
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-12-26T06:45:47.669ZApplied to files:
🧬 Code graph analysis (1)tilelang/jit/adapter/cutedsl/adapter.py (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
I tested them on CUDA with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/cache/__init__.py (1)
39-39: Type hint is inconsistent with dispatch pool keys.The
Literaltype is missing"cutedsl"which exists in_dispatch_pool. Also,Noneis allowed by the type hint but would cause aKeyErrorat runtime since there's noNonekey in the pool.Suggested fix
- execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = "auto", + execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "cutedsl", "torch"] = "auto",
🧹 Nitpick comments (6)
tilelang/cache/torch_kernel_cache.py (1)
4-6: Superfluous*args, **kwargsin__new__signature.The base class
KernelCache.__new__(cls)does not accept additional arguments. Since these cache classes are instantiated without arguments in the dispatch pool, the*args, **kwargsparameters are unnecessary and could be removed for clarity.This applies to all similar subclasses in this PR.
Suggested simplification
class TorchKernelCache(KernelCache): - def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) + def __new__(cls): + return super().__new__(cls)tilelang/cache/__init__.py (1)
47-57: Consider adding a guard for invalid backend keys.If an unsupported
execution_backendvalue is passed,_dispatch_pool[execution_backend]will raise a rawKeyError. A more descriptive error could improve debuggability.Optional: Add validation with clearer error
+ if execution_backend not in _dispatch_pool: + raise ValueError( + f"Unsupported execution_backend '{execution_backend}'. " + f"Allowed: {list(_dispatch_pool.keys())}" + ) return _dispatch_pool[execution_backend].cached(tilelang/cache/tvm_ffi_kernel_cache.py (1)
10-11: Remove redundant__new__method.The
__new__method simply delegates to the parent without any custom logic. Since the parentKernelCachealready implements singleton behavior, this override is unnecessary.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -tilelang/cache/nvrtc_kernel_cache.py (1)
11-12: Remove redundant__new__method.The
__new__method simply delegates to the parent without any custom logic, making it unnecessary.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -tilelang/cache/cutedsl_kernel_cache.py (2)
16-17: Remove redundant__new__method.The
__new__method simply delegates to the parent without any custom logic, making it unnecessary.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -
42-43: Consider using unpacking for list concatenation.As suggested by the static analysis tool, unpacking provides cleaner syntax for combining lists.
🔎 Proposed refactor using unpacking
def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
tilelang/cache/__init__.py(2 hunks)tilelang/cache/auto_kernel_cache.py(1 hunks)tilelang/cache/ctypes_kernel_cache.py(1 hunks)tilelang/cache/cutedsl_kernel_cache.py(1 hunks)tilelang/cache/cython_kernel_cache.py(1 hunks)tilelang/cache/kernel_cache.py(4 hunks)tilelang/cache/nvrtc_kernel_cache.py(1 hunks)tilelang/cache/torch_kernel_cache.py(1 hunks)tilelang/cache/tvm_ffi_kernel_cache.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
tilelang/cache/cython_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/torch_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/tvm_ffi_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (4)
KernelCache(25-449)_save_wrapper_kernel_code_to_disk(414-418)_safe_write_file(256-263)_save_so_cubin_to_disk(420-425)tilelang/cache/cutedsl_kernel_cache.py (1)
_save_so_cubin_to_disk(24-39)tilelang/cache/nvrtc_kernel_cache.py (1)
_save_so_cubin_to_disk(23-30)
tilelang/cache/cutedsl_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (6)
KernelCache(25-449)_save_kernel_source_code_to_disk(407-412)_save_so_cubin_to_disk(420-425)_safe_write_file(256-263)_load_binary(250-253)_get_required_files(427-430)tilelang/cache/nvrtc_kernel_cache.py (2)
_save_kernel_source_code_to_disk(14-21)_save_so_cubin_to_disk(23-30)tilelang/cache/tvm_ffi_kernel_cache.py (1)
_save_so_cubin_to_disk(19-24)
tilelang/cache/auto_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/ctypes_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/kernel_cache.py (6)
tilelang/cache/cutedsl_kernel_cache.py (3)
_save_kernel_source_code_to_disk(20-21)_save_so_cubin_to_disk(24-39)_get_required_files(42-43)tilelang/cache/tvm_ffi_kernel_cache.py (2)
_save_wrapper_kernel_code_to_disk(13-17)_save_so_cubin_to_disk(19-24)tilelang/jit/kernel.py (3)
JITKernel(39-794)kernel_source(644-645)get_kernel_source(465-476)tilelang/jit/adapter/cutedsl/adapter.py (1)
get_kernel_source(206-214)tilelang/jit/adapter/nvrtc/adapter.py (1)
get_kernel_source(179-190)tilelang/jit/adapter/ctypes/adapter.py (2)
get_kernel_source(296-302)libpath(282-284)
tilelang/cache/__init__.py (8)
tilelang/cache/auto_kernel_cache.py (1)
AutoKernelCache(4-6)tilelang/cache/ctypes_kernel_cache.py (1)
CTypesKernelCache(4-6)tilelang/cache/cutedsl_kernel_cache.py (1)
CuTeDSLKernelCache(10-49)tilelang/cache/cython_kernel_cache.py (1)
CythonKernelCache(4-6)tilelang/cache/nvrtc_kernel_cache.py (1)
NVRTCKernelCache(7-30)tilelang/cache/torch_kernel_cache.py (1)
TorchKernelCache(4-6)tilelang/cache/tvm_ffi_kernel_cache.py (1)
TVMFFIKernelCache(7-24)tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/nvrtc_kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
_save_kernel_source_code_to_disk(407-412)_safe_write_file(256-263)_load_binary(250-253)_save_so_cubin_to_disk(420-425)tilelang/cache/tvm_ffi_kernel_cache.py (1)
_save_so_cubin_to_disk(19-24)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (7)
tilelang/cache/auto_kernel_cache.py (1)
4-6: LGTM, same note as TorchKernelCache.The same optional simplification of removing
*args, **kwargsapplies here.tilelang/cache/ctypes_kernel_cache.py (1)
4-6: LGTM, same note as TorchKernelCache.The same optional simplification of removing
*args, **kwargsapplies here.tilelang/cache/cython_kernel_cache.py (1)
4-6: LGTM, same note as TorchKernelCache.The same optional simplification of removing
*args, **kwargsapplies here.tilelang/cache/__init__.py (2)
22-30: Dispatch pool design looks good for the multi-backend refactor.The dictionary-based dispatch cleanly replaces the if-else pattern as intended by issue #1469. Each backend gets its own singleton cache instance.
75-76: Verify this module-level behavior is intentional.If
TILELANG_CLEAR_CACHEenv var is set, this will raiseRuntimeErrorduring import, preventing any use of the module. This seems intentional as a safeguard, but confirm this is the desired UX.tilelang/cache/kernel_cache.py (2)
39-45: Good refactoring: centralized path configuration.Moving from module-level path constants to per-class attributes enables backend-specific path customization without modifying the base class. This aligns well with the PR's goal of eliminating conditional logic.
407-449: Well-structured I/O abstraction.The new helper methods (
_save_kernel_source_code_to_disk,_save_wrapper_kernel_code_to_disk,_save_so_cubin_to_disk,_get_required_files,_load_kernel_source) provide clean extension points for subclasses while centralizing common disk I/O patterns. This significantly improves maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)
47-58: Consider defensive error handling for invalid backend keys.The direct dictionary lookup on line 48 will raise a
KeyErrorif an invalidbackend_keyis provided. While theLiteraltype hint provides compile-time protection, runtime validation would offer better error messages when type checking is bypassed or when backend names are dynamically constructed.🔎 Proposed enhancement
backend_key: str = execution_backend if execution_backend is not None else "auto" +if backend_key not in _dispatch_pool: + valid_backends = ", ".join(_dispatch_pool.keys()) + raise ValueError( + f"Invalid execution_backend '{backend_key}'. " + f"Valid options are: {valid_backends}" + ) return _dispatch_pool[backend_key].cached(
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/gemm/example_gemm.py(1 hunks)tilelang/cache/__init__.py(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
examples/gemm/example_gemm.py
📚 Learning: 2025-12-18T04:50:00.512Z
Learnt from: silentCoder-dev
Repo: tile-ai/tilelang PR: 1464
File: testing/python/language/test_tilelang_language_rand.py:14-14
Timestamp: 2025-12-18T04:50:00.512Z
Learning: In `testing/python/language/test_tilelang_language_rand.py`, the TileLang kernel uses `blk_M = M` (single block) and calls `rng_rand()` four times per element to align results with the Triton implementation, which uses `blk_M = 128` (multiple blocks) and calls the RNG once per element. These differences compensate for internal RNG behavior differences between TileLang and Triton.
Applied to files:
examples/gemm/example_gemm.py
🧬 Code graph analysis (1)
examples/gemm/example_gemm.py (1)
tilelang/jit/kernel.py (1)
out_idx(636-637)
🔇 Additional comments (2)
tilelang/cache/__init__.py (1)
5-20: Clean TYPE_CHECKING usage to avoid circular imports.The use of
TYPE_CHECKINGto importKernelCacheonly for type hints is a good pattern that avoids potential circular import issues at runtime while maintaining type safety.examples/gemm/example_gemm.py (1)
5-5: Good example demonstrating the new backend selection API.The addition of
execution_backend="ctypes"effectively demonstrates the new multi-backend dispatch mechanism. Based on the PR description, this has been tested successfully on CUDA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tilelang/cache/cutedsl_kernel_cache.py (2)
16-17: Remove unnecessary__new__override.The
__new__method simply delegates to the parent without adding any custom logic. Python automatically uses the parent's__new__if not overridden, making this dead code.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -
41-43: Consider using iterable unpacking for list extension.The list concatenation works correctly, but iterable unpacking is more idiomatic and slightly more efficient.
🔎 Proposed fix
@override def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/cache/cutedsl_kernel_cache.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/cutedsl_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (7)
KernelCache(25-449)_save_kernel_source_code_to_disk(407-412)_save_so_cubin_to_disk(420-425)_safe_write_file(256-263)_load_binary(250-253)_get_required_files(427-430)_load_kernel_source(432-449)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (3)
tilelang/cache/cutedsl_kernel_cache.py (3)
19-21: Verify no-op implementation is intentional.The method intentionally skips saving kernel source code, deviating from the base class behavior. Confirm this is the correct design for CuTeDSL—does the backend not generate saveable kernel source, or is source code unnecessary for cache restoration?
23-39: LGTM! Launcher artifact handling is well-implemented.The method correctly saves CuTeDSL-specific launcher artifacts (library and optional C++ source) using defensive attribute checks and the parent class's safe write utilities.
45-47: Verify empty string return values are correct.The method returns empty strings rather than
None. The base class can returnNoneto indicate loading failure. Since_save_kernel_source_code_to_diskis a no-op and no source is saved, confirm that returning empty strings (rather thanNone) is the expected behavior for downstream code.Note: The previous review issue regarding method naming (_load_kernel_from_cache vs _load_kernel_source) has been correctly resolved.
LeiWang1999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?
For sure I could do that. |
@LeiWang1999 Hi Lei! I tried to do that, but a horrible point is that the |
It seems sufficient to move this into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
14-30: Code duplication between methods.Lines 15-20 and 24-29 contain identical logic for saving the Python wrapper file. This was previously flagged.
tilelang/jit/adapter/kernel_cache.py (1)
19-24: Fix logging to display file path instead of object.Line 23 logs the
executableobject instead ofkernel_lib_path. The debug message should show where the file is being saved (the path), not the object being saved.🔎 Proposed fix
def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path) executable = kernel.adapter.executable if verbose: - self.logger.debug(f"Saving kernel executable to file: {executable}") + self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}") KernelCache._safe_write_executable(executable, kernel_lib_path)
🧹 Nitpick comments (3)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)
41-43: Consider iterable unpacking for clarity.Per static analysis (RUF005), prefer unpacking over concatenation for combining lists.
🔎 Suggested refactor
@override def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
45-47: Returning empty strings works but is subtly different fromNone.The base class's
_load_kernel_from_diskconditionif ((host_kernel_source and device_kernel_source) or self.execution_backend == "cutedsl")treats empty strings as falsy, so this works due to thecutedslbackend check. However, returning(None, None)would be more explicit and consistent with the base class's error-handling pattern that returnsNoneon failure.🔎 Suggested alternative
@override def _load_kernel_source(self, device_kernel_path: str, host_kernel_path: str, verbose: bool = False) -> tuple[str | None, str | None]: - return "", "" + return None, Nonetilelang/jit/adapter/kernel_cache.py (1)
10-11: Consider whether new override is necessary.The
__new__method simply delegates to the superclass without additional logic. While this pattern is consistent across all backend implementations in the codebase (CTypes, Cython, Torch, NVRTC), it appears redundant if no customization is needed.If this pattern serves a specific purpose (e.g., enforcing singleton behavior or future extensibility), consider adding a comment explaining the rationale. Otherwise, the method could be removed to reduce boilerplate.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
tilelang/cache/__init__.py(2 hunks)tilelang/jit/adapter/ctypes/kernel_cache.py(1 hunks)tilelang/jit/adapter/cutedsl/kernel_cache.py(1 hunks)tilelang/jit/adapter/cython/kernel_cache.py(1 hunks)tilelang/jit/adapter/kernel_cache.py(1 hunks)tilelang/jit/adapter/nvrtc/kernel_cache.py(1 hunks)tilelang/jit/adapter/torch/kernel_cache.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
_save_kernel_source_code_to_disk(14-21)_save_so_cubin_to_disk(23-30)tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(19-24)
tilelang/jit/adapter/torch/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/__init__.py (5)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
CTypesKernelCache(4-6)tilelang/jit/adapter/cython/kernel_cache.py (1)
CythonKernelCache(4-6)tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
NVRTCKernelCache(7-30)tilelang/jit/adapter/torch/kernel_cache.py (1)
TorchKernelCache(4-6)tilelang/jit/adapter/kernel_cache.py (1)
TVMFFIKernelCache(7-24)
🪛 Ruff (0.14.8)
tilelang/jit/adapter/cutedsl/kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
tilelang/cache/__init__.py
61-61: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (13)
tilelang/jit/adapter/cython/kernel_cache.py (1)
4-6: Minimal adapter follows established pattern.The subclass correctly delegates to the base
KernelCache. Note that the singleton implementation in the base class stores the instance onKernelCache._instancedirectly, meaning all backend-specific subclasses share the same singleton—this appears intentional for a unified cache but is worth being aware of.tilelang/jit/adapter/torch/kernel_cache.py (1)
4-6: LGTM!The
TorchKernelCachesubclass correctly delegates to the base class and follows the consistent adapter pattern established across backends.tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
7-12: Class attributes and new are correctly defined.The
kernel_lib_path = "kernel.cubin"appropriately overrides the base class default for the NVRTC backend.tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
10-17: Class attributes and structure are well-defined.The CuTeDSL-specific paths (
launcher_lib.so,launcher.cpp) are appropriately declared as class attributes.
19-21: No-op override is intentional for CuTeDSL.The CuTeDSL backend doesn't persist kernel source in the traditional manner, relying on launcher artifacts instead.
23-39: Defensive attribute access is appropriate.Good use of
getattrandhasattrto safely check for optionallib_generatorandlauncher_cpp_codeattributes before attempting to save.tilelang/jit/adapter/ctypes/kernel_cache.py (1)
4-6: LGTM!The
CTypesKernelCachesubclass follows the consistent minimal adapter pattern.tilelang/cache/__init__.py (4)
5-18: LGTM! Clean TYPE_CHECKING usage prevents circular imports.The TYPE_CHECKING conditional import pattern properly isolates the KernelCache type reference for type checking while avoiding runtime circular dependencies. All backend implementations are imported and ready for the dispatch pool.
20-28: Well-structured dispatch pool with singleton instances.The dispatch pool architecture cleanly maps backend names to their singleton KernelCache implementations. This eliminates the conditional branching that was present in the previous design.
45-59: Dispatch logic correctly routes to backend implementations.The normalization of "auto"/None to "tvm_ffi" provides a sensible default, and all parameters are properly forwarded to the selected backend's cached() method.
60-61: Exception handling addresses previous feedback.The ValueError for unsupported backends implements the behavior requested in past review comments. The inline error message is clear and actionable.
Note: Ruff suggests moving the message to an exception subclass (TRY003), but this inline approach is acceptable for a simple validation error.
The static analysis hint is optional; if you prefer to follow strict style guidelines, consider defining a custom exception class, but the current implementation is functionally correct.
tilelang/jit/adapter/kernel_cache.py (2)
7-8: LGTM! Backend-specific library path is properly defined.The TVMFFIKernelCache subclass appropriately sets
kernel_lib_path = "executable.so"for the TVM FFI backend, following the pattern used by other backend implementations (e.g., NVRTC uses "kernel.cubin").
13-17: LGTM! Wrapper kernel saving logic is correct.The method properly constructs the host kernel path, logs the correct path variable when verbose, and safely writes the host source using the parent class helper method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)
37-37: Missing "cutedsl" in execution_backend type hint.The
_dispatch_poolincludes"cutedsl"(line 26), but the type hint forexecution_backendon line 37 omits it. This creates a type-checking inconsistency where the backend is available at runtime but not reflected in the type signature.- execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = None, + execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,Note: This issue was already flagged in a previous review comment.
🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)
58-58: Optional: Consider Ruff hint about exception message format.Ruff suggests avoiding long messages directly in exception raises (TRY003). For this simple case, the current format is clear and maintainable, so this is purely a linting preference.
If you want to strictly follow the linter suggestion:
+_UNSUPPORTED_BACKEND_MSG = 'Cannot find support for execution backend "{}"' + def cached( ... else: - raise ValueError(f'Cannot find support for execution backend "{execution_backend}"') + raise ValueError(_UNSUPPORTED_BACKEND_MSG.format(execution_backend))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/cache/__init__.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
tilelang/cache/__init__.py (2)
20-28: Good refactoring: dispatch pool centralizes backend selection.The dispatch pool cleanly maps backend keys to singleton instances, eliminating the if-else branching mentioned in the PR objectives. This design makes adding new backends straightforward.
10-15: All imported KernelCache implementations verified and exist. The refactored dispatch system has all six required backend-specific classes properly defined in their respective modules. No issues found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)
37-37: Critical: Defaultexecution_backend=Nonewill always fail.The parameter defaults to
None, butNoneis not a key in_dispatch_pool(lines 21-28). Every call tocached()without an explicitexecution_backendwill raiseValueErrorat line 58, breaking the public API.Based on past review comments, the old implementation supported
"auto"mode for backend detection. This refactor removed that logic but didn't update the default value.🔎 Recommended fixes
Option 1: Restore auto-detection for None (preserves backward compatibility)
+ # Auto-detect backend if not specified + if execution_backend is None: + # Determine the best available backend based on target + if isinstance(target, str) and target == "auto": + execution_backend = "tvm_ffi" # or implement detection logic + else: + # Parse target to determine appropriate backend + execution_backend = "tvm_ffi" # placeholder + if execution_backend in _dispatch_pool:Option 2: Use a valid default backend
- execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None, + execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] = "tvm_ffi",Option 3: Make execution_backend required
- execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None, + execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"],
🧹 Nitpick comments (2)
tilelang/cache/__init__.py (2)
20-28: Consider lazy initialization of backend instances.All backend instances are created at module import time, even if only one backend will be used. For better startup performance, consider lazy initialization:
🔎 Optional refactor for lazy initialization
-_dispatch_pool: dict[str, KernelCache] = { - "tvm_ffi": TVMFFIKernelCache(), - "ctypes": CTypesKernelCache(), - "cython": CythonKernelCache(), - "nvrtc": NVRTCKernelCache(), - "cutedsl": CuTeDSLKernelCache(), - "torch": TorchKernelCache(), -} +_dispatch_pool: dict[str, KernelCache] = {} + +def _get_backend_cache(backend: str) -> KernelCache: + """Get or create a backend cache instance.""" + if backend not in _dispatch_pool: + backend_classes = { + "tvm_ffi": TVMFFIKernelCache, + "ctypes": CTypesKernelCache, + "cython": CythonKernelCache, + "nvrtc": NVRTCKernelCache, + "cutedsl": CuTeDSLKernelCache, + "torch": TorchKernelCache, + } + if backend in backend_classes: + _dispatch_pool[backend] = backend_classes[backend]() + return _dispatch_pool.get(backend)Then update line 46 to use
_get_backend_cache(execution_backend).
57-58: Consider extracting error message to improve maintainability.Static analysis flags this as TRY003 (long message outside exception class). While the current message is clear and acceptable, extracting it can improve consistency if you add more validation errors later.
🔎 Optional refactor
+_UNSUPPORTED_BACKEND_ERROR = ( + 'Cannot find support for execution backend "{}". ' + f'Supported backends: {", ".join(_dispatch_pool.keys())}' +) + def cached( ... ) -> JITKernel: ... if execution_backend in _dispatch_pool: ... else: - raise ValueError(f'Cannot find support for execution backend "{execution_backend}"') + raise ValueError(_UNSUPPORTED_BACKEND_ERROR.format(execution_backend))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tilelang/cache/__init__.pytilelang/jit/adapter/nvrtc/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/__init__.py (1)
tilelang/cache/kernel_cache.py (2)
KernelCache(25-449)cached(113-227)
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (1)
tilelang/cache/__init__.py (1)
5-18: LGTM! Clean import structure with proper TYPE_CHECKING usage.The TYPE_CHECKING pattern correctly avoids circular import issues while maintaining type safety. The backend-specific imports are well-organized.
SiriusNEO
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, just some comments to go
2e3be15 to
1c0c714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
1-4: Consider adding a docstring to document the placeholder implementation.The empty class body is valid but would benefit from a docstring explaining that CTypesKernelCache currently uses all default KernelCache behavior and may be extended in the future with ctypes-specific logic.
🔎 Suggested documentation
+"""CTypes backend kernel cache implementation.""" from tilelang.cache.kernel_cache import KernelCache -class CTypesKernelCache(KernelCache): ... +class CTypesKernelCache(KernelCache): + """ + Kernel cache for the ctypes execution backend. + + Currently uses default KernelCache behavior. Backend-specific + customizations can be added here as needed. + """ + ...tilelang/jit/adapter/cython/kernel_cache.py (1)
1-4: Consider adding a docstring to document the placeholder implementation.Similar to CTypesKernelCache, this empty class would benefit from documentation explaining it's a placeholder for Cython-specific cache behavior.
🔎 Suggested documentation
+"""Cython backend kernel cache implementation.""" from tilelang.cache.kernel_cache import KernelCache -class CythonKernelCache(KernelCache): ... +class CythonKernelCache(KernelCache): + """ + Kernel cache for the Cython execution backend. + + Currently uses default KernelCache behavior. Backend-specific + customizations can be added here as needed. + """ + ...tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
41-43: Consider iterable unpacking for concatenation.The list concatenation works correctly but could use iterable unpacking for slightly better performance.
🔎 Optional optimization
def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
tilelang/cache/__init__.pytilelang/cache/kernel_cache.pytilelang/jit/adapter/ctypes/kernel_cache.pytilelang/jit/adapter/cutedsl/kernel_cache.pytilelang/jit/adapter/cython/kernel_cache.pytilelang/jit/adapter/kernel_cache.pytilelang/jit/adapter/nvrtc/kernel_cache.pytilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tilelang/jit/adapter/torch/kernel_cache.py
- tilelang/jit/adapter/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
tilelang/cache/__init__.py
🧬 Code graph analysis (4)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-475)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-475)
tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
KernelCache(25-475)_save_so_cubin_to_disk(412-417)_safe_write_file(253-260)_load_binary(247-250)tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(16-21)
tilelang/cache/kernel_cache.py (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (6)
_set_adapter_cache_path(50-52)_save_kernel_source_code_to_disk(20-21)_save_so_cubin_to_disk(24-39)_get_required_files(42-43)_load_kernel_source(46-47)_build_kernel(55-83)
🪛 Ruff (0.14.10)
tilelang/cache/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
tilelang/jit/adapter/cutedsl/kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
tilelang/cache/kernel_cache.py
443-443: Unused method argument: kernel
(ARG002)
443-443: Unused method argument: cache_path
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (14)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
7-18: LGTM! Clean NVRTC-specific implementation.The class correctly customizes paths for NVRTC's cubin output and implements the necessary logic to persist the Python wrapper alongside the binary. The previous duplicate logic concern has been addressed.
tilelang/cache/__init__.py (3)
21-28: Clean dispatch mechanism successfully replaces singleton pattern.The backend dispatch map provides a clear, extensible way to route kernel cache operations to backend-specific implementations. This aligns with the PR objective to eliminate if-else branching.
37-37: Good choice of default execution backend.Setting the default to
"tvm_ffi"addresses previous review concerns aboutNonecausing errors. This provides a sensible fallback while still allowing explicit backend selection.Based on past review comments, this follows the suggestion to use "tvm_ffi" as the default for internal APIs.
45-58: Dispatch logic is straightforward and correct.The if-else routing to backend-specific cache instances is clear. The ValueError provides a helpful message for unsupported backends.
Note: The static analysis hint (TRY003) about the long error message is a stylistic suggestion. The current inline message is acceptable and clear for this use case.
tilelang/jit/adapter/cutedsl/kernel_cache.py (4)
13-21: CuTeDSL-specific paths and no-op kernel source save are appropriate.The custom paths for launcher artifacts align with CuTeDSL's C++ launcher design. The early return in
_save_kernel_source_code_to_diskindicates CuTeDSL handles kernel sources differently than other backends.
23-39: Defensive attribute checking for launcher artifacts is good practice.The use of
getattr()andhasattr()ensures the code handles cases where the adapter or lib_generator may not have the expected launcher attributes. The verbose logging aids debugging.
45-52: Appropriate overrides for CuTeDSL's kernel loading behavior.Returning empty strings from
_load_kernel_sourceand propagating the cache path to the adapter reflect CuTeDSL's different approach to managing kernel sources and artifacts.
54-83: Correct kernel reconstruction from cached parameters.The
_build_kernelimplementation properly delegates toJITKernel.from_databasewhen parameters are available, which is the expected behavior for loading cached kernels.tilelang/cache/kernel_cache.py (6)
39-43: Excellent use of class attributes for path customization.Moving path constants from module-level to class attributes allows backend-specific subclasses to override paths as needed (e.g., NVRTC using "kernel.cubin" instead of "kernel_lib.so"). This is a key enabler of the inheritance-based design.
290-303: Well-structured delegation to specialized save methods.Breaking
_save_kernel_to_diskinto separate calls to_save_kernel_source_code_to_disk,_save_wrapper_kernel_code_to_disk, and_save_so_cubin_to_diskprovides clear override points for backend-specific behavior while maintaining error handling at the orchestration level.
351-380: Clean refactor of disk loading logic with extensible helper methods.The extraction of
_get_required_files,_load_kernel_source, and_build_kernelsuccessfully eliminates conditional logic while enabling backends like CuTeDSL to customize file requirements and loading behavior.
399-441: New helper methods provide excellent override points for subclasses.These methods establish the template method pattern, allowing each backend to customize specific aspects:
_save_kernel_source_code_to_disk: Device kernel persistence_save_wrapper_kernel_code_to_disk: Host wrapper persistence_save_so_cubin_to_disk: Library/binary persistence_get_required_files: File existence checks_load_kernel_source: Source code loadingThis design successfully achieves the PR objective of eliminating if-else branching in favor of polymorphism.
443-444: Hook method with unused parameters is intentional design.The
_set_adapter_cache_pathmethod serves as an extension point for subclasses (e.g., CuTeDSLKernelCache overrides this at line 50-52 of tilelang/jit/adapter/cutedsl/kernel_cache.py to propagate the cache path). The unused parameters in the base implementation are expected for hook methods.The static analysis warnings for unused arguments (ARG002) are false positives in this context.
446-475: Flexible kernel reconstruction logic supports diverse backends.The
_build_kernelmethod provides a sensible default that works for most backends while allowing specialized subclasses (like CuTeDSL) to implement custom reconstruction logic based on available artifacts.
|
I have found a perfect solution for testing cache utilities. Wait for my push. |
|
My new test successfully detected the failure of cutedsl disk cache. I will push the fix together with the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
maint/scripts/run_local_ci_test.sh (1)
18-22: Critical inconsistency: Comment requires separate CuTeDSL invocation, but code runs unified test.The comment explicitly states "Run the CuTeDSL JIT tests in a separate pytest invocation" with
TILELANG_USE_GEMM_V1=1, yet Line 22 runs all tests in a single unified command without isolating CuTeDSL. Per the PR objectives, reviewer lucifer1004 specifically requested running cutedsl separately with this environment variable set.Either:
- Add the separate CuTeDSL test invocation as documented, or
- Remove/update the comment if the kernel cache refactor eliminated this requirement.
🔎 Proposed fix: Add separate CuTeDSL test invocation
If CuTeDSL still requires
TILELANG_USE_GEMM_V1=1, implement the separate invocation as indicated:# Run pytest in parallel (4 workers) for all tests in the testing/python directory. -# IMPORTANT: CuTeDSL backend currently requires GEMM v1 (TILELANG_USE_GEMM_V1=1). -# Do NOT export it globally here, or you'll silently change the default GEMM selection -# for unrelated tests. Run the CuTeDSL JIT tests in a separate pytest invocation. cd testing/python -python -m pytest -n 4 . --verbose --color=yes --durations=0 --showlocals --cache-clear +python -m pytest -n 4 . --verbose --color=yes --durations=0 --showlocals --cache-clear --ignore=jit/test_tilelang_jit_gemm_cutedsl.py + +# Run CuTeDSL JIT tests separately with GEMM v1 enabled. +# IMPORTANT: Do NOT export TILELANG_USE_GEMM_V1 globally, or you'll silently change +# the default GEMM selection for unrelated tests. +TILELANG_USE_GEMM_V1=1 python -m pytest -n 4 jit/test_tilelang_jit_gemm_cutedsl.py --verbose --color=yes --durations=0 --showlocals --cache-clearBased on PR objectives, verify with lucifer1004 whether the separate invocation is still required after this refactor.
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
42-44: LGTM - Extends required files for CuTeDSL.Adding the launcher library to required files ensures cache validation checks for all necessary artifacts.
The static analysis suggests using iterable unpacking (
[*super()._get_required_files(cache_path), ...]), but the current concatenation is equally clear and readable for this use case.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.github/workflows/ci.ymlmaint/scripts/run_local_ci_test.shtesting/python/cache/test_tilelang_kernel_cache.pytesting/python/jit/test_tilelang_jit_cutedsl.pytesting/python/jit/test_tilelang_jit_nvrtc.pytilelang/jit/adapter/cutedsl/adapter.pytilelang/jit/adapter/cutedsl/kernel_cache.pytilelang/jit/adapter/cutedsl/libgen.py
💤 Files with no reviewable changes (3)
- testing/python/jit/test_tilelang_jit_nvrtc.py
- .github/workflows/ci.yml
- tilelang/jit/adapter/cutedsl/libgen.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
maint/scripts/run_local_ci_test.shtesting/python/cache/test_tilelang_kernel_cache.pytesting/python/jit/test_tilelang_jit_cutedsl.py
🧬 Code graph analysis (2)
tilelang/jit/adapter/cutedsl/adapter.py (1)
tilelang/jit/adapter/cython/adapter.py (1)
libpath(366-368)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (8)
KernelCache(25-472)_save_kernel_source_code_to_disk(395-400)_save_so_cubin_to_disk(408-413)_safe_write_file(248-255)_get_required_files(415-418)_load_kernel_source(420-437)_set_adapter_cache_path(439-440)_build_kernel(442-472)tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(16-21)
🪛 Ruff (0.14.10)
testing/python/cache/test_tilelang_kernel_cache.py
189-189: Unused function argument: clean_cache_env
(ARG001)
tilelang/jit/adapter/cutedsl/kernel_cache.py
44-44: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (15)
testing/python/jit/test_tilelang_jit_cutedsl.py (1)
7-9: LGTM - Module-level environment configuration for CuTeDSL tests.Setting
TILELANG_USE_GEMM_V1 = "1"at module level ensures the GEMM V1 pathway is enabled for all CuTeDSL JIT kernel tests in this file. This aligns with the PR's goal of testing the refactored cache behavior.tilelang/jit/adapter/cutedsl/adapter.py (2)
93-94: Good improvement: Using context manager for file I/O.Using
with open()ensures proper resource cleanup. This prevents potential file handle leaks that could occur with the previous directopen().read()pattern.
150-151: LGTM - Consistent context manager usage infrom_database.Matches the pattern used in
__init__for proper file handle management.testing/python/cache/test_tilelang_kernel_cache.py (7)
1-21: Excellent test documentation.The test plan clearly explains the strategy, CI safety considerations, and technical details. This makes the test suite maintainable and helps future contributors understand the purpose and constraints.
45-63: LGTM - Clean callback tracking implementation.The
PostProcCounterclass effectively tracks postproc invocations and stores markers for verification. The closure-based callback registration is appropriate for this use case.
82-82: Consider handling missingbackendparameter gracefully.If a test using this fixture doesn't have a
backendparameter incallspec.params, this will raise aKeyError. While all current tests are parametrized withbackend, using.get()with a default makes this more robust:backend = request.node.callspec.params.get("backend", "nvrtc")Wait, I see this is already using
.get()with a default. The code is correct.
108-185: Well-structured cache verification test.The test effectively validates:
- Cache miss triggers postproc (counter increments)
- Cache hit skips postproc (counter unchanged)
- Marker preservation across cache load
- Functional correctness of cached kernels
The UUID-based cache key isolation ensures test independence across runs.
188-235: LGTM - Validates cache key differentiation.This test confirms that different kernel implementations produce different cache keys, ensuring the test methodology is sound.
The static analysis hint about
clean_cache_envbeing unused is a false positive—the fixture is required for its side effects (setting up isolated cache directories and environment variables), which is a standard pytest pattern.
238-273: LGTM - Meta-test for fixture isolation.This test validates that the
clean_cache_envfixture provides proper test isolation, which is essential for reliable CI runs.
276-282: Good test cleanup practice.The
autouse=Truefixture ensures postproc callbacks are restored to identity functions after each test, preventing test pollution across the suite.tilelang/jit/adapter/cutedsl/kernel_cache.py (5)
13-18: LGTM - Clear path constant definitions.The class properly defines CuTeDSL-specific artifact paths. Using class-level constants makes the file structure explicit and maintainable.
20-22: Intentional no-op for CuTeDSL source saving.CuTeDSL doesn't separately save kernel source code since the source is embedded in the generated Python library (
kernel.py). This is consistent with the adapter's behavior inadapter.pywheredevice_kernel_sourceis read fromlibpath.
24-40: LGTM - Robust launcher artifact saving.The implementation:
- Uses defensive
getattr/hasattrchecks for optional attributes- Leverages
_safe_write_filefor atomic writes (important for concurrent access)- Optionally saves C++ source for debugging purposes
50-53: LGTM - Enables delayed cubin caching.This propagates the cache path to the adapter, enabling
_save_cubin_to_cache_if_needed(adapter.py line 230) to save the cubin after first execution. This is necessary because the cubin may not exist until the kernel is actually run.
55-85: LGTM - Proper kernel construction from cached data.The implementation correctly:
- Delegates to
JITKernel.from_databasewith all necessary parameters- Returns
Nonewhenkernel_paramsis unavailable, signaling a failed cache load- Passes through all configuration options (
pass_configs,compile_flags, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
59-88: Restore source validation to prevent invalid kernels from being constructed.The base class
_build_kernelvalidates that bothhost_kernel_sourceanddevice_kernel_sourceare non-empty before callingJITKernel.from_database(seetilelang/jit/adapter/kernel_cache.pyline 454:if host_kernel_source and device_kernel_source and kernel_params:). This override only checksif kernel_params:(line 73), which allows empty or None sources to reachJITKernel.from_databasewithout validation. This issue is directly related to the existing review comment about_load_kernel_sourcereturn values.If
_load_kernel_sourceis updated to return(None, None)on errors (as it should per the previous comment), this validation gap becomes critical.🔎 Proposed fix
@override def _build_kernel( self, func: Callable | None, host_kernel_source: str, device_kernel_source: str, kernel_lib_path: str, kernel_params: list[KernelParam] | None, target: str | Target, target_host: str | Target | None, out_idx: list[int] | None, execution_backend: Literal["tvm_ffi", "cython", "nvrtc", "torch", "cutedsl"], pass_configs: dict | None, compile_flags: list[str] | str | None, ) -> JITKernel | None: - if kernel_params: + if host_kernel_source and device_kernel_source and kernel_params: return JITKernel.from_database( func=func, host_kernel_source=host_kernel_source, device_kernel_source=device_kernel_source, kernel_lib_path=kernel_lib_path, params=kernel_params, target=target, target_host=target_host, out_idx=out_idx, execution_backend=execution_backend, pass_configs=pass_configs, compile_flags=compile_flags, ) else: return None
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
44-45: Consider using iterable unpacking for better performance.The list concatenation can be replaced with iterable unpacking for a minor performance improvement.
🔎 Proposed refactor
def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tilelang/jit/adapter/cutedsl/adapter.pytilelang/jit/adapter/cutedsl/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tilelang/jit/adapter/cutedsl/adapter.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(16-21)
🪛 Ruff (0.14.10)
tilelang/jit/adapter/cutedsl/kernel_cache.py
45-45: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
43-45: Consider iterable unpacking per static analysis hint.The current concatenation works correctly. If you prefer the more modern style, you could use:
return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]However, the current form is equally clear and readable.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/tl_templates/cuda/gemm_mma.htilelang/jit/adapter/cutedsl/kernel_cache.py
🧰 Additional context used
🪛 Ruff (0.14.10)
tilelang/jit/adapter/cutedsl/kernel_cache.py
45-45: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
🔇 Additional comments (5)
src/tl_templates/cuda/gemm_mma.h (1)
53-53: LGTM! Float dispatch added consistently across modern architectures.The new
TL_DISPATCH_MMA(float, float, float, SM80_16x8x8_F32TF32TF32F32_TN)specializations enable FP32 input/output GEMM operations across SM80+ architectures. The implementation correctly leverages tensor core TF32 instructions (note lines 273-277 internally convertfloattotfloat32_tfor computation). The omission from the SM75 branch (line 106) is appropriate since SM75 lacks TF32 support.Also applies to: 67-67, 80-80, 93-93, 102-102
tilelang/jit/adapter/cutedsl/kernel_cache.py (4)
15-19: LGTM! Path attributes are clearly defined for CuTeDSL backend.The unified "kernel.py" path for all kernel sources aligns with CuTeDSL's single-file architecture, and the launcher-specific paths (
launcher_lib.so,launcher.cpp) properly encapsulate CuTeDSL's C++ launcher requirements.
25-41: LGTM! Launcher artifact persistence is correctly implemented.The method properly guards attribute access, uses the base class's safe I/O helpers (
_safe_write_file,_load_binary), and includes appropriate verbose logging. The separation of required artifacts (launcher library) from optional debugging artifacts (launcher C++ source) is well-designed.
47-50: LGTM! Cache path propagation is correctly implemented.The
hasattrcheck provides appropriate defensive programming, and setting_cache_pathon the adapter enables it to locate cached launcher artifacts during kernel loading.
21-23: Remove verification suggestion—the no-op is intentional design.CuTeDSL kernels do not require separate source code caching. The kernel source is embedded in the compiled library and reconstructed as needed (see adapter.py line 94). The adapter saves compiled artifacts (launcher library) via
_save_so_cubin_to_disk, not source code files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tilelang/cache/kernel_cache.py (2)
402-406: Add null check forget_kernel_source()return value before writing.This issue was previously flagged: some adapters (CuTeDSLKernelAdapter, NVRTCKernelAdapter) can return
Nonefromget_kernel_source(), which would cause aTypeErrorwhen passed tofile.write(). Add an explicit check:source = kernel.adapter.get_kernel_source() if source is None: if verbose: self.logger.debug("Skipping host kernel source save: adapter returned None") return KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))🔎 Proposed fix
def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): host_kernel_path = os.path.join(cache_path, self.host_kernel_path) + source = kernel.adapter.get_kernel_source() + if source is None: + if verbose: + self.logger.debug("Skipping host kernel source save: adapter returned None") + return if verbose: self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}") - KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source())) + KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))
456-467: Use explicitis Nonechecks instead of truthiness to distinguish missing from empty values.This issue was previously flagged: the validation uses truthiness checks (
if not host_kernel_source) that incorrectly treat empty strings as missing. Empty strings are valid (indicating an empty but present file), whileNoneindicates a load failure.The current logic would incorrectly reject backends that legitimately return empty source strings. Use explicit None checks:
missing_components = [] if host_kernel_source is None: missing_components.append("host_kernel_source") if device_kernel_source is None: missing_components.append("device_kernel_source") if kernel_params is None: missing_components.append("kernel_params")🔎 Proposed fix
# Check all required components and report specific failures missing_components = [] - if not host_kernel_source: + if host_kernel_source is None: missing_components.append("host_kernel_source") - if not device_kernel_source: + if device_kernel_source is None: missing_components.append("device_kernel_source") - if not kernel_params: + if kernel_params is None: missing_components.append("kernel_params")
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
40-42: Consider iterable unpacking for cleaner list concatenation.The current implementation concatenates lists using
+, which works correctly. However, Python's iterable unpacking syntax is more idiomatic and slightly cleaner:return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]This is purely a style improvement with no functional difference.
Based on static analysis hints (Ruff RUF005).
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tilelang/cache/kernel_cache.pytilelang/jit/adapter/cutedsl/kernel_cache.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (7)
KernelCache(25-481)_save_kernel_source_code_to_disk(395-400)_save_so_cubin_to_disk(408-413)_safe_write_file(248-255)_load_binary(242-245)_get_required_files(415-418)_set_adapter_cache_path(439-440)tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(16-21)
🪛 Ruff (0.14.10)
tilelang/cache/kernel_cache.py
439-439: Unused method argument: kernel
(ARG002)
439-439: Unused method argument: cache_path
(ARG002)
tilelang/jit/adapter/cutedsl/kernel_cache.py
42-42: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
🔇 Additional comments (5)
tilelang/cache/kernel_cache.py (5)
39-42: Well-designed refactor supporting inheritance-based backend dispatch.Converting path constants to instance attributes enables subclasses to override backend-specific paths (e.g., CuTeDSL using
"kernel.py"instead of"device_kernel.cu"), which directly supports the PR objective of eliminating if-else backend branching.
215-215: Good use of polymorphism for backend-specific adapter cache path setup.Replacing direct attribute manipulation with the
_set_adapter_cache_pathhook allows subclasses to control adapter state without base class assumptions about adapter internals.
284-301: Clean refactor delegating to overridable helpers.Breaking disk saves into separate overridable methods (
_save_kernel_source_code_to_disk,_save_wrapper_kernel_code_to_disk,_save_so_cubin_to_disk) with per-method error handling enables fine-grained backend customization while keeping error boundaries clear.
420-437: Robust error handling for optional kernel source files.The method correctly returns
(None, None)when source files are missing or unreadable, allowing callers to distinguish between "source not available" and other failure modes. This supports backends where source files are optional.
115-120: Update docstring to match the method signature:targetandverboseno longer acceptNonevalues.The docstring for the
cached()method inKernelCache(lines 131-134) incorrectly claims thattargetandverboseacceptNoneto trigger environment variable reads. However, the actual method signature makes these parameters required (target: str | Targetandverbose: boolwith no defaults). The environment variable handling has been moved to the publiccached()wrapper intilelang/cache/__init__.py, which accepts optional values and properly resolves them before passing explicit values to the internal method. Update the docstring to clarify thattargetandverboseare now required parameters, and remove references toNonedefaults and environment variable fallbacks.
LeiWang1999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Awesome contributions!
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
testing/python/cache/test_tilelang_kernel_cache.pytesting/python/jit/test_tilelang_jit_cutedsl.pytilelang/jit/adapter/cutedsl/adapter.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
testing/python/cache/test_tilelang_kernel_cache.pytesting/python/jit/test_tilelang_jit_cutedsl.py
🧬 Code graph analysis (2)
tilelang/jit/adapter/cutedsl/adapter.py (1)
tilelang/jit/adapter/cython/adapter.py (1)
libpath(366-368)
testing/python/cache/test_tilelang_kernel_cache.py (2)
tilelang/env.py (1)
enable_cache(275-276)tilelang/jit/adapter/cutedsl/adapter.py (1)
get_kernel_source(205-213)
🪛 Ruff (0.14.10)
testing/python/cache/test_tilelang_kernel_cache.py
202-202: Unused function argument: clean_cache_env
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (3)
tilelang/jit/adapter/cutedsl/adapter.py (1)
93-95: > Likely an incorrect or invalid review comment.testing/python/cache/test_tilelang_kernel_cache.py (2)
1-87: Excellent test infrastructure design.The test plan documentation (lines 1-21) clearly explains the cache verification strategy, and the fixture design properly isolates tests while ensuring correct environment restoration:
setup_module_envcorrectly saves original values before any modificationsclean_cache_envprovides isolated cache directories per test- Callback cleanup ensures no cross-test contamination
This is a solid foundation for comprehensive cache testing across multiple backends.
124-287: Comprehensive cache behavior validation.The three test scenarios effectively verify:
- Cache miss/hit detection via postproc callback tracking
- Cache key differentiation for distinct kernel implementations
- Cache isolation between test runs
The use of UUID-based unique identifiers in
global_symbolensures reliable cache key generation without collisions.Note: The static analysis hint claiming
clean_cache_envis unused intest_cache_miss_detectionis a false positive—the fixture is properly used for its side effects (cache isolation and env setup).
|
Merged, thanks for your awesome contributions! |
This PR resolves #1469. It moves most of the
if-elsecheckers for multi-backends intilelang/cache/kernel_cache.py, and dispatches the different implementations by inheritance in different subclasses.We retained some specialized implementations for
CuteDSLand considered removing them in the future once this backend is ready.Summary by CodeRabbit
New Features
Changes
Tests / CI
✏️ Tip: You can customize this high-level summary in your review settings.