Conversation

@clouds56 clouds56 commented Dec 24, 2025

Sorry, the last PR #1527 was closed by mistake and my branch was also lost, so I prepared a new PR.

Summary by CodeRabbit

  • Improvements

    • Further improved CUDA installation path detection: now probes NVIDIA-related Python packages and adds platform-specific fallbacks on Windows and Unix-like systems to more reliably locate CUDA installations when automatic detection previously failed.
  • New Features

    • Added a new optional "nvcc" install group to simplify installing CUDA tooling, including nvcc and related helper packages.


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai bot commented Dec 24, 2025

📝 Walkthrough

Walkthrough

Adds two CUDA-home discovery fallbacks in tilelang/env.py (inspect NVIDIA-related Python packages; platform-specific filesystem paths) and a new optional nvcc dependency group in pyproject.toml. No public API signature changes.

Changes

CUDA Home Discovery Enhancement (tilelang/env.py)
Adds Guess #3: detect CUDA home by inspecting NVIDIA-related Python packages (e.g., nvidia.cu13, nvidia.cu12, nvidia.cu11, nvidia.cuda_nvcc) via importlib.util.find_spec and use the package location when applicable. Adds Guess #4: platform-specific filesystem fallbacks. On Windows, search C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v*.*; on non-Windows systems, check /usr/local/cuda and /opt/nvidia/hpc_sdk/.... These run after the prior guesses and before the existing defaults; invalid paths normalize to "".

Optional Dependency (pyproject.toml)
Adds a new optional-dependencies group nvcc with nvidia-cuda-nvcc>=13.0.48 and nvidia-cuda-cccl>=13.0.50. No other dependency or public API changes.
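Based on the group name and version pins stated above, the pyproject.toml addition would look roughly like this (a sketch; the exact position within the file may differ):

```toml
[project.optional-dependencies]
# Optional "nvcc" group: pulls the CUDA compiler and CCCL headers from
# PyPI so a system-wide CUDA toolkit install is not required.
nvcc = [
    "nvidia-cuda-nvcc>=13.0.48",
    "nvidia-cuda-cccl>=13.0.50",
]
```

With a group like this, users would opt in via `pip install "tilelang[nvcc]"`.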

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • LeiWang1999

Poem

🐰 I sniffed through packages, paths both near and far,

Found nvcc hiding where binaries are.
I nudged pyproject, added a tiny seed,
Now GPUs wake up when builders take the lead. ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: adding detection of CUDA_HOME from the nvidia-cuda-nvcc package, which aligns with the primary objective of improving CUDA installation autodetection.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfe5550 and f299e7b.

📒 Files selected for processing (1)
  • tilelang/env.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:27.444Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.

Applied to files:

  • tilelang/env.py
🔇 Additional comments (1)
tilelang/env.py (1)

53-53: Good fix for empty CUDA_HOME handling.

The addition of `or None` correctly treats empty-string values from environment variables as not present, preventing them from being used as invalid CUDA paths. This aligns with the commit message and addresses the reported issue where CUDA_HOME might be set to an empty string.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tilelang/env.py (1)

91-93: Non-deterministic CUDA version selection when multiple versions are installed.

glob.glob() returns paths in arbitrary filesystem order. If multiple CUDA versions are installed (e.g., v11.8, v12.0, v12.4), selecting cuda_homes[0] gives unpredictable results across runs or machines.

🔎 Proposed fix to prefer the latest CUDA version
         if sys.platform == "win32":
             cuda_homes = glob.glob("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v*.*")
-            cuda_home = "" if len(cuda_homes) == 0 else cuda_homes[0]
+            if cuda_homes:
+                # Sort to prefer the latest version (e.g., v12.4 over v11.8)
+                cuda_homes.sort(reverse=True)
+                cuda_home = cuda_homes[0]
+            else:
+                cuda_home = ""
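One caveat with the fix above: a plain `sort(reverse=True)` is lexicographic, so it orders v12.4 above v11.8 correctly but would rank v9.0 above v12.4. Parsing the trailing version numerically avoids that edge case (a sketch; `pick_latest_cuda` is a hypothetical helper, not code from the PR):

```python
import re


def pick_latest_cuda(cuda_homes):
    # Extract the trailing "v<major>.<minor>" and compare as integers,
    # so "v12.4" beats "v9.0" (lexicographic string order would not).
    def version_key(path):
        match = re.search(r"v(\d+)\.(\d+)$", path)
        return (int(match.group(1)), int(match.group(2))) if match else (-1, -1)

    return max(cuda_homes, key=version_key) if cuda_homes else ""
```

Paths without a parseable version sort last, and an empty list still yields "" as in the original fallback.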
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d140415 and e801a01.

📒 Files selected for processing (1)
  • tilelang/env.py
🔇 Additional comments (2)
tilelang/env.py (2)

69-87: LGTM - PyPI package detection correctly implements nvidia-cuda-nvcc paths.

The candidate paths (nvidia/cu13/bin/, nvidia/cu12/bin/, nvidia/cu11/bin/) correctly match the nvidia-cuda-nvcc PyPI package installation structure. The priority order prefers newer CUDA versions, and the fallback to nvidia/cuda_nvcc/bin/ is a reasonable defensive addition. Based on learnings from the previous PR.


98-99: Verify HPC SDK fallback path is intentional.

The path /opt/nvidia/hpc_sdk/Linux_x86_64 is the HPC SDK root, not a valid CUDA_HOME. The actual CUDA installation within HPC SDK is typically nested at /opt/nvidia/hpc_sdk/Linux_x86_64/<version>/cuda/<cuda_version>.

This differs from Guess #2's handling (lines 62-64), which navigates from the nvcc binary to find the correct SDK path. If this fallback is reached, it's unlikely to point to a usable CUDA installation—though the validation at lines 101-103 will safely reject it.

Is this a best-effort guess that relies on the subsequent validation, or should this path be removed/adjusted for consistency?
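If the nested layout matters, a best-effort lookup could glob both levels and require an nvcc binary before accepting a candidate (a sketch assuming the `<root>/<sdk_version>/cuda/<cuda_version>` layout described above; `find_hpc_sdk_cuda` is a hypothetical helper):

```python
import glob
import os


def find_hpc_sdk_cuda(root="/opt/nvidia/hpc_sdk/Linux_x86_64"):
    # Glob the nested <sdk_version>/cuda/<cuda_version> directories and
    # keep only those that actually ship a bin/nvcc binary.
    candidates = [
        path
        for path in glob.glob(os.path.join(root, "*", "cuda", "*"))
        if os.path.isfile(os.path.join(path, "bin", "nvcc"))
    ]
    # max() on the path string is a crude "latest version" tie-breaker;
    # returns None when the SDK root is absent or empty.
    return max(candidates, default=None)
```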


@oraluben oraluben left a comment


This idea LGTM. With this PR, tilelang can work with a plain torch installation (with nvidia-cuda-nvcc installed), without requiring the CUDA toolchain.

I think we can also put nvidia-cuda-nvcc in an optional dependency group.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tilelang/env.py (2)

69-78: Move import to module level for consistency.

The importlib.util import is placed inside the function. Consider moving it to the top of the file alongside other imports for better consistency and readability.

🔎 Proposed refactor

At the top of the file, add the import:

from __future__ import annotations
import sys
import os
import pathlib
import logging
import shutil
import glob
+import importlib.util
from dataclasses import dataclass

Then remove it from inside the function:

     if cuda_home is None:
         # Guess #3
         # from pypi package nvidia-cuda-nvcc, nvidia-cuda-nvcc-cu12, etc.
-        import importlib.util
-
         for submodule in ["cu13", "cu12", "cu11", "cuda_nvcc"]:

82-84: Consider sorting CUDA versions when multiple installations exist.

The Windows fallback uses glob.glob which returns matches in arbitrary filesystem order. If multiple CUDA versions are installed, the selected version may be unpredictable. Consider sorting the results to select the latest version consistently.

🔎 Proposed fix to select the latest CUDA version
         if sys.platform == "win32":
             cuda_homes = glob.glob("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v*.*")
-            cuda_home = "" if len(cuda_homes) == 0 else cuda_homes[0]
+            if len(cuda_homes) == 0:
+                cuda_home = ""
+            else:
+                # Sort to get the latest version (e.g., v12.8 comes after v11.8)
+                cuda_homes.sort(reverse=True)
+                cuda_home = cuda_homes[0]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e801a01 and 5d53b22.

📒 Files selected for processing (2)
  • pyproject.toml
  • tilelang/env.py
🔇 Additional comments (4)
tilelang/env.py (3)

86-90: LGTM!

The Linux/macOS fallback logic correctly checks standard CUDA installation paths in a reasonable priority order.


92-96: LGTM!

The path validation logic correctly ensures that only existing CUDA installations are returned, with proper fallback to an empty string for backward compatibility.


74-78: The CUDA_HOME detection from PyPI packages at lines 74-78 is correct. The spec.submodule_search_locations[0] returns the nvidia/cu13 (or cu12/cu11) directory, which is the proper CUDA_HOME location containing the bin/ subdirectory with nvcc. No changes needed.

pyproject.toml (1)

55-59: The specified versions for both optional dependencies are valid and available on PyPI:

  • nvidia-cuda-nvcc 13.0.48 ✓ exists (released Aug 4, 2025)
  • nvidia-cuda-cccl 13.0.50 ✓ exists (released Aug 4, 2025)

No known security vulnerabilities were found for these versions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
tilelang/env.py (2)

93-95: Consider simplifying the validation logic.

The validation can be streamlined since checking cuda_home is None and then setting it to None is redundant.

🔎 Proposed simplification
-        # Validate found path
-        if cuda_home is None or not os.path.exists(cuda_home):
-            cuda_home = None
+        # Validate found path exists
+        if cuda_home and not os.path.exists(cuda_home):
+            cuda_home = None

72-72: Consider moving the import to the module level for better clarity.

While importing importlib.util inside the function works, placing it at the module level (lines 1-8) would improve code organization and make dependencies more visible.

🔎 Proposed change

At the top of the file (after line 7):

 import shutil
 import glob
+import importlib.util
 from dataclasses import dataclass

Then remove the import from line 72:

     if cuda_home is None:
         # Guess #3
         # from pypi package nvidia-cuda-nvcc, nvidia-cuda-nvcc-cu12, etc.
-        import importlib.util
 
         if importlib.util.find_spec("nvidia") is not None:
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d53b22 and dfe5550.

📒 Files selected for processing (1)
  • tilelang/env.py
🔇 Additional comments (1)
tilelang/env.py (1)

69-79: The fix correctly prevents exceptions when nvidia packages are not installed.

The guard clause at line 74 (if importlib.util.find_spec("nvidia") is not None:) successfully prevents accessing nvidia submodules when the parent package is missing. Testing confirms that importlib.util.find_spec("nvidia") returns None without raising an exception when the package is not installed, and the conditional structure ensures submodule searches (lines 75-79) never execute in this scenario.
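The behavior the guard relies on can be demonstrated standalone (a sketch; `find_pkg_dir` is illustrative, not code from the PR). Note that `find_spec` on a dotted name such as `nvidia.cu13` imports the parent package first and raises ModuleNotFoundError when the parent is absent, which is exactly why the top-level check comes first:

```python
import importlib.util


def find_pkg_dir(name):
    # For a *top-level* name, find_spec simply returns None when the
    # package is missing; only dotted names can raise, because resolving
    # the child requires importing the parent package.
    spec = importlib.util.find_spec(name)
    if spec is not None and spec.submodule_search_locations:
        # Only packages (directories) have submodule_search_locations;
        # plain modules yield None here.
        return list(spec.submodule_search_locations)[0]
    return None
```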

Comment on lines +69 to +79

    if cuda_home is None:
        # Guess #3
        # from pypi package nvidia-cuda-nvcc, nvidia-cuda-nvcc-cu12, etc.
        import importlib.util

        if importlib.util.find_spec("nvidia") is not None:
            for submodule in ["cu13", "cu12", "cu11", "cuda_nvcc"]:
                spec = importlib.util.find_spec(f"nvidia.{submodule}")
                if spec is not None and spec.submodule_search_locations:
                    cuda_home = os.path.join(spec.submodule_search_locations[0])
                    break

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

python3 << 'EOF'
import importlib.util
import os
import sys

print("=== Checking nvidia package structure ===\n")

if importlib.util.find_spec("nvidia") is not None:
    print("nvidia package found\n")
    for submodule in ["cu13", "cu12", "cu11", "cuda_nvcc"]:
        spec = importlib.util.find_spec(f"nvidia.{submodule}")
        if spec is not None:
            print(f"nvidia.{submodule}:")
            print(f"  spec.origin: {spec.origin}")
            print(f"  spec.submodule_search_locations: {spec.submodule_search_locations}")
            
            if spec.submodule_search_locations:
                package_path = spec.submodule_search_locations[0]
                print(f"  Package directory: {package_path}")
                
                # Check if bin/nvcc exists
                nvcc_path = os.path.join(package_path, "bin", "nvcc")
                nvcc_exe_path = os.path.join(package_path, "bin", "nvcc.exe")
                
                print(f"  bin/nvcc exists: {os.path.isfile(nvcc_path)}")
                print(f"  bin/nvcc.exe exists: {os.path.isfile(nvcc_exe_path)}")
                
                # List top-level contents
                if os.path.isdir(package_path):
                    try:
                        contents = sorted(os.listdir(package_path))
                        print(f"  Contents ({len(contents)} items): {contents}")
                    except Exception as e:
                        print(f"  Error listing contents: {e}")
            print()
        else:
            print(f"nvidia.{submodule}: NOT FOUND\n")
else:
    print("nvidia package not found - installing test package...")
    import subprocess
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "nvidia-cuda-nvcc-cu11"], check=False)
    print("\nRetrying package detection...\n")
    
    if importlib.util.find_spec("nvidia") is not None:
        for submodule in ["cu13", "cu12", "cu11", "cuda_nvcc"]:
            spec = importlib.util.find_spec(f"nvidia.{submodule}")
            if spec is not None:
                print(f"nvidia.{submodule}:")
                print(f"  spec.submodule_search_locations: {spec.submodule_search_locations}")
                if spec.submodule_search_locations:
                    package_path = spec.submodule_search_locations[0]
                    if os.path.isdir(package_path):
                        contents = sorted(os.listdir(package_path))
                        print(f"  Contents: {contents}")
                print()
EOF

Repository: tile-ai/tilelang

Length of output: 146


🏁 Script executed:

cd tilelang && grep -n "cuda_home" env.py | head -20

Repository: tile-ai/tilelang

Length of output: 1061


🏁 Script executed:

cd tilelang && sed -n '50,100p' env.py

Repository: tile-ai/tilelang

Length of output: 2166


Remove invalid "cuda_nvcc" submodule and validate detected CUDA installation contains nvcc binary.

The code includes "cuda_nvcc" in the submodule list, but per the nvidia-cuda-nvcc package structure, only "cu13", "cu12", and "cu11" are valid submodules. The "cuda_nvcc" entry should be removed.

Additionally, while the code has an os.path.exists(cuda_home) check at line 94, it only verifies the directory exists, not that it contains a usable CUDA installation. For robustness against corrupted or incomplete packages, check that bin/nvcc (or bin/nvcc.exe on Windows) exists before setting cuda_home:

         if importlib.util.find_spec("nvidia") is not None:
-            for submodule in ["cu13", "cu12", "cu11", "cuda_nvcc"]:
+            for submodule in ["cu13", "cu12", "cu11"]:
                 spec = importlib.util.find_spec(f"nvidia.{submodule}")
                 if spec is not None and spec.submodule_search_locations:
-                    cuda_home = os.path.join(spec.submodule_search_locations[0])
-                    break
+                    candidate = spec.submodule_search_locations[0]
+                    nvcc_path = os.path.join(candidate, "bin", "nvcc")
+                    if sys.platform == "win32":
+                        nvcc_path += ".exe"
+                    if os.path.isfile(nvcc_path):
+                        cuda_home = candidate
+                        break
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    if cuda_home is None:
-        # Guess #3
-        # from pypi package nvidia-cuda-nvcc, nvidia-cuda-nvcc-cu12, etc.
-        import importlib.util
-        if importlib.util.find_spec("nvidia") is not None:
-            for submodule in ["cu13", "cu12", "cu11", "cuda_nvcc"]:
-                spec = importlib.util.find_spec(f"nvidia.{submodule}")
-                if spec is not None and spec.submodule_search_locations:
-                    cuda_home = os.path.join(spec.submodule_search_locations[0])
-                    break
+    if cuda_home is None:
+        # Guess #3
+        # from pypi package nvidia-cuda-nvcc, nvidia-cuda-nvcc-cu12, etc.
+        import importlib.util
+        if importlib.util.find_spec("nvidia") is not None:
+            for submodule in ["cu13", "cu12", "cu11"]:
+                spec = importlib.util.find_spec(f"nvidia.{submodule}")
+                if spec is not None and spec.submodule_search_locations:
+                    candidate = spec.submodule_search_locations[0]
+                    nvcc_path = os.path.join(candidate, "bin", "nvcc")
+                    if sys.platform == "win32":
+                        nvcc_path += ".exe"
+                    if os.path.isfile(nvcc_path):
+                        cuda_home = candidate
+                        break
🤖 Prompt for AI Agents
In tilelang/env.py around lines 69 to 79, remove "cuda_nvcc" from the inspected
nvidia submodules (only "cu13","cu12","cu11" should be checked) and, after
discovering a candidate submodule path via spec.submodule_search_locations[0],
verify that the CUDA install contains a usable nvcc binary before assigning
cuda_home: construct the bin path using os.path.join(candidate_path, "bin",
"nvcc") and on Windows also consider "nvcc.exe", and only set cuda_home when
that file exists (use os.path.exists); otherwise continue searching or fall
through to other guesses.


oraluben commented Dec 26, 2025

with this PR, tilelang can work with a plain torch installation (with nvidia-cuda-nvcc installed), without requiring cuda toolchain.

Would you mind making this work (e.g. `docker run -ti --rm --gpus all ubuntu`, and inside the container just install nvcc and torch via pip)? Currently I get the following error in that scenario:

(venv) root@8025c5faee4e:/# python /t/examples/gemm/example_gemm.py 
/venv/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
We recommend installing via `pip install torch-c-dlpack-ext`
  warnings.warn(
/venv/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:174: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
We recommend installing via `pip install torch-c-dlpack-ext`
  warnings.warn(
2025-12-26 03:21:42  [TileLang:tilelang.jit.kernel:INFO]: TileLang begins to compile kernel `gemm` with `out_idx=[-1]`
Traceback (most recent call last):
  File "/t/examples/gemm/example_gemm.py", line 67, in <module>
    main()
  File "/t/examples/gemm/example_gemm.py", line 30, in main
    kernel = matmul(1024, 1024, 1024, 128, 128, 32)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/jit/__init__.py", line 423, in __call__
    kernel = self.compile(*args, **kwargs, **tune_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/jit/__init__.py", line 355, in compile
    kernel_result = compile(
                    ^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/jit/__init__.py", line 99, in compile
    return cached(
           ^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/cache/__init__.py", line 30, in cached
    return _kernel_cache_instance.cached(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/cache/kernel_cache.py", line 236, in cached
    kernel = JITKernel(
             ^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/jit/kernel.py", line 137, in __init__
    adapter = self._compile_and_create_adapter(func, out_idx)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/jit/kernel.py", line 242, in _compile_and_create_adapter
    artifact = tilelang.lower(
               ^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/engine/lower.py", line 275, in lower
    codegen_mod = device_codegen(device_mod, target) if enable_device_compile else device_codegen_without_compile(device_mod, target)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/tilelang/engine/lower.py", line 198, in device_codegen
    device_mod = tvm.ffi.get_global_func(global_func)(device_mod, target)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python/tvm_ffi/cython/function.pxi", line 923, in tvm_ffi.core.Function.__call__
  File "<unknown>", line 0, in tvm::codegen::BuildTileLangCUDA(tvm::IRModule, tvm::Target)
  File "python/tvm_ffi/cython/function.pxi", line 1077, in tvm_ffi.core.tvm_ffi_callback
  File "/venv/lib/python3.12/site-packages/tilelang/engine/lower.py", line 114, in tilelang_callback_cuda_compile
    ptx = nvcc.compile_cuda(

  File "/venv/lib/python3.12/site-packages/tilelang/contrib/nvcc.py", line 77, in compile_cuda
    cmd = [get_nvcc_compiler()]

  File "/venv/lib/python3.12/site-packages/tilelang/contrib/nvcc.py", line 592, in get_nvcc_compiler
    return os.path.join(find_cuda_path(), "bin", "nvcc")

  File "/venv/lib/python3.12/site-packages/tilelang/contrib/nvcc.py", line 275, in find_cuda_path
    raise RuntimeError(

RuntimeError: Failed to automatically detect CUDA installation. Please set the CUDA_HOME environment variable manually (e.g., export CUDA_HOME=/usr/local/cuda).

Here's my workaround for autodetection failure:

diff --git a/examples/gemm/example_gemm.py b/examples/gemm/example_gemm.py
index dfa43112..c945d8eb 100644
--- a/examples/gemm/example_gemm.py
+++ b/examples/gemm/example_gemm.py
@@ -2,7 +2,7 @@ import tilelang
 import tilelang.language as T
 
 
-@tilelang.jit(out_idx=[-1])
+@tilelang.jit(out_idx=[-1], target='cuda')
 def matmul(M, N, K, block_M, block_N, block_K, dtype=T.float16, accum_dtype=T.float32):
     @T.prim_func
     def gemm(


clouds56 commented Dec 26, 2025

@oraluben which Dockerfile are you using?
You could manually install nvidia-cuda-nvcc in the Dockerfile via `pip install nvidia-cuda-nvcc nvidia-cuda-cccl` or `uv add nvidia-cuda-nvcc nvidia-cuda-cccl`, or `uv add "cuda-toolkit[nvcc,cccl]"`, or `uv add tilelang --optional nvcc`.

@oraluben

@oraluben which Dockerfile are you using? You could manually install nvidia-cuda-nvcc in the Dockerfile, via pip install nvidia-cuda-nvcc nvidia-cuda-cccl or uv add nvidia-cuda-nvcc nvidia-cuda-cccl, or uv add "cuda-toolkit[nvcc,cccl]", or uv add tilelang --optional nvcc

I ran into the error with nvidia-cuda-nvcc installed.


clouds56 commented Dec 28, 2025

Sorry, I'm having trouble setting up a Docker environment with libcuda.so.1 to reproduce (either I can't run Docker, or the machine doesn't have a GPU). Could you help by running this in your container:

python -c "import nvidia.cu13; print('1: done')"
python -c "import tilelang; print('2:', repr(tilelang.env.CUDA_HOME))"
python -c "import os; print('3:', os.environ.get('CUDA_HOME', '<not present>'))"
python -c "import os; print('4:', os.environ.get('CUDA_PATH', '<not present>'))"

One idea: you might have CUDA_HOME accidentally set to an empty string, in which case it wouldn't pass the `if cuda_home is None` check.
