Deployment Regression: torch.compile Triton Compilation Failures on TensorRT-LLM with Python 3.12 / PyTorch 2.5+
Issue Summary
Previously working TensorRT-LLM deployments with torch.compile now fail during startup due to Triton compiler errors in Baseten's updated deployment environment. The deployment environment appears to have been upgraded from Python 3.9 / PyTorch 2.3.x to Python 3.12 / PyTorch 2.5+, which introduces breaking changes for models using torch.compile on custom CUDA operations.
Environment
Previously Working (before ~January 2026):
Python: 3.9
PyTorch: ~2.3.x (inferred)
Deployment: Successful with torch.compile enabled
Current Environment (failing):
Python: 3.12 (confirmed in logs: /usr/local/briton/venv/lib/python3.12/)
PyTorch: 2.5.x / 2.7.x (inferred from how the torch==2.7.0 requirement resolves)
Deployment: Fails during torch.compile warmup
Expected Behavior
The model should deploy successfully with torch.compile enabled, as it did previously. Background compilation (via daemon thread) should complete without errors.
Actual Behavior
Deployment fails with Triton compiler InductorError during the compilation warmup phase:
torch._inductor.exc.InductorError: SubprocException: An exception occurred in a subprocess:
Traceback (most recent call last):
File "/usr/local/briton/venv/lib/python3.12/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 337, in do_job
result = job()
^^^^^
File "/usr/local/briton/venv/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 61, in _worker_compile_triton
kernel.precompile(warm_cache_only=True)
File "/usr/local/briton/venv/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 267, in precompile
This occurs even when compilation is run in a background daemon thread: the failure happens in Inductor's compile-worker subprocess, so it surfaces as errors during model runtime rather than staying contained to the warmup thread.
Steps to Reproduce
1. Create a TensorRT-LLM Truss deployment with custom CUDA operations (e.g., the SNAC audio decoder)
2. Enable torch.compile with dynamic batching:
   decoder = torch.compile(model.decoder, dynamic=True)
3. Perform warmup compilation across multiple batch sizes (see the sketch after this list)
4. Deploy to Baseten with the default Python/PyTorch versions
5. Observe the Triton compilation failure during startup
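A minimal repro sketch for steps 2–3; the (batch, latent, frames) shape of the dummy input is a placeholder for illustration, not SNAC's real latent dimensions:

import torch
from snac import SNAC

model = SNAC.from_pretrained("/app/snac_24khz").eval().to("cuda")
decoder = torch.compile(model.decoder, dynamic=True)
with torch.inference_mode():
    for bs in (1, 2, 4, 8, 16, 32, 64):
        z = torch.randn(bs, 768, 64, device="cuda")  # hypothetical latent shape
        decoder(z)  # the first calls trigger Triton kernel compilation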
Configuration
config.yaml:
python_version: py39 # Ignored? Deployed with Python 3.12
requirements:
- torch==2.7.0 # Gets 2.5+ with incompatible Triton
- transformers>=4.50.0
- huggingface-hub>=1.3.0
model.py snippet:
import threading

import torch
from snac import SNAC

class SnacModelBatched:
    def __init__(self):
        self.dtype_decoder = torch.float32
        self.compile_background = True  # run compilation in a daemon thread
        use_compile = True
        self.model = SNAC.from_pretrained("/app/snac_24khz").eval().to("cuda")
        if use_compile:
            threading.Thread(target=self.compile, daemon=True).start()

    def compile(self):
        self.model.decoder = torch.compile(self.model.decoder, dynamic=True)
        # Warmup with various batch sizes
        for bs_size in range(1, 64):
            ...  # compilation warmup elided
Additional Context
Secondary Issues Encountered:
pynvml deprecation warning (cosmetic, but indicates environment changes)
huggingface-hub version conflict (transformers requires <1.0, but the environment has 1.3.4; see the check below)
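The conflict can be confirmed from inside the container with the stdlib importlib.metadata; the exact requirement string will vary with the installed transformers release:

from importlib.metadata import requires, version

print(version("huggingface-hub"))  # 1.3.4 in the current image per this report
print([r for r in requires("transformers") if "huggingface" in r.lower()])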
Workaround
Downgrade to PyTorch 2.4.0 explicitly:
python_version: py39
requirements:
- torch==2.4.0 # Stable Triton compiler
- transformers>=4.50.0
- huggingface-hub>=1.3.0
This restores previous behavior and allows torch.compile to work correctly.
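After redeploying, a quick runtime check confirms whether the pins took effect (the expected values in the comments are assumptions based on the config above, not confirmed output):

import sys

import torch
import triton

print(sys.version)         # expect 3.9.x if python_version: py39 were honored
print(torch.__version__)   # expect 2.4.0 after the workaround
print(triton.__version__)  # the Triton build bundled with this torch wheel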
Impact
This is a breaking change for production deployments. Models that previously deployed successfully now fail, requiring code changes or version pinning to restore functionality. There was no deprecation warning or migration guide for this environment change.
Suggested Fix
1. Document environment versions: clearly document base-image Python/PyTorch versions and update schedules
2. Version stability: allow explicit Python version control (the current python_version: py39 appears to be ignored)
3. Graceful degradation: catch Triton compilation errors and fall back to uncompiled mode with a warning (see the sketch after this list)
4. Pin dependencies: default to stable PyTorch versions (e.g., 2.4.x) rather than bleeding-edge releases
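For the graceful-degradation item, a minimal sketch assuming a caller-supplied warmup callable; because torch.compile is lazy, the Triton failure only surfaces during warmup, which is where the fallback has to happen. The broad except is illustrative and could be narrowed to torch._inductor.exc.InductorError on recent PyTorch:

import logging

import torch

def compile_or_fallback(module, warmup_fn):
    """Try torch.compile plus warmup; return the eager module if compilation fails."""
    try:
        compiled = torch.compile(module, dynamic=True)
        warmup_fn(compiled)  # compilation actually happens here, on the first calls
        return compiled
    except Exception as exc:  # e.g. torch._inductor.exc.InductorError
        logging.warning("torch.compile failed (%s); serving the uncompiled model", exc)
        return module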
Environment Details
Truss version: Latest (as of January 29, 2026)
Model type: TensorRT-LLM WebSocket endpoint
GPU: H100 40GB
Deployment status: Previously working, now broken