
enable chunked prefill with speculative decoding#537

Merged
thomasj02 merged 1 commit into main from
tjohnson/enable-chunked-prefill-with-speculative-decoding
Jan 29, 2026

Conversation

Contributor

@thomasj02 thomasj02 commented Jan 29, 2026

Summary

Changes

  • Removed the line in generate_templates.py that set enable_chunked_context = False for speculative decoding (see the sketch after this list)
  • Raised the max_seq_len cap from 32768 to 262144 for speculative decoding
  • Regenerated the affected Briton configs
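
For context, here is a minimal sketch of what the removed guard in generate_templates.py might have looked like. The names used here (EngineConfig, build_engine_config, the field names) are illustrative assumptions, not the repository's actual API; only the enable_chunked_context flag, the 32768 and 262144 caps, and the TRT-LLM issue reference come from the PR itself.

```python
# Hypothetical sketch of the template-generation logic before and after this PR.
# All identifiers here are assumptions made for illustration.
from dataclasses import dataclass


@dataclass
class EngineConfig:
    enable_chunked_context: bool
    max_seq_len: int


def build_engine_config(requested_max_seq_len: int, speculative_decoding: bool) -> EngineConfig:
    config = EngineConfig(enable_chunked_context=True, max_seq_len=requested_max_seq_len)

    if speculative_decoding:
        # Before this PR: chunked prefill was force-disabled and max_seq_len was
        # capped at 32768 while TRT-LLM issue #5451 was unresolved:
        #   config.enable_chunked_context = False
        #   config.max_seq_len = min(config.max_seq_len, 32768)
        #
        # After this PR: chunked prefill stays enabled and the cap is raised.
        config.max_seq_len = min(config.max_seq_len, 262144)

    return config


if __name__ == "__main__":
    # Example: a long-context speculative-decoding build now keeps chunked prefill on.
    print(build_engine_config(200_000, speculative_decoding=True))
```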

Contributor

@michaelfeil michaelfeil left a comment

approve pending changes

TRT-LLM issue #5451 has been resolved, allowing chunked prefill to work
alongside speculative decoding. Removes the restriction that disabled
chunked prefill when speculative decoding was enabled.

Also increases the max_seq_len cap from 32768 to 262144 for speculative
decoding builds.
@thomasj02 thomasj02 force-pushed the tjohnson/enable-chunked-prefill-with-speculative-decoding branch from a231566 to 5ee6de8 on January 29, 2026 00:51
@thomasj02 thomasj02 merged commit 6e2ed20 into main Jan 29, 2026
1 check passed
@thomasj02 thomasj02 deleted the tjohnson/enable-chunked-prefill-with-speculative-decoding branch January 29, 2026 00:52