
enable chunked prefill with speculative decoding#537

Merged
thomasj02 merged 1 commit into main from
tjohnson/enable-chunked-prefill-with-speculative-decoding
Jan 29, 2026

Conversation

Contributor

@thomasj02 thomasj02 commented Jan 29, 2026

Summary

Changes

  • Removed the line in generate_templates.py that set enable_chunked_context = False for speculative decoding (see the sketch after this list)
  • Raised the max_seq_len cap from 32768 to 262144 for speculative decoding
  • Regenerated the affected Briton configs
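
For context, here is a minimal sketch of what the removed guard in generate_templates.py might have looked like. The names used here (EngineConfig, build_engine_config, the field names) are illustrative assumptions, not the repository's actual API; only the enable_chunked_context flag, the 32768 and 262144 caps, and the TRT-LLM issue reference come from the PR itself.

```python
# Hypothetical sketch of the template-generation logic before and after this PR.
# All identifiers here are assumptions made for illustration.
from dataclasses import dataclass


@dataclass
class EngineConfig:
    enable_chunked_context: bool
    max_seq_len: int


def build_engine_config(requested_max_seq_len: int, speculative_decoding: bool) -> EngineConfig:
    config = EngineConfig(enable_chunked_context=True, max_seq_len=requested_max_seq_len)

    if speculative_decoding:
        # Before this PR: chunked prefill was force-disabled and max_seq_len was
        # capped at 32768 while TRT-LLM issue #5451 was unresolved:
        #   config.enable_chunked_context = False
        #   config.max_seq_len = min(config.max_seq_len, 32768)
        #
        # After this PR: chunked prefill stays enabled and the cap is raised.
        config.max_seq_len = min(config.max_seq_len, 262144)

    return config


if __name__ == "__main__":
    # Example: a long-context speculative-decoding build now keeps chunked prefill on.
    print(build_engine_config(200_000, speculative_decoding=True))
```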

Contributor

@michaelfeil michaelfeil left a comment

approve pending changes

TRT-LLM issue #5451 has been resolved, allowing chunked prefill to work
alongside speculative decoding. Removes the restriction that disabled
chunked prefill when speculative decoding was enabled.

Also increases the max_seq_len cap from 32768 to 262144 for speculative
decoding builds.
@thomasj02 thomasj02 force-pushed the tjohnson/enable-chunked-prefill-with-speculative-decoding branch from a231566 to 5ee6de8 on January 29, 2026 00:51
@thomasj02 thomasj02 merged commit 6e2ed20 into main Jan 29, 2026
1 check passed
@thomasj02 thomasj02 deleted the tjohnson/enable-chunked-prefill-with-speculative-decoding branch January 29, 2026 00:52