Improve turbomind's prefix cache #3332

lvhan028 · 2025-03-25T09:33:21Z

This pull request makes several improvements to the TurboMind (tm) engine:

- cache both prompt tokens and generated tokens when prefix caching is enabled
- remove stateful inference
- update the chat API, the CLI, `get_logits`, and `get_ppl` accordingly
- `enable_prefix_caching` now defaults to `True`
- merge the PyTorch engine's (pt) `chat.py` and TurboMind's (tm) `chat.py` into one
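The first two items are related. Once stateful inference is removed, each turn resends the full conversation history, so cache hits on that history depend on the previously generated tokens having been cached too. The toy Python sketch below illustrates this under stated assumptions: a block-level prefix cache where each full block is keyed by the entire token prefix ending at that block. `PrefixCache`, `match`, `insert`, and the block size of 4 are hypothetical names and values chosen for illustration. This is not TurboMind's implementation, which lives in its C++ sources.

```python
class PrefixCache:
    """Toy block-level prefix cache (hypothetical, for illustration only).

    Each full block is keyed by the whole token prefix that ends at that
    block, so a block can only be reused if everything before it matches.
    """

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks: dict[tuple[int, ...], int] = {}  # prefix -> block id
        self._next_id = 0

    def match(self, tokens: list[int]) -> int:
        """Return how many leading tokens are already cached (whole blocks only)."""
        matched = 0
        for end in range(self.block_size, len(tokens) + 1, self.block_size):
            if tuple(tokens[:end]) not in self.blocks:
                break
            matched = end
        return matched

    def insert(self, tokens: list[int]) -> None:
        """Cache every full block of `tokens`; a trailing partial block is skipped."""
        for end in range(self.block_size, len(tokens) + 1, self.block_size):
            key = tuple(tokens[:end])
            if key not in self.blocks:
                self.blocks[key] = self._next_id
                self._next_id += 1


if __name__ == "__main__":
    prompt1 = [1, 2, 3, 4, 5, 6]      # turn 1 prompt
    generated1 = [7, 8, 9, 10, 11]    # tokens the model produced for turn 1
    # Turn 2 is stateless: the client resends the history plus a new message.
    prompt2 = prompt1 + generated1 + [12, 13, 14]

    # Cache the prompt AND the generated tokens after turn 1.
    full = PrefixCache(block_size=4)
    print(full.match(prompt1))        # 0 (cold cache)
    full.insert(prompt1 + generated1)
    print(full.match(prompt2))        # 8

    # Caching only the prompt reuses less of the turn-2 history.
    prompt_only = PrefixCache(block_size=4)
    prompt_only.insert(prompt1)
    print(prompt_only.match(prompt2))  # 4
```

Keying a block by its whole prefix is what makes the match safe: the same block of tokens appearing after a different history is a miss. A real engine would hash prefixes incrementally and manage block lifetimes rather than store full tuples.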

The remaining refactoring will be handled in a follow-up PR:

- remove `get_prompt` from `model.py`
- deprecate the built-in chat templates and use `AutoTokenizer`'s `apply_chat_template` instead

Test cases:

- LLM evaluation: https://aicarrier.feishu.cn/wiki/JxoDwiOO7i0xnvkZHzxc3PdGnJh?sheet=95YWnW

lvhan028 added 5 commits March 17, 2025 16:16

- add log (750aaa8)
- Merge branch 'main' into improve-tm-prefix-cache (8886124)
- refactor tm prefix caching (7b4304a)
- refactor tm prefix cache (8be44f8)
- Merge branch 'dev' into improve-tm-prefix-cache (dfdde01)

lvhan028 added the improvement label Mar 25, 2025

lvhan028 added 2 commits March 25, 2025 18:10

- fix linting (fda1e25)
- fix linting (a4ffe41)

lvhan028 changed the base branch from main to dev March 25, 2025 11:13

lvhan028 added 13 commits March 27, 2025 16:47

- combine Get&Create (acf4092)
- update (a2352d1)
- clear blocks (1e940df)
- INFO log to DEBUG log (533941d)
- refactor chat.py (91d1412)
- unlock the unmatched blocks when id is reused (ce08974)
- merge main (3891782)
- remove start_flag and end_flag from tm csrc (9c3ebc8)
- update output_logits (d41683a)
- update (70399b4)
- update (1b99728)
- fix api_client (c5a2962)
- remove interactive chat API (499b709)

lvhan028 changed the base branch from dev to main April 3, 2025 01:54

lvhan028 added 7 commits April 3, 2025 11:23

- fix build error on windows platform (617d317)
- fix chat (50e56e2)
- update generate.ps1 (38ea2ae)
- fix clang-format error (e1489a5)
- fix clang-format error (9d1df28)
- fix vlm chat error (e2a0c7a)
- merge main (604b101)

lvhan028 added 5 commits April 4, 2025 16:12

- fix get_logits (5e34425)
- remove killing from tm csrc (1cbdf5a)
- fix clang-format (afd531d)
- update (3dc9ffa)
- enable_prefix_caching defaults to True (14eb22a)

lvhan028 requested review from lzhangzz and irexyc April 8, 2025 04:33

lvhan028 added 2 commits April 8, 2025 13:22

- merge pt chat.py and tm chat.py (7e13a18)
- remove pt chat.py and tm chat.py (22cf302)

lvhan028 mentioned this pull request Apr 8, 2025 in "Default enable_prefix_caching True #3407" (closed)

lvhan028 added 4 commits April 9, 2025 14:43

- update (8531df8)
- Merge branch 'default-prefix-cache' into improve-tm-prefix-cache (3ddec13)
- fix (f3ef0d4)
- update (87dfbb9)

lvhan028 added the BC-breaking label Apr 10, 2025
