
[Bug]: The MLA implementation brings no benefit #101

Open
foamliu opened this issue Jan 1, 2025 · 0 comments

Comments


foamliu commented Jan 1, 2025

The MLA (multi-head latent attention) implementation is supposed to speed up inference. However, because the data it writes into the cache is larger than the baseline (Llama), it brings no benefit at all; on the contrary, compared with the baseline (Llama) it uses more GPU memory and inference is slower.

Below is the MLA implementation from the official DeepSeekV3 page on Hugging Face. You can see that the amount of data written into the KV cache is even larger than the baseline (Llama):
(screenshot: HF DeepSeekV3 MLA attention implementation)

Below are the inference speed benchmark results:
(screenshot: inference benchmark results)
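The per-token cache-size arithmetic behind this complaint can be sketched as follows. This is an illustrative estimate, not code from either repository: the head counts and dimensions are assumptions based on typical Llama-style GQA configs and the published DeepSeek-V3 config (128 heads, qk head dim 128 + 64 RoPE, v head dim 128, kv_lora_rank 512); a naive implementation that caches the full up-projected key/value states stores far more per token than a GQA baseline, while caching only the compressed latent would store far less.

```python
def kv_cache_bytes_per_token(num_kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes cached per token per layer: key states + value states."""
    return 2 * num_kv_heads * head_dim * dtype_bytes

DTYPE_BYTES = 2  # fp16/bf16

# Llama-style GQA baseline (assumed config): 8 KV heads of dim 128.
baseline = kv_cache_bytes_per_token(num_kv_heads=8, head_dim=128, dtype_bytes=DTYPE_BYTES)

# Naive MLA cache (as in the HF modeling code the issue screenshots):
# the full up-projected keys and values for all 128 heads are cached,
# with key dim 128 (nope) + 64 (rope) = 192 and value dim 128.
num_heads, k_dim, v_dim = 128, 192, 128
naive_mla = num_heads * (k_dim + v_dim) * DTYPE_BYTES

# What MLA is meant to cache: one compressed KV latent (kv_lora_rank = 512)
# plus one decoupled RoPE key (dim 64), shared across all heads.
latent_mla = (512 + 64) * DTYPE_BYTES

print(f"baseline (GQA):  {baseline} B/token/layer")
print(f"naive MLA cache: {naive_mla} B/token/layer")
print(f"latent MLA cache:{latent_mla} B/token/layer")
```

Under these assumed dimensions the naive cache is roughly 20x the GQA baseline, while the latent cache is several times smaller than it, which matches the issue's observation that the HF-style implementation negates MLA's intended memory saving.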
