MLA (multi-head latent attention) is supposed to speed up inference, but because this implementation stores more data in the cache than the baseline (Llama), it brings no benefit at all: compared with the baseline (Llama), it uses more GPU memory and runs inference more slowly.
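For scale, assuming the published DeepSeek-V3 config (128 heads, `qk_nope_head_dim=128`, `qk_rope_head_dim=64`, `v_head_dim=128`, `kv_lora_rank=512`): caching the decompressed per-head keys and values costs 128×(128+64) + 128×128 = 40960 elements per token per layer, versus 2×8×128 = 2048 for a Llama-70B-style GQA baseline, and only 512+64 = 576 for the compressed latent that MLA is designed to cache. That is roughly 20× the baseline footprint and roughly 70× the intended one.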
Below is the MLA implementation from the official DeepSeekV3 repository on Hugging Face; as it shows, the amount of data written to the KV cache is even larger than the baseline (Llama):
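(The snippet itself is not preserved in this text. Below is a self-contained toy sketch of the caching pattern in `DeepseekV3Attention.forward`, paraphrased from `modeling_deepseek.py` rather than quoted verbatim; dims are taken from the public DeepSeek-V3 config, and the RMSNorm on the latent is omitted for brevity.)

```python
import torch

# Toy reproduction of the caching pattern in DeepseekV3Attention.forward
# (paraphrase, not the verbatim HF source).
bsz, q_len, hidden_size = 1, 1, 7168
num_heads = 128
qk_nope_head_dim, qk_rope_head_dim, v_head_dim = 128, 64, 128
kv_lora_rank = 512

hidden_states = torch.randn(bsz, q_len, hidden_size)
kv_a_proj_with_mqa = torch.nn.Linear(hidden_size, kv_lora_rank + qk_rope_head_dim, bias=False)
kv_b_proj = torch.nn.Linear(kv_lora_rank, num_heads * (qk_nope_head_dim + v_head_dim), bias=False)

# Step 1: compress into the small latent -- this is all MLA needs to cache.
compressed_kv = kv_a_proj_with_mqa(hidden_states)
compressed_kv, k_pe = torch.split(compressed_kv, [kv_lora_rank, qk_rope_head_dim], dim=-1)
print("latent per token:", compressed_kv.numel() + k_pe.numel())  # 576

# Step 2: ...but the forward pass immediately up-projects the latent back into
# full per-head keys/values before touching the cache.
kv = (
    kv_b_proj(compressed_kv)
    .view(bsz, q_len, num_heads, qk_nope_head_dim + v_head_dim)
    .transpose(1, 2)
)
k_nope, value_states = torch.split(kv, [qk_nope_head_dim, v_head_dim], dim=-1)
k_pe = k_pe.view(bsz, q_len, 1, qk_rope_head_dim).transpose(1, 2)
key_states = torch.cat([k_nope, k_pe.expand(-1, num_heads, -1, -1)], dim=-1)

# Step 3: these decompressed tensors are what past_key_value.update() receives,
# so the cache grows by num_heads * (192 + 128) = 40960 elements per token,
# not the 576-element latent.
print("cached per token:", key_states.numel() + value_states.numel())  # 40960
```

Caching only the 576-element latent and folding `kv_b_proj` into the attention computation (the "absorbed" formulation described in the DeepSeek-V2 paper) is what would actually realize MLA's memory savings.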
Below are the inference speed benchmark results:
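(The benchmark numbers are not preserved in this text. For reference, a minimal sketch of how such a decode-speed and memory comparison might be run; the model ids and prompt are placeholders to substitute, and `trust_remote_code=True` is needed for the DeepSeek remote modeling code.)

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bench(model_id: str, prompt: str = "Hello", new_tokens: int = 256) -> None:
    """Time greedy decoding and report peak GPU memory for one model."""
    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
    )
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    model.generate(**inputs, max_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    print(f"{model_id}: {new_tokens / dt:.1f} tok/s, "
          f"peak mem {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")

# Placeholder ids -- substitute whichever MLA/baseline pair was measured.
bench("deepseek-ai/DeepSeek-V2-Lite")
bench("meta-llama/Llama-3.1-8B")
```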