Remove duplicate kv_b_proj from models using MLA #1349
Conversation
Signed-off-by: kwisniewski98 <[email protected]>
Thanks, @kwisniewski98. I was talking to @yiliu30, asking him to take a look at this issue, which causes INC fp8 inference to fail. Thanks for fixing.
Verified this PR with deepseek_r1-0528
nc_workspace_measure_kvache.tar.gz

Accuracy seems OK as well. Now with INC fp8:

2025-06-03:06:37:57,901 INFO [lm_eval.loggers.evaluation_tracker:290] Saving per-sample results for: gsm8k

Without INC fp8:
/run-gaudi-tests

/run-gaudi-tests
I think the accuracy drop is significant; requesting the INC team to double-confirm.
Signed-off-by: kwisniewski98 <[email protected]>
Signed-off-by: kwisniewski98 <[email protected]>
/run-gaudi-tests
DeepSeek, in our definition, has `kv_b_proj` defined in two places: `self_attn.kv_b_proj` and `self_attn.impl.kv_b_proj`. The first one is never used, but it is still present at model initialization, which makes INC try to quantize it. Because it is never invoked during measurement, there are no measurements for this module, and quantization crashes. A minimal sketch of the fix is shown below.
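As a rough illustration (not the exact PR diff), a hedged sketch of dropping the unused duplicate could look like the following. The `model` variable, the `model.model.layers` layout, and the `remove_duplicate_kv_b_proj` helper name are assumptions based on the description above, not a verified API:

```python
import torch.nn as nn


def remove_duplicate_kv_b_proj(model: nn.Module) -> None:
    """Hypothetical sketch: drop the unused self_attn.kv_b_proj so INC only
    sees the projection that actually runs (self_attn.impl.kv_b_proj)."""
    # Layer layout assumed from the PR description; adjust for the real model.
    for layer in model.model.layers:
        attn = layer.self_attn
        impl = getattr(attn, "impl", None)
        # Only delete the outer copy when the inner (used) copy exists, so the
        # forward pass is unaffected and INC has nothing unmeasured to quantize.
        if impl is not None and hasattr(impl, "kv_b_proj") and hasattr(attn, "kv_b_proj"):
            del attn.kv_b_proj  # nn.Module.__delattr__ removes the submodule
```

With the duplicate removed before measurement and quantization, INC only walks modules that are exercised at runtime, so every quantized module has matching measurements.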