Remove duplicate kv_b_proj from models using MLA #1349


Merged

Conversation

kwisniewski98
Copy link

DeepSeek in our definition has kv_b_proj defined in two places: self_attn.kv_b_proj and self_attn.impl.kv_b_proj. The first one is never used at runtime, but it is present at model initialization, so INC tries to quantize it. Because it was never exercised during measurement, there are no measurements for that module, and quantization crashes.
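A minimal sketch of the problem (illustrative class names, not the actual vLLM code): a module registered twice is visible twice to anything that walks the module tree, so a quantizer such as INC will also try to handle the unused self_attn.kv_b_proj.

```python
import torch.nn as nn

class Impl(nn.Module):
    def __init__(self, proj: nn.Linear):
        super().__init__()
        self.kv_b_proj = proj              # the copy actually used in forward()

class SelfAttn(nn.Module):
    def __init__(self):
        super().__init__()
        proj = nn.Linear(512, 128, bias=False)
        self.kv_b_proj = proj              # duplicate: registered, never called
        self.impl = Impl(proj)

attn = SelfAttn()
names = [n for n, _ in attn.named_modules(remove_duplicate=False)
         if n.endswith("kv_b_proj")]       # ['kv_b_proj', 'impl.kv_b_proj']
# The fix amounts to dropping the unused registration, leaving only impl's copy:
del attn.kv_b_proj
names_after = [n for n, _ in attn.named_modules(remove_duplicate=False)
               if n.endswith("kv_b_proj")] # ['impl.kv_b_proj']
```

With only one registration left, the quantizer sees exactly the module that was exercised during measurement.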

@xuechendi
Copy link

xuechendi commented Jun 2, 2025

Thanks, @kwisniewski98. I was talking to @yiliu30, asking him to take a look at this issue, which causes INC fp8 inference to fail. Thanks for fixing.

@xuechendi
Copy link

From my side, I found the fp8 change also fails INC measurement (lines 961 and 962). Did you see the same issue on your end?

@xuechendi
Copy link

xuechendi commented Jun 2, 2025

Verified this PR with deepseek_r1-0528

```shell
QUANT_CONFIG=inc_quant_with_fp8kv_config.json \
VLLM_HPU_FORCE_CHANNEL_FP8=0 \
PT_HPU_LAZY_MODE=1 \
VLLM_DISABLE_MARK_SCALES_AS_CONST=1 \
VLLM_SKIP_WARMUP=true \
PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
PT_HPU_WEIGHT_SHARING=0 \
lm_eval --model vllm \
  --model_args "pretrained=/mnt/weka/llm/DeepSeek-R1-0528,tensor_parallel_size=8,distributed_executor_backend=mp,trust_remote_code=true,max_model_len=4096,use_v2_block_manager=True,dtype=bfloat16,kv_cache_dtype=fp8_inc,enable_expert_parallel=True,max_num_seqs=256" \
  --tasks gsm8k --num_fewshot "5" \
  --batch_size "256" --limit 16 --log_samples --output_path fp8_gsm8k.json
```
where inc_quant_with_fp8kv_config.json contains:

```json
{
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    "scale_format": "const",
    "allowlist": {
        "types": [],
        "names": []
    },
    "blocklist": {
        "types": [],
        "names": [
            "lm_head",
            "mlp\\.gate\\b"
        ]
    },
    "dump_stats_path": "./nc_workspace_measure_kvache/inc_measure_output"
}
```
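The blocklist `names` entries are regular expressions, and the `\b` in `mlp\.gate\b` is what keeps the MoE router `mlp.gate` excluded from quantization without also catching names like `mlp.gate_proj`. A small sketch of that matching, assuming a plain regex search over module names (the example names are illustrative):

```python
import re

# '\b' requires a word boundary after 'gate'; a following '_' is a word
# character, so 'gate_proj' is not matched by this pattern.
pattern = re.compile(r"mlp\.gate\b")

router_hit = pattern.search("model.layers.3.mlp.gate")       # blocked from quantization
proj_hit = pattern.search("model.layers.3.mlp.gate_proj")    # still quantized
```

Here `router_hit` is a match object and `proj_hit` is `None`, so only the router layer lands on the blocklist.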

nc_workspace_measure_kvache.tar.gz

Accuracy seems OK as well.

With INC fp8:

2025-06-03:06:37:57,901 INFO [lm_eval.loggers.evaluation_tracker:290] Saving per-sample results for: gsm8k
vllm (pretrained=/mnt/weka/llm/DeepSeek-R1-0528,tensor_parallel_size=8,distributed_executor_backend=mp,trust_remote_code=true,max_model_len=4096,use_v2_block_manager=True,dtype=bfloat16,kv_cache_dtype=fp8_inc,enable_expert_parallel=True,max_num_seqs=256), gen_kwargs: (None), limit: 256.0, num_fewshot: 5, batch_size: 256

Tasks  Version Filter            n-shot Metric       Value   Stderr
gsm8k  3       flexible-extract  5      exact_match  0.9609  ± 0.0121
               strict-match      5      exact_match  0.9570  ± 0.0127

Without INC fp8:
vllm (pretrained=/mnt/weka/data/pytorch/DeepSeek-R1/,tensor_parallel_size=8,distributed_executor_backend=mp,trust_remote_code=true,max_model_len=4096,use_v2_block_manager=True,dtype=bfloat16,enable_expert_parallel=True,max_num_seqs=128), gen_kwargs: (None), limit: 256.0, num_fewshot: 5, batch_size: 128

Tasks  Version Filter            n-shot Metric       Value   Stderr
gsm8k  3       flexible-extract  5      exact_match  0.9688  ± 0.0109
               strict-match      5      exact_match  0.9688  ± 0.0109

@xuechendi
Copy link

/run-gaudi-tests

xuechendi
xuechendi previously approved these changes Jun 2, 2025
@xuechendi
Copy link

/run-gaudi-tests

@xuechendi xuechendi dismissed their stale review June 2, 2025 18:56

I think the accuracy drop is significant; requesting the INC team to double-confirm.

Signed-off-by: kwisniewski98 <[email protected]>
@kwisniewski98
Copy link
Author

kwisniewski98 commented Jun 3, 2025

> From my side, I found the fp8 change also fails INC measurement (lines 961 and 962). Did you see the same issue on your end?

This will be addressed by PR #242 to INC.

@xuechendi
Copy link

/run-gaudi-tests

@michalkuligowski michalkuligowski merged commit 066244f into habana_main Jun 4, 2025
49 checks passed
@michalkuligowski michalkuligowski deleted the dev/kwisniewski/remove_deepseek_duplicates branch June 4, 2025 13:32