Remove duplicate kv_b_proj from models using MLA #1349


Merged

Conversation

kwisniewski98
Copy link

DeepSeek in our definition has kv_b_proj defined in two places: self_attn.kv_b_proj and self_attn.impl.kv_b_proj. The first one is never used at runtime, but it is present at model initialization, so INC tries to quantize it. Because it was never exercised during measurement, there are no measurements for that module, and quantization crashes.
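A minimal sketch of the problem (illustrative class names, not the actual vLLM code): a module registered twice is visible twice to anything that walks the module tree, so a quantizer such as INC will also try to handle the unused self_attn.kv_b_proj.

```python
import torch.nn as nn

class Impl(nn.Module):
    def __init__(self, proj: nn.Linear):
        super().__init__()
        self.kv_b_proj = proj              # the copy actually used in forward()

class SelfAttn(nn.Module):
    def __init__(self):
        super().__init__()
        proj = nn.Linear(512, 128, bias=False)
        self.kv_b_proj = proj              # duplicate: registered, never called
        self.impl = Impl(proj)

attn = SelfAttn()
names = [n for n, _ in attn.named_modules(remove_duplicate=False)
         if n.endswith("kv_b_proj")]       # ['kv_b_proj', 'impl.kv_b_proj']
# The fix amounts to dropping the unused registration, leaving only impl's copy:
del attn.kv_b_proj
names_after = [n for n, _ in attn.named_modules(remove_duplicate=False)
               if n.endswith("kv_b_proj")] # ['impl.kv_b_proj']
```

With only one registration left, the quantizer sees exactly the module that was exercised during measurement.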

@xuechendi
Copy link

xuechendi commented Jun 2, 2025

Thanks, @kwisniewski98. I was talking to @yiliu30, asking him to take a look at this issue, which causes INC fp8 inference to fail. Thanks for fixing.

@xuechendi
Copy link

From my side, I found the fp8 change also fails INC measurement (lines 961 and 962). Did you see the same issue on your end?

@xuechendi
Copy link

xuechendi commented Jun 2, 2025

Verified this PR with deepseek_r1-0528

```shell
QUANT_CONFIG=inc_quant_with_fp8kv_config.json \
VLLM_HPU_FORCE_CHANNEL_FP8=0 \
PT_HPU_LAZY_MODE=1 \
VLLM_DISABLE_MARK_SCALES_AS_CONST=1 \
VLLM_SKIP_WARMUP=true \
PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
PT_HPU_WEIGHT_SHARING=0 \
lm_eval --model vllm \
  --model_args "pretrained=/mnt/weka/llm/DeepSeek-R1-0528,tensor_parallel_size=8,distributed_executor_backend=mp,trust_remote_code=true,max_model_len=4096,use_v2_block_manager=True,dtype=bfloat16,kv_cache_dtype=fp8_inc,enable_expert_parallel=True,max_num_seqs=256" \
  --tasks gsm8k --num_fewshot "5" \
  --batch_size "256" --limit 16 --log_samples --output_path fp8_gsm8k.json
```
where inc_quant_with_fp8kv_config.json contains:

```json
{
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    "scale_format": "const",
    "allowlist": {
        "types": [],
        "names": []
    },
    "blocklist": {
        "types": [],
        "names": [
            "lm_head",
            "mlp\\.gate\\b"
        ]
    },
    "dump_stats_path": "./nc_workspace_measure_kvache/inc_measure_output"
}
```
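The blocklist `names` entries are regular expressions, and the `\b` in `mlp\.gate\b` is what keeps the MoE router `mlp.gate` excluded from quantization without also catching names like `mlp.gate_proj`. A small sketch of that matching, assuming a plain regex search over module names (the example names are illustrative):

```python
import re

# '\b' requires a word boundary after 'gate'; a following '_' is a word
# character, so 'gate_proj' is not matched by this pattern.
pattern = re.compile(r"mlp\.gate\b")

router_hit = pattern.search("model.layers.3.mlp.gate")       # blocked from quantization
proj_hit = pattern.search("model.layers.3.mlp.gate_proj")    # still quantized
```

Here `router_hit` is a match object and `proj_hit` is `None`, so only the router layer lands on the blocklist.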

nc_workspace_measure_kvache.tar.gz

Accuracy seems OK as well.

With INC fp8:

2025-06-03:06:37:57,901 INFO [lm_eval.loggers.evaluation_tracker:290] Saving per-sample results for: gsm8k
vllm (pretrained=/mnt/weka/llm/DeepSeek-R1-0528,tensor_parallel_size=8,distributed_executor_backend=mp,trust_remote_code=true,max_model_len=4096,use_v2_block_manager=True,dtype=bfloat16,kv_cache_dtype=fp8_inc,enable_expert_parallel=True,max_num_seqs=256), gen_kwargs: (None), limit: 256.0, num_fewshot: 5, batch_size: 256

Tasks  Version Filter            n-shot Metric       Value   Stderr
gsm8k  3       flexible-extract  5      exact_match  0.9609  ± 0.0121
               strict-match      5      exact_match  0.9570  ± 0.0127

Without INC fp8:
vllm (pretrained=/mnt/weka/data/pytorch/DeepSeek-R1/,tensor_parallel_size=8,distributed_executor_backend=mp,trust_remote_code=true,max_model_len=4096,use_v2_block_manager=True,dtype=bfloat16,enable_expert_parallel=True,max_num_seqs=128), gen_kwargs: (None), limit: 256.0, num_fewshot: 5, batch_size: 128

Tasks  Version Filter            n-shot Metric       Value   Stderr
gsm8k  3       flexible-extract  5      exact_match  0.9688  ± 0.0109
               strict-match      5      exact_match  0.9688  ± 0.0109

@xuechendi
Copy link

/run-gaudi-tests

xuechendi
xuechendi previously approved these changes Jun 2, 2025
@xuechendi
Copy link

/run-gaudi-tests

@xuechendi xuechendi dismissed their stale review June 2, 2025 18:56

I think the accuracy drop is significant; requesting the INC team to double-confirm.

Signed-off-by: kwisniewski98 <[email protected]>
@kwisniewski98
Copy link
Author

kwisniewski98 commented Jun 3, 2025

> From my side, I found the fp8 change also fails INC measurement (lines 961 and 962). Did you see the same issue on your end?

This will be addressed by PR #242 to INC.

@xuechendi
Copy link

/run-gaudi-tests

@michalkuligowski michalkuligowski merged commit 066244f into habana_main Jun 4, 2025
49 checks passed
@michalkuligowski michalkuligowski deleted the dev/kwisniewski/remove_deepseek_duplicates branch June 4, 2025 13:32