[Bugfix] Fix deepseekv3 grouped topk error #13474
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed from 76896ef to d390d1b
Thank you for your contribution! I ran a quick eval on GSM8K with DeepSeek R1 and see that this PR seems to slightly regress performance; however, it is within stderr.
I will try to run harder evals to determine improvement. Would you have an example of bad performance from DeepSeek on main due to this issue?
@mgoin, we might need to test the V3 model, as it is more heavily trained on GSM and MMLU.
Here is MMLU for R1; similar small drops for each category. I'll try to get V3 going on another machine.
I'll release v0.7.3 after this PR is merged.
Head branch was pushed to by a user without write access
Force-pushed from 1dbfe7b to cb53786
Head branch was pushed to by a user without write access
Force-pushed from cb53786 to 54dfc7e
@simon-mo Some checks in the CI pipeline have failed. Would you kindly assist with merging the code?
Signed-off-by: Chen-XiaoBing <[email protected]>
Signed-off-by: Chen-XiaoBing <[email protected]>
Signed-off-by: Chen-XiaoBing <[email protected]>
Force-pushed from 54dfc7e to 7676a35
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Chen-XiaoBing <[email protected]>
Signed-off-by: Chen-XiaoBing <[email protected]>
Fix the logic in the grouped top-k computation of fused MoE.
There is a slight difference between vLLM's grouped_topk and the official code. When the newly introduced bias term (e_score_correction_bias in vLLM) is not None, we should first take the top-2 scores within each group and use their sum to select the top-k groups. You can also check the computation logic in DeepSeek V3's official inference code.
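For illustration, here is a minimal PyTorch sketch of the corrected group-selection step. The names follow vLLM's grouped_topk, but the sizes are made up and this is a simplified reconstruction, not the exact diff:

```python
import torch

num_tokens, num_experts = 4, 64          # hypothetical sizes
num_expert_group, topk_group = 8, 4      # DeepSeek-V3-style grouping

scores = torch.sigmoid(torch.randn(num_tokens, num_experts))  # router scores
e_score_correction_bias = torch.randn(num_experts)            # per-expert bias

# With the bias term present, a group's score is the sum of its top-2
# biased expert scores (as in DeepSeek V3), not the single group max.
scores_for_choice = scores + e_score_correction_bias
group_scores = (scores_for_choice
                .view(num_tokens, num_expert_group, -1)
                .topk(2, dim=-1).values
                .sum(dim=-1))            # [num_tokens, num_expert_group]
group_idx = torch.topk(group_scores, k=topk_group,
                       dim=-1, sorted=False).indices
```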
Additionally, the masked scores are set to 0. If the scores in the unmasked groups are negative, this can cause experts from masked groups to be selected, leading to incorrect or suboptimal routing in certain scenarios. The official DeepSeek V3 code instead resets masked scores to -inf.
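Continuing the sketch above, the masking fix fills experts outside the selected groups with -inf instead of 0, so a negative score in a kept group can never lose to a masked-out expert (again illustrative, not the exact diff):

```python
topk = 8  # experts selected per token (hypothetical)

# Build a per-expert mask from the chosen groups.
group_mask = torch.zeros_like(group_scores)   # [num_tokens, num_expert_group]
group_mask.scatter_(1, group_idx, 1)
score_mask = (group_mask
              .unsqueeze(-1)
              .expand(num_tokens, num_expert_group,
                      num_experts // num_expert_group)
              .reshape(num_tokens, -1)
              .bool())                        # [num_tokens, num_experts]

# Masked experts get -inf (not 0), so they cannot win the top-k even
# when every score in the kept groups is negative.
masked_scores = scores_for_choice.masked_fill(~score_mask, float("-inf"))
topk_ids = torch.topk(masked_scores, k=topk, dim=-1).indices
topk_weights = scores.gather(1, topk_ids)  # final weights from unbiased scores
```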