
Deepseek moe #2467

Closed
wants to merge 5 commits into from

Conversation

esmeetu (Collaborator) commented Jan 17, 2024

This is a refactored version based on #2453, since I don't have permission to push changes onto that PR.

  • Make the load_weight logic cleaner, and remove the unnecessary merged_replicated_linear_loader function.
  • Remove the reduce_results arg of DeepseekMLP.
  • Make the shared expert and the routed experts use the same DeepseekExpertMLP, which is semantically appropriate.
  • Align the Mixtral MLP with DeepseekExpertMLP by using ReplicatedLinear for gate_proj, up_proj, and down_proj (a rough sketch of the resulting module follows this list).
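
For reference, here is a rough, self-contained sketch of the shared expert module described above. It uses plain torch.nn as a stand-in for vLLM's ReplicatedLinear, so the class and argument names are illustrative rather than copied from the diff.

# Sketch only: plain-PyTorch stand-in for the DeepseekExpertMLP described above.
# The actual PR builds gate_proj/up_proj/down_proj with vLLM's ReplicatedLinear.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepseekExpertMLPSketch(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Replicated (unsharded) projections, matching the refactor above.
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU-style MLP: silu(gate(x)) * up(x), projected back to hidden_size.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Both the shared expert and every routed expert would be instances of this same
# module, which is the "semantically appropriate" point in the bullet above.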

@zwd003 Thanks for your contribution again! Happy to be a co-author with you.
@zhuohan123 @WoosukKwon This PR is ready for review! Alternatively, we can wait for #2293.

else:
final_hidden_states.add_(current_hidden_states)

y = tensor_model_parallel_all_reduce(final_hidden_states)
Contributor
Reducing after forwarding the shared expert could avoid one extra reduce operation: the shared expert also requires a reduce, so the two reductions can be merged.

esmeetu (Collaborator, Author)
I didn't split the shared_expert weights, so there is no need to reduce for it. Am I right?

Contributor
Oh, that's right.

Contributor
Splitting the shared_expert might give higher performance and reduce the memory required on each GPU.

esmeetu (Collaborator, Author)
Hmm, that makes sense. But maybe we can wait for #2293 and then consider improving performance.
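
To make the trade-off concrete, here is a toy, single-process sketch of the two strategies discussed in this thread. It is not code from this PR; the TP all-reduce is simulated as a plain sum over per-rank partial tensors, and the "sharded" shared expert partials are hypothetical.

# Toy illustration of the two reduction strategies discussed above.
import torch

tp_size, hidden = 2, 4
torch.manual_seed(0)

# Routed-expert outputs arrive as partial sums, one per TP rank.
routed_partials = [torch.randn(hidden) for _ in range(tp_size)]
# Replicated shared expert (as in this PR, per the thread above): its full
# output is available on every rank.
shared_full = torch.randn(hidden)
# Hypothetical sharded shared expert: per-rank partials that sum to the full output.
shared_partials = [shared_full / tp_size] * tp_size

# (a) Replicated shared expert: one all-reduce over the routed partials, then
#     add the already-complete shared output once (no extra reduce needed).
out_a = sum(routed_partials) + shared_full

# (b) Sharded shared expert, as suggested above: fold its partials into the
#     routed partials so a single all-reduce covers both.
out_b = sum(r + s for r, s in zip(routed_partials, shared_partials))

assert torch.allclose(out_a, out_b)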

@simon-mo mentioned this pull request Jan 22, 2024
cadedaniel (Collaborator) left a comment

Thanks for the contribution, @esmeetu and @zwd003. @simon-mo asked me to review; I took an initial pass and will run it on an H100 and review later today.

Comment on lines +140 to +141
self.expert_indicies = np.array_split(range(
self.n_routed_experts), self.tp_size)[self.rank].tolist()
Collaborator

Nit: we can replace this with a pure-PyTorch version:

torch.arange(self.n_routed_experts).split(
    self.n_routed_experts // self.tp_size)[self.rank].tolist()
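
A quick equivalence check of the two partitioning schemes, under the assumption that n_routed_experts divides evenly by tp_size (np.array_split also tolerates uneven splits, while torch.split as written here does not):

# Check that the numpy and pure-PyTorch expert partitions agree for every rank,
# assuming n_routed_experts is evenly divisible by tp_size.
import numpy as np
import torch

n_routed_experts, tp_size = 64, 8
for rank in range(tp_size):
    np_ids = np.array_split(range(n_routed_experts), tp_size)[rank].tolist()
    pt_ids = torch.arange(n_routed_experts).split(
        n_routed_experts // tp_size)[rank].tolist()
    assert np_ids == pt_ids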

@@ -0,0 +1,468 @@
# coding=utf-8
# Adapted from
# https://github.com/huggingface/transformers/blob/v4.28.0/src/transformers/models/llama/modeling_llama.py
Collaborator

Can we update this comment? It seems adapted mostly from mixtral.py.

max_position_embeddings=max_position_embeddings,
linear_method=linear_method,
)
self.mlp = DeepseekMoE(config=config,
Collaborator

I'm pretty sure this would be easier to read if you did:

if <...>:
    self.mlp = DeepseekMoE(...)
else:
    self.mlp = DeepseekMLP(...)

:)
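
For reference, here is a sketch of what that dispatch could look like; the gating condition and config field names (n_routed_experts, first_k_dense_replace, moe_layer_freq) are assumptions based on the DeepSeek-MoE Hugging Face config rather than lines taken from this diff.

# Sketch only: condition and field names are assumed, not copied from the diff.
if (config.n_routed_experts is not None
        and layer_idx >= config.first_k_dense_replace
        and layer_idx % config.moe_layer_freq == 0):
    self.mlp = DeepseekMoE(config=config, linear_method=linear_method)
else:
    self.mlp = DeepseekMLP(
        hidden_size=config.hidden_size,
        intermediate_size=config.intermediate_size,
        hidden_act=config.hidden_act,
        linear_method=linear_method,
    )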

esmeetu (Collaborator, Author) commented Jan 23, 2024

@cadedaniel @pcmoritz Thanks for your reviews! I suggest merging the fused MoE version in #2453, which is faster. I will close this PR; let's push to get that PR merged.

@esmeetu closed this Jan 23, 2024
@esmeetu deleted the deepseek-moe branch February 14, 2024