Tensor-parallelize the DeepSeek V3 transformer layer #4062


Merged: 13 commits merged into main from wjy/parallel on Apr 19, 2025

Conversation

@wujingyue (Collaborator) commented on Mar 12, 2025


github-actions bot commented Mar 12, 2025

Review updated until commit 98cadce

Description

  • Added multidevice test for DeepSeek V3 transformer layer

  • Parallelized the transformer layer using Rowwise and Colwise parallelism (see the sketch after this list)

  • Moved test from test_deepseek_v3.py to multidevice/test_deepseek_v3.py
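
For reviewers unfamiliar with the scheme, row-wise/column-wise sharding follows the standard tensor-parallel layout for a transformer MLP: the input projections are split column-wise and the output projection row-wise, so the whole block needs only one all-reduce. The sketch below illustrates the idea with PyTorch's tensor-parallel API; the module names, sizes, and parallelization plan are illustrative assumptions, not necessarily the PR's actual code.

import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class ToyMlp(nn.Module):
    # Hypothetical stand-in for the MLP block being sharded; the real test
    # operates on the Hugging Face DeepSeek V3 transformer layer.
    def __init__(self, hidden=1024, intermediate=4096):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, intermediate, bias=False)
        self.up_proj = nn.Linear(hidden, intermediate, bias=False)
        self.down_proj = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x):
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))


# Assumes the default process group is already initialized, one rank per GPU.
mesh = dist.device_mesh.init_device_mesh("cuda", [dist.get_world_size()])

# Shard the input projections column-wise and the output projection row-wise,
# so the partial results need only a single all-reduce at the MLP output.
mlp = parallelize_module(
    ToyMlp().cuda(),
    mesh,
    {
        "gate_proj": ColwiseParallel(),
        "up_proj": ColwiseParallel(),
        "down_proj": RowwiseParallel(),
    },
)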


Changes walkthrough 📝

Relevant files

Enhancement: tests/python/multidevice/test_deepseek_v3.py (+143/-0)
Add multidevice test for DeepSeek V3 transformer layer

  • Added new test file for multidevice testing of the DeepSeek V3 transformer layer
  • Implemented setup_process_group fixture for initializing the process group
  • Added default_tensor_type context manager for setting the default tensor type and device
  • Implemented test_transformer_layer to test the parallelized transformer layer

Other: tests/python/test_deepseek_v3.py (+0/-60)
Remove old test_transformer_layer

  • Removed old test_transformer_layer function
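
For readers who have not opened the new file, the two helpers named above could look roughly like the sketch below. This is a hedged illustration built only from standard pytest and PyTorch APIs; the actual fixture derives rank and world size from a communicator object (quoted later in the review guide), and other details may differ.

import contextlib
import os

import pytest
import torch
import torch.distributed as dist


@pytest.fixture(scope="module")
def setup_process_group():
    # Hypothetical plumbing: the real test obtains rank and world size from a
    # communicator object rather than from environment variables.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://localhost:29500",
        rank=rank,
        world_size=world_size,
    )
    yield
    dist.destroy_process_group()


@contextlib.contextmanager
def default_tensor_type(dtype=torch.float32, device="cuda"):
    # Temporarily switch the default dtype and device, then restore them.
    previous_dtype = torch.get_default_dtype()
    torch.set_default_dtype(dtype)
    torch.set_default_device(device)
    try:
        yield
    finally:
        torch.set_default_dtype(previous_dtype)
        torch.set_default_device("cpu")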

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Timeout Risk

The test timed out once when downloading the model configuration. This could be a transient issue, but it's worth investigating to ensure it doesn't happen consistently.

# This test timed out once when downloading
# "/deepseek-ai/DeepSeek-V3/resolve/main/configuration_deepseek.py" (cf.
# http://nv/eCm). I consider this a one-off, but please let me know if this
# error becomes consistent.
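
If the flakiness recurs, one common mitigation is to warm the Hugging Face cache once in CI and then load the configuration offline so the test never re-downloads it. A sketch assuming the test loads the config via transformers' AutoConfig, which may not match the actual loading path:

import os

from transformers import AutoConfig

# Hypothetical mitigation: once configuration_deepseek.py is in the local
# cache, force offline loading so a slow or flaky network cannot time out.
os.environ.setdefault("HF_HUB_OFFLINE", "1")
config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-V3",
    trust_remote_code=True,   # the config class lives in remote code
    local_files_only=True,    # fail fast instead of re-downloading
)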
Hardcoded Port

The default port for the process group initialization is hardcoded. This could lead to conflicts if multiple tests are run simultaneously. Consider using a dynamic port assignment.

backend="nccl",
init_method="tcp://localhost:29500",
world_size=communicator.size(),
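
To make the suggestion concrete, the port could be taken from the launcher instead of being fixed at 29500; all ranks must agree on it, so reading MASTER_PORT (which torchrun and similar launchers set) is the usual approach. A sketch assuming a plain torch.distributed setup; the communicator wiring in the actual test is not modeled here:

import os

import torch.distributed as dist

# Hypothetical: every rank must use the same port, so take it from the
# launcher's MASTER_PORT rather than hardcoding 29500.
port = os.environ.get("MASTER_PORT", "29500")
dist.init_process_group(
    backend="nccl",
    init_method=f"tcp://localhost:{port}",
    rank=int(os.environ.get("RANK", "0")),
    world_size=int(os.environ.get("WORLD_SIZE", "1")),
)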
Device Mesh Initialization

The device mesh is initialized with a hardcoded device type ("cuda"). This could be problematic if the test is run in an environment without CUDA support. Consider making the device type configurable.

mesh = dist.device_mesh.init_device_mesh("cuda", [d])
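
One way to act on this, sketched below under the assumption that the test only needs torch.distributed, is to pick the device type at runtime and reuse it for both the mesh and the backend:

import torch
import torch.distributed as dist

# Hypothetical: choose the device type based on availability instead of
# hardcoding "cuda", and pick a matching backend.
device_type = "cuda" if torch.cuda.is_available() else "cpu"
backend = "nccl" if device_type == "cuda" else "gloo"

d = dist.get_world_size()  # assumes the process group is already initialized
mesh = dist.device_mesh.init_device_mesh(device_type, [d])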

@wujingyue changed the base branch from main to wjy/v3 on March 12, 2025 06:21

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue requested a review from syed-ahmed on March 13, 2025 21:29

Base automatically changed from wjy/v3 to main on March 14, 2025 16:04

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue requested a review from kevinstephano on April 11, 2025 20:36

@kevinstephano (Collaborator) left a comment:

LGTM.

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue (Collaborator, Author) commented:

!test

@wujingyue merged commit c969903 into main on Apr 19, 2025
28 of 29 checks passed
@wujingyue deleted the wjy/parallel branch on April 19, 2025 04:10