
[BUG] Transformer layers are split incorrectly when the pipeline-parallel size does not evenly divide num-layers #1304

Open
Baibaifan opened this issue Nov 27, 2024

Describe the bug

Situation:

GPT2 model

  • num-layers=30, pipeline-model-parallel-size=4
  • decoder-first-pipeline-num-layers and decoder-last-pipeline-num-layers are not set

Segmentation results

stage1: 0,1,2,3,4,5,6
stage2: 7,8,9,10,11,12,13
stage3: 14,15,16,17,18,19,20
stage4: 21,22,23,24,25,26,27

Sum of layers: 28, which does not match the configured 30 layers (reproduced in the sketch below).
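The two missing layers come from plain floor division over the pipeline stages. The following is a minimal sketch (assumed logic for illustration, not Megatron-LM's actual source) that reproduces the split shown above:

```python
# Sketch of how a naive floor-division split assigns layers to pipeline
# stages and silently drops the remainder when num_layers is not
# divisible by the pipeline-parallel size.
num_layers = 30
pipeline_model_parallel_size = 4

layers_per_stage = num_layers // pipeline_model_parallel_size  # 30 // 4 = 7

stages = [
    list(range(rank * layers_per_stage, (rank + 1) * layers_per_stage))
    for rank in range(pipeline_model_parallel_size)
]

for rank, layers in enumerate(stages):
    print(f"stage{rank + 1}: {layers}")

total = sum(len(layers) for layers in stages)
print(f"sum layers: {total}")  # 28 -- two layers are never built
```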

In the legacy version, there is a check on the total number of model layers.
[Screenshot: layer-count check in the legacy code]

In the Mcore version, only num-layers-per-virtual-pipeline-stage can be used to determine the number of model layers, so this case is not caught.
[Screenshot: Mcore layer-count logic]

If users are expected to split the layers themselves when the count cannot be divided evenly, I think a validation check and an explicit warning should be added here; see the sketch below.
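A rough sketch of the kind of validation being requested is shown below. The helper validate_layer_split() is hypothetical and not an existing Megatron-LM function; its argument names simply mirror the CLI flags mentioned above.

```python
import warnings


def validate_layer_split(num_layers,
                         pipeline_model_parallel_size,
                         first_stage_layers=None,
                         last_stage_layers=None):
    """Fail (or warn) when transformer layers would be silently dropped."""
    if first_stage_layers is None and last_stage_layers is None:
        # Without hand-tuned first/last stage sizes, require an even split.
        if num_layers % pipeline_model_parallel_size != 0:
            raise ValueError(
                f"num_layers ({num_layers}) is not divisible by "
                f"pipeline_model_parallel_size "
                f"({pipeline_model_parallel_size}); either change the sizes "
                "or set --decoder-first-pipeline-num-layers / "
                "--decoder-last-pipeline-num-layers explicitly."
            )
    else:
        # Uneven split requested by the user: remind them to verify the sum.
        warnings.warn(
            "Uneven pipeline split requested; make sure the per-stage "
            "layer counts sum to num_layers."
        )


# Example: the configuration from this report now fails loudly
# instead of silently dropping two layers.
try:
    validate_layer_split(num_layers=30, pipeline_model_parallel_size=4)
except ValueError as err:
    print(err)
```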

Environment (please complete the following information):

  • Megatron-LM commit ID: main branch