
fix pretrain gpt and llama bug #476


Open · wants to merge 1 commit into base: main

Conversation

@9LLPPLL6 commented May 8, 2025

Fixed some issues encountered when running GPT and Llama pre-training.

@@ -1654,7 +1654,7 @@ def get_num_experts_per_layer(num_experts: list, num_layers: int, expert_interval
     num_experts = num_experts * (num_layers // expert_interval)
     experts_per_layer = []
     for i in range(num_layers):
-        layer_num = i + 1 + offset
+        layer_num = i + 1


Hi, why do we need to delete this offset?

Author (@9LLPPLL6)

This would cause the index into num_experts to go out of bounds.
See this issue for details: issue
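
To make the out-of-bounds failure concrete, here is a minimal sketch of get_num_experts_per_layer as it stood before this change. Only the lines shown in the hunk above come from the PR; the length-one expansion guard, the indexing expression, and the default offset argument are assumptions reconstructed from the discussion, not verbatim repository code.

def get_num_experts_per_layer(num_experts: list, num_layers: int,
                              expert_interval: int, offset: int = 0) -> list:
    # Expand a single expert count to one entry per MoE layer
    # (this guard is assumed; only the expansion line appears in the hunk).
    if len(num_experts) == 1:
        num_experts = num_experts * (num_layers // expert_interval)
    experts_per_layer = []
    for i in range(num_layers):
        layer_num = i + 1 + offset  # the line this PR changes to `i + 1`
        # Assumed indexing expression: with offset > 0, layer_num can exceed
        # num_layers, so (layer_num - 1) // expert_interval runs past the
        # end of num_experts and raises IndexError.
        experts = (num_experts[(layer_num - 1) // expert_interval]
                   if layer_num % expert_interval == 0 else 1)
        experts_per_layer.append(experts)
    return experts_per_layer

# Example: a nonzero offset (e.g. from a later pipeline stage) overruns the
# list. num_experts expands to [4, 4] (length 2); at layer_num = 6 the index
# is (6 - 1) // 2 == 2, past the last valid index 1 -> IndexError.
get_num_experts_per_layer([4], num_layers=4, expert_interval=2, offset=4)

With the PR's change to layer_num = i + 1, the index stays within 0..len(num_experts) - 1 for every layer.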
