Description
Describe the bug
File "xxxx/Megatron-LM-core_v0.13.1/megatron/core/tensor_parallel/random.py", line 477, in checkpoint
[rank122]: return CheckpointFunction.apply(function, distribute_saved_activations, *args)
[rank122]: File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
[rank122]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank122]: File "xxx/Megatron-LM-core_v0.13.1/megatron/core/tensor_parallel/random.py", line 423, in forward
[rank122]: outputs = run_function(*args)
[rank122]: File "xxx/swift_0916/swift_251015/swift/megatron/model/mm_gpt/qwen3_vl.py", line 217, in custom_forward
[rank122]: layer = self._get_layer(index)
[rank122]: File "xxx/Megatron-LM-core_v0.13.1/megatron/core/transformer/transformer_block.py", line 361, in _get_layer
[rank122]: return self.layers[layer_number]
[rank122]: File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 334, in __getitem__
[rank122]: return self._modules[self._get_abs_string_index(idx)]
[rank122]: File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 316, in _get_abs_string_index
[rank122]: raise IndexError(f"index {idx} is out of range")
[rank122]: IndexError: index 36 is out of range
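According to the model's config, num_hidden_layers is 36, but Megatron actually tries to read a layer index beyond that range: the traceback shows index 36, while a 36-layer container only has valid indices 0 through 35.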
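To illustrate the off-by-one, here is a minimal sketch that uses a plain Python list as a stand-in for the layer container (an assumption, chosen because it has the same 0-based bounds). The `get_layer` helper is hypothetical and is not Megatron's `_get_layer`; it only shows why index 36 fails when `num_hidden_layers` is 36.

```python
num_hidden_layers = 36  # value reported from the model config

# Stand-in for the layer container (assumption): a plain list with 0-based indexing.
layers = [f"layer_{i}" for i in range(num_hidden_layers)]


def get_layer(index):
    """Hypothetical bounds-checked lookup; not Megatron's _get_layer."""
    if not 0 <= index < len(layers):
        raise IndexError(
            f"index {index} is out of range for {len(layers)} layers "
            f"(valid: 0..{len(layers) - 1})"
        )
    return layers[index]


# The last valid layer is at index num_hidden_layers - 1.
last = get_layer(num_hidden_layers - 1)

# Asking for index num_hidden_layers reproduces the kind of failure in the traceback.
try:
    get_layer(num_hidden_layers)
    overflow_raised = False
except IndexError:
    overflow_raised = True
```

A guard like this would only make the failure message clearer. It does not explain why the caller in `custom_forward` passes an index equal to the layer count.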
