======== Environment ===========
Package Version
aiobotocore 2.19.0
aiohappyeyeballs 2.4.6
aiohttp 3.11.12
aioitertools 0.12.0
aiosignal 1.3.2
annotated-types 0.7.0
apex 0.1
async-timeout 5.0.1
attrs 25.1.0
autocommand 2.2.2
backports.tarfile 1.2.0
botocore 1.36.3
braceexpand 0.1.7
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
datasets 3.2.0
dill 0.3.8
einops 0.8.1
filelock 3.13.1
flash_attn 2.6.3
frozenlist 1.5.0
fsspec 2024.9.0
huggingface-hub 0.28.1
idna 3.10
importlib_metadata 8.6.1
inflect 7.3.1
jaraco.collections 5.1.0
jaraco.context 5.3.0
jaraco.functools 4.0.1
jaraco.text 3.12.1
Jinja2 3.1.4
jmespath 1.0.1
MarkupSafe 2.1.5
megatron-energon 5.1.0
more-itertools 10.3.0
mpmath 1.3.0
multidict 6.1.0
multiprocess 0.70.16
networkx 3.3
numpy 2.1.2
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
packaging 24.2
pandas 2.2.3
pillow 11.0.0
pip 25.0
platformdirs 4.2.2
propcache 0.2.1
pyarrow 19.0.0
pybind11 2.13.6
pydantic 2.10.6
pydantic_core 2.27.2
python-dateutil 2.9.0.post0
pytz 2025.1
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.3
s3fs 2025.2.0
safetensors 0.5.2
setuptools 75.8.0
six 1.17.0
sympy 1.13.1
tokenizers 0.20.3
tomli 2.0.1
torch 2.5.1+cu124
torchaudio 2.5.1+cu124
torchvision 0.20.1+cu124
tqdm 4.67.1
transformer_engine 1.12.0
transformer_engine_cu12 1.12.0
transformer_engine_torch 1.12.0
transformers 4.46.3
triton 3.1.0
typeguard 4.3.0
typing_extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
webdataset 0.2.110
wheel 0.45.1
wrapt 1.17.2
xxhash 3.5.0
yarl 1.18.3
zipp 3.21.0
========== Error message ==========
training ...
[before the start of training step] datetime: 2025-02-12 12:12:34
NCCL version 2.21.5+cuda12.4
[rank2]: Traceback (most recent call last):
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/examples/qwen2_vl/pretrain_qwen.py", line 574, in <module>
[rank2]: pretrain(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 408, in pretrain
[rank2]: iteration, num_floating_point_operations_so_far = train(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 1493, in train
[rank2]: train_step(forward_step_func,
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 791, in train_step
[rank2]: losses_reduced = forward_backward_func(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/pipeline_parallel/schedules.py", line 1741, in forward_backward_pipelining_without_interleaving
[rank2]: output_tensor, num_tokens = forward_step(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/pipeline_parallel/schedules.py", line 275, in forward_step
[rank2]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/examples/qwen2_vl/pretrain_qwen.py", line 431, in forward_step
[rank2]: output_tensor = model(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/distributed/data_parallel_base.py", line 22, in forward
[rank2]: return self.module(*inputs, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/legacy/model/module.py", line 189, in forward
[rank2]: outputs = self.module(*inputs, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/megatron_patch/model/qwen2_vl/model.py", line 212, in forward
[rank2]: vision_embeds = self.vision_model(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/megatron_patch/model/qwen2_vl/visionmodel.py", line 204, in forward
[rank2]: hidden_states = self.decoder(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_block.py", line 549, in forward
[rank2]: hidden_states, context = layer(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_layer.py", line 502, in __call__
[rank2]: return super(MegatronModule, self).__call__(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_layer.py", line 390, in forward
[rank2]: attention_output_with_bias = self.self_attention(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: TypeError: Attention.forward() got an unexpected keyword argument 'attention_bias'
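A quick way to check which Megatron-LM copy is actually on the import path and whether its Attention.forward already accepts attention_bias (a minimal sketch; it assumes the class lives at megatron.core.transformer.attention.Attention, as the traceback suggests):

import inspect
import megatron
from megatron.core.transformer.attention import Attention

# Shows which Megatron-LM checkout Python is importing in this environment
print(megatron.__file__)
# If 'attention_bias' is not listed in this signature, the Megatron-LM code being
# imported is older than what the Qwen2-VL patch code expects.
print(inspect.signature(Attention.forward))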
Please use the matching version of the Megatron-LM submodule:
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
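If the repository was already cloned without --recurse-submodules, the pinned Megatron-LM sources can normally be fetched into the existing checkout instead of re-cloning; a minimal sketch using standard git commands (the Pai-Megatron-Patch directory name is an assumption based on the clone URL above):

cd Pai-Megatron-Patch
# Fetch the submodule revisions pinned by this repository
git submodule update --init --recursive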