======== Environment ===========
Package Version
aiobotocore 2.19.0
aiohappyeyeballs 2.4.6
aiohttp 3.11.12
aioitertools 0.12.0
aiosignal 1.3.2
annotated-types 0.7.0
apex 0.1
async-timeout 5.0.1
attrs 25.1.0
autocommand 2.2.2
backports.tarfile 1.2.0
botocore 1.36.3
braceexpand 0.1.7
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
datasets 3.2.0
dill 0.3.8
einops 0.8.1
filelock 3.13.1
flash_attn 2.6.3
frozenlist 1.5.0
fsspec 2024.9.0
huggingface-hub 0.28.1
idna 3.10
importlib_metadata 8.6.1
inflect 7.3.1
jaraco.collections 5.1.0
jaraco.context 5.3.0
jaraco.functools 4.0.1
jaraco.text 3.12.1
Jinja2 3.1.4
jmespath 1.0.1
MarkupSafe 2.1.5
megatron-energon 5.1.0
more-itertools 10.3.0
mpmath 1.3.0
multidict 6.1.0
multiprocess 0.70.16
networkx 3.3
numpy 2.1.2
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
packaging 24.2
pandas 2.2.3
pillow 11.0.0
pip 25.0
platformdirs 4.2.2
propcache 0.2.1
pyarrow 19.0.0
pybind11 2.13.6
pydantic 2.10.6
pydantic_core 2.27.2
python-dateutil 2.9.0.post0
pytz 2025.1
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.3
s3fs 2025.2.0
safetensors 0.5.2
setuptools 75.8.0
six 1.17.0
sympy 1.13.1
tokenizers 0.20.3
tomli 2.0.1
torch 2.5.1+cu124
torchaudio 2.5.1+cu124
torchvision 0.20.1+cu124
tqdm 4.67.1
transformer_engine 1.12.0
transformer_engine_cu12 1.12.0
transformer_engine_torch 1.12.0
transformers 4.46.3
triton 3.1.0
typeguard 4.3.0
typing_extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
webdataset 0.2.110
wheel 0.45.1
wrapt 1.17.2
xxhash 3.5.0
yarl 1.18.3
zipp 3.21.0
========== Error message ==========
training ...
[before the start of training step] datetime: 2025-02-12 12:12:34
NCCL version 2.21.5+cuda12.4
[rank2]: Traceback (most recent call last):
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/examples/qwen2_vl/pretrain_qwen.py", line 574, in <module>
[rank2]: pretrain(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 408, in pretrain
[rank2]: iteration, num_floating_point_operations_so_far = train(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 1493, in train
[rank2]: train_step(forward_step_func,
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 791, in train_step
[rank2]: losses_reduced = forward_backward_func(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/pipeline_parallel/schedules.py", line 1741, in forward_backward_pipelining_without_interleaving
[rank2]: output_tensor, num_tokens = forward_step(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/pipeline_parallel/schedules.py", line 275, in forward_step
[rank2]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/examples/qwen2_vl/pretrain_qwen.py", line 431, in forward_step
[rank2]: output_tensor = model(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/distributed/data_parallel_base.py", line 22, in forward
[rank2]: return self.module(*inputs, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/legacy/model/module.py", line 189, in forward
[rank2]: outputs = self.module(*inputs, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/megatron_patch/model/qwen2_vl/model.py", line 212, in forward
[rank2]: vision_embeds = self.vision_model(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/megatron_patch/model/qwen2_vl/visionmodel.py", line 204, in forward
[rank2]: hidden_states = self.decoder(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_block.py", line 549, in forward
[rank2]: hidden_states, context = layer(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_layer.py", line 502, in __call__
[rank2]: return super(MegatronModule, self).__call__(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_layer.py", line 390, in forward
[rank2]: attention_output_with_bias = self.self_attention(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: TypeError: Attention.forward() got an unexpected keyword argument 'attention_bias'
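A quick way to check which Megatron-LM copy is actually on the import path and whether its Attention.forward already accepts attention_bias (a minimal sketch; it assumes the class lives at megatron.core.transformer.attention.Attention, as the traceback suggests):

import inspect
import megatron
from megatron.core.transformer.attention import Attention

# Shows which Megatron-LM checkout Python is importing in this environment
print(megatron.__file__)
# If 'attention_bias' is not listed in this signature, the Megatron-LM code being
# imported is older than what the Qwen2-VL patch code expects.
print(inspect.signature(Attention.forward))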
Please use the matching version of the Megatron-LM submodule:
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
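If the repository was already cloned without --recurse-submodules, the pinned Megatron-LM sources can normally be fetched into the existing checkout instead of re-cloning; a minimal sketch using standard git commands (the Pai-Megatron-Patch directory name is an assumption based on the clone URL above):

cd Pai-Megatron-Patch
# Fetch the submodule revisions pinned by this repository
git submodule update --init --recursive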