Help! Error when training Qwen2-VL-7B-Instruct: TypeError: Attention.forward() got an unexpected keyword argument 'attention_bias' #464

Open
zhaicunqi opened this issue Feb 12, 2025 · 1 comment

zhaicunqi commented Feb 12, 2025

======== Environment ===========
Package Version


aiobotocore 2.19.0
aiohappyeyeballs 2.4.6
aiohttp 3.11.12
aioitertools 0.12.0
aiosignal 1.3.2
annotated-types 0.7.0
apex 0.1
async-timeout 5.0.1
attrs 25.1.0
autocommand 2.2.2
backports.tarfile 1.2.0
botocore 1.36.3
braceexpand 0.1.7
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
datasets 3.2.0
dill 0.3.8
einops 0.8.1
filelock 3.13.1
flash_attn 2.6.3
frozenlist 1.5.0
fsspec 2024.9.0
huggingface-hub 0.28.1
idna 3.10
importlib_metadata 8.6.1
inflect 7.3.1
jaraco.collections 5.1.0
jaraco.context 5.3.0
jaraco.functools 4.0.1
jaraco.text 3.12.1
Jinja2 3.1.4
jmespath 1.0.1
MarkupSafe 2.1.5
megatron-energon 5.1.0
more-itertools 10.3.0
mpmath 1.3.0
multidict 6.1.0
multiprocess 0.70.16
networkx 3.3
numpy 2.1.2
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
packaging 24.2
pandas 2.2.3
pillow 11.0.0
pip 25.0
platformdirs 4.2.2
propcache 0.2.1
pyarrow 19.0.0
pybind11 2.13.6
pydantic 2.10.6
pydantic_core 2.27.2
python-dateutil 2.9.0.post0
pytz 2025.1
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.3
s3fs 2025.2.0
safetensors 0.5.2
setuptools 75.8.0
six 1.17.0
sympy 1.13.1
tokenizers 0.20.3
tomli 2.0.1
torch 2.5.1+cu124
torchaudio 2.5.1+cu124
torchvision 0.20.1+cu124
tqdm 4.67.1
transformer_engine 1.12.0
transformer_engine_cu12 1.12.0
transformer_engine_torch 1.12.0
transformers 4.46.3
triton 3.1.0
typeguard 4.3.0
typing_extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
webdataset 0.2.110
wheel 0.45.1
wrapt 1.17.2
xxhash 3.5.0
yarl 1.18.3
zipp 3.21.0

========== Error message ==========
training ...
[before the start of training step] datetime: 2025-02-12 12:12:34
NCCL version 2.21.5+cuda12.4
[rank2]: Traceback (most recent call last):
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/examples/qwen2_vl/pretrain_qwen.py", line 574, in
[rank2]: pretrain(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 408, in pretrain
[rank2]: iteration, num_floating_point_operations_so_far = train(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 1493, in train
[rank2]: train_step(forward_step_func,
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/training/training.py", line 791, in train_step
[rank2]: losses_reduced = forward_backward_func(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/pipeline_parallel/schedules.py", line 1741, in forward_backward_pipelining_without_interleaving
[rank2]: output_tensor, num_tokens = forward_step(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/pipeline_parallel/schedules.py", line 275, in forward_step
[rank2]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/examples/qwen2_vl/pretrain_qwen.py", line 431, in forward_step
[rank2]: output_tensor = model(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/distributed/data_parallel_base.py", line 22, in forward
[rank2]: return self.module(*inputs, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/legacy/model/module.py", line 189, in forward
[rank2]: outputs = self.module(*inputs, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/megatron_patch/model/qwen2_vl/model.py", line 212, in forward
[rank2]: vision_embeds = self.vision_model(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/megatron_patch/model/qwen2_vl/visionmodel.py", line 204, in forward
[rank2]: hidden_states = self.decoder(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_block.py", line 549, in forward
[rank2]: hidden_states, context = layer(
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_layer.py", line 502, in call
[rank2]: return super(MegatronModule, self).call(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: File "/home/common-vl/workspaces/zhaicunqi/llm/code/Pai-Megatron-Patch/Megatron-LM-241113/megatron/core/transformer/transformer_layer.py", line 390, in forward
[rank2]: attention_output_with_bias = self.self_attention(
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank2]: return self._call_impl(*args, **kwargs)
[rank2]: File "/home/miniconda3/envs/pai-megatron-py310-torch2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank2]: return forward_call(*args, **kwargs)
[rank2]: TypeError: Attention.forward() got an unexpected keyword argument 'attention_bias'

lostkevin (Contributor) commented
Please use the Megatron-LM submodule that matches this repository's pinned version:

git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
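
If the repository was already cloned without --recurse-submodules, a minimal way to fetch the pinned Megatron-LM-241113 submodule afterwards (plain git commands, assuming the standard submodule layout) is:

cd Pai-Megatron-Patch
git submodule update --init --recursive

The TypeError above is typical of such a version mismatch: transformer_layer.py passes an attention_bias keyword that the Attention.forward() in the checked-out Megatron-LM does not accept, so aligning the submodule with the patch code resolves it.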
