Qwen2.5-VL multimodal fine-tuning fails with an error on swift 3.10. #6153

@chengximeng67

Describe the bug
What the bug is, and how to reproduce, better with screenshots
After upgrading swift from 3.3 to 3.10, the dataset and script that previously ran fine now fail with the error below. Nothing else in the environment was changed.

[INFO:swift] default_system: 'You are a helpful assistant.'
[INFO:swift] max_length: 2048
[INFO:swift] response_prefix: ''
[INFO:swift] agent_template: hermes
[INFO:swift] norm_bbox: none
[INFO:swift] Setting ROOT_IMAGE_DIR: None. You can adjust this hyperparameter through the environment variable: `ROOT_IMAGE_DIR`.
[INFO:swift] Start time of running main: 2025-10-15 22:12:27.918088
[INFO:swift] swift.__version__: 3.10.0.dev0
Generating train split: 12023 examples [00:00, 79901.06 examples/s]
Map: 100%|██████████| 12023/12023 [00:00<00:00, 33587.64 examples/s]
[INFO:swift] train_dataset: Dataset({
    features: ['messages', 'images'],
    num_rows: 12023
})
[INFO:swift] val_dataset: None
[INFO:swift] Traceback (most recent call last):
  File "/data2/chxm/Multimodal-REC/ms-swift/swift/llm/dataset/utils.py", line 97, in __getitem__
    return self.encode_func(data, return_length=True)
  File "/data2/anaconda3/envs/chxm_2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/data2/chxm/Multimodal-REC/ms-swift/swift/llm/template/base.py", line 489, in encode
    inputs = TemplateInputs.from_dict(inputs)
  File "/data2/chxm/Multimodal-REC/ms-swift/swift/llm/template/template_inputs.py", line 341, in from_dict
    return cls(**kwargs)
  File "<string>", line 7, in __init__
  File "/data2/chxm/Multimodal-REC/ms-swift/swift/llm/template/template_inputs.py", line 272, in __post_init__
    setattr(self, key, StdTemplateInputs.from_dict(value_dict))
  File "/data2/chxm/Multimodal-REC/ms-swift/swift/llm/template/template_inputs.py", line 187, in from_dict
    messages = inputs['messages']
KeyError: 'messages'

[WARNING:swift] 👆👆👆There are errors in the template.encode, and another piece of data will be randomly selected.
(The identical traceback, ending in KeyError: 'messages', is logged five more times, once for each replacement sample.)

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here

L40 GPU
The training script is as follows:
#!/bin/bash

# --- Configuration ---

nproc_per_node=4
CUDA_VISIBLE_DEVICES=0,1,2,3
MAX_PIXELS=564000
# Note: 102400 MiB is 100 GiB, not the "1GB" mentioned in the message below.
VRAM_THRESHOLD_MIB=102400

# --- Waiting Loop ---

echo "Checking GPU 0 VRAM usage every 30 seconds. Waiting for it to be below ${VRAM_THRESHOLD_MIB} MiB (1GB)..."

while true; do
    # Query the memory currently used on GPU 0, in MiB, as a bare number.
    used_mib_str=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -i 0 2>/dev/null)

    if [[ "$used_mib_str" =~ ^[0-9]+$ ]]; then
        used_mib="$used_mib_str"
        if [ "$used_mib" -lt "$VRAM_THRESHOLD_MIB" ]; then
            echo "GPU 0 VRAM usage is ${used_mib} MiB (< ${VRAM_THRESHOLD_MIB} MiB). Condition met. Starting training..."
            break
        else
            echo "GPU 0 VRAM usage is ${used_mib} MiB (>= ${VRAM_THRESHOLD_MIB} MiB). Waiting 30 seconds..."
            sleep 30
        fi
    else
        echo "Could not get GPU 0 VRAM usage or output is not a number. Output: '${used_mib_str}'. Waiting 30 seconds and retrying..."
        sleep 30
    fi
done

# --- Training Command ---

echo "Executing training command..."
MAX_PIXELS=$MAX_PIXELS \
CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
NPROC_PER_NODE=$nproc_per_node \
swift sft \
    --model /home/member/data1/MODEL_WEIGHTS_PUBLIC/Qwenvl2.5_3b \
    --model_type qwen2_5_vl \
    --gradient_checkpointing True \
    --train_type full \
    --dataset '/home/member/data2/chxm/Multimodal-REC/LLVIP_REF/LLVIP_Data_Augmentation/train_multimodal.json' \
    --torch_dtype bfloat16 \
    --num_train_epochs 8 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 4 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 200 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir /data1/chxm/MultiModal-REC/LLVIP_Data_Augmentation_seed_42_gpr \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --deepspeed zero3 \
    --attn_impl flash_attn \
    --save_only_model True \
    --max_grad_norm 0.9 \
    --data_seed 42 \
    --use_gpr_lie true \
    --gpr_lie_lambda 0.01 \
    --gpr_lie_freq 100

echo "Training command finished."

The dataset format is as follows:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Here are a visible light image and a thermal image . Detect all <|object_ref_start|>person<|object_ref_end|>.\nFor each object, provide:\n1. location\n2. label\n3. appearance\n4. pose_action\n\nReturn as a JSON array."
    },
    {
      "role": "assistant",
      "content": "[\n{\n\"bbox_2d\": [\n123,\n182,\n150,\n283\n],\n\"label\": \"person\",\n\"appearance\": \"A person wearing a light-colored mask, a dark jacket over a red shirt, light-colored pants, and white shoes. They have dark hair.\",\n\"pose_action\": \"The person is standing, holding their right hand towards their head.\"\n},\n{\n\"bbox_2d\": [\n150,\n167,\n182,\n274\n],\n\"label\": \"person\",\n\"appearance\": \"Wearing a white face mask, a light-colored jacket over a dark shirt, and blue jeans. The person is holding a metallic cylindrical object in their raised right hand.\",\n\"pose_action\": \"The person is standing and holding an object.\"\n},\n{\n\"bbox_2d\": [\n199,\n157,\n235,\n262\n],\n\"label\": \"person\",\n\"appearance\": \"A person wearing a patterned jacket or hoodie over a lighter layer, light-colored pants, and light-colored shoes. They have dark hair and are holding an object, possibly a bag or folded garment, in their right hand.\",\n\"pose_action\": \"The person is standing and holding an item.\"\n},\n{\n\"bbox_2d\": [\n299,\n151,\n325,\n250\n],\n\"label\": \"person\",\n\"appearance\": \"A person wearing dark clothing, including a dark jacket or coat and dark pants, with a light-colored mask covering their face. They have dark hair and are wearing light-colored shoes. They appear to be holding a light-colored object in their right hand.\",\n\"pose_action\": \"The person is standing.\"\n},\n{\n\"bbox_2d\": [\n301,\n220,\n345,\n339\n],\n\"label\": \"person\",\n\"appearance\": \"A person, largely cropped by the right edge of the frame, appearing to wear dark clothing. There is a light-colored detail visible on their upper left side.\",\n\"pose_action\": \"The person is standing. Their exact action is difficult to determine due to the cropping.\"\n},\n{\n\"bbox_2d\": [\n481,\n9,\n511,\n104\n],\n\"label\": \"person\",\n\"appearance\": \"A person wearing a long, dark-colored coat or jacket, dark pants or leggings, and light-colored shoes. The person has dark hair.\",\n\"pose_action\": \"The person is standing.\"\n}\n]"
    }
  ],
  "images": [
    "/home/member/data2/chxm/Multimodal-REC/LLVIP_REF/LLVIP_Data_Augmentation/visible/train/010001.jpg",
    "/home/member/data2/chxm/Multimodal-REC/LLVIP_REF/LLVIP_Data_Augmentation/thermal/train/010001.jpg"
  ]
},
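
Since the failure is a KeyError on 'messages', one quick sanity check is to confirm that every record in the dataset file really has the top-level keys shown in the sample. Below is a minimal sketch in plain Python. It assumes the file is a single JSON array of records laid out like the one above. The helper names `find_bad_records` and `check_file` are hypothetical and are not part of ms-swift. A clean result only rules out malformed records; it says nothing about how swift 3.10 wraps the data before `StdTemplateInputs.from_dict` is called.

```python
import json
from typing import Any

# Top-level keys present in the sample record (assumed to be required everywhere).
REQUIRED_KEYS = ("messages", "images")


def find_bad_records(records: list[dict[str, Any]]) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for records that deviate from the sample layout."""
    problems: list[tuple[int, str]] = []
    for i, rec in enumerate(records):
        for key in REQUIRED_KEYS:
            if key not in rec:
                problems.append((i, f"missing key {key!r}"))
        messages = rec.get("messages")
        if messages is None:
            continue  # already reported above as a missing key
        if not isinstance(messages, list) or not messages:
            problems.append((i, "'messages' is not a non-empty list"))
            continue
        for j, msg in enumerate(messages):
            # Each turn in the sample is a dict with 'role' and 'content'.
            if not isinstance(msg, dict) or not {"role", "content"} <= msg.keys():
                problems.append((i, f"messages[{j}] lacks 'role' or 'content'"))
    return problems


def check_file(path: str) -> list[tuple[int, str]]:
    """Load a JSON array of records from `path` and validate each one."""
    with open(path, encoding="utf-8") as f:
        return find_bad_records(json.load(f))
```

Running `check_file` on the path passed to `--dataset` and getting an empty list would suggest the records themselves are well-formed, which points the investigation back at the upgrade rather than the data.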
Additional context
Add any other context about the problem here
