Commit b1a6895

support deepseek vl finetune vision encoder (modelscope#547)
1 parent a943bd0 commit b1a6895

11 files changed: +137, -27 lines

docs/source/LLM/命令行参数.md

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ The dpo arguments inherit the sft arguments; in addition, the following are added:
 - `--dataset_test_ratio`: Default is `0.01`. See `sft.sh command-line arguments` for details.
 - `--val_dataset_sample`: The number of validation samples to evaluate and display. Default is `10`.
 - `--system`: Default is `None`. See `sft.sh command-line arguments` for details.
-- `--max_length`: Default is `2048`. See `sft.sh command-line arguments` for details.
+- `--max_length`: Default is `-1`. See `sft.sh command-line arguments` for details.
 - `--truncation_strategy`: Default is `'delete'`. See `sft.sh command-line arguments` for details.
 - `--check_dataset_strategy`: Default is `'none'`. See `sft.sh command-line arguments` for details.
 - `--custom_train_dataset_path`: Default is `[]`. See the `Custom Datasets` section of README.md for its meaning.

docs/source/Multi-Modal/cogvlm最佳实践.md

Lines changed: 12 additions & 1 deletion
@@ -147,9 +147,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 
 
 ## Inference After Fine-tuning
-
+Direct inference:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx \
     --load_dataset_config true \
 ```
+
+**merge-lora** and inference:
+```shell
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx \
+    --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx-merged \
+    --load_dataset_config true
+```

docs/source/Multi-Modal/deepseek-vl最佳实践.md

Lines changed: 13 additions & 2 deletions
@@ -136,7 +136,7 @@ road:
 ## Fine-tuning
 Fine-tuning of multimodal large models usually uses a **custom dataset**. Here is a demo that can be run directly:
 
-(By default, LoRA fine-tuning is applied only to the qkv of the LLM part. If you want to fine-tune all linear layers of the LLM part, you can specify `--lora_target_modules ALL`. This model does not yet support fine-tuning the vision part.)
+(By default, LoRA fine-tuning is applied only to the qkv of the LLM part. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.)
 ```shell
 # Experimental environment: A10, 3090, V100
 # 20GB GPU memory
@@ -157,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 
 
 ## Inference After Fine-tuning
-
+Direct inference:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
     --load_dataset_config true \
 ```
+
+**merge-lora** and inference:
+```shell
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
+    --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx-merged \
+    --load_dataset_config true
+```
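Note: for readers who prefer the Python entry points, below is a minimal sketch of the same LoRA run, assuming `swift.llm.sft_main`/`SftArguments` mirror the CLI flags shown above; the dataset path is a placeholder.

```python
# Hypothetical sketch: Python-API equivalent of the CLI demo above.
# Assumes sft_main/SftArguments accept fields named like the CLI flags.
from swift.llm import sft_main, SftArguments

output = sft_main(
    SftArguments(
        model_type='deepseek-vl-7b-chat',
        # 'ALL' now also covers the vision model part (this commit).
        lora_target_modules=['ALL'],
        custom_train_dataset_path=['path/to/train.jsonl'],  # placeholder path
    ))
print(output['best_model_checkpoint'])
```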

docs/source/Multi-Modal/internlm-xcomposer2最佳实践.md

Lines changed: 1 addition & 2 deletions
@@ -134,7 +134,7 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 
 [Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
 
-(Multi-turn dialogue is supported; each turn may contain multiple images or no image; local paths or URLs may be passed in.)
+(Multi-turn dialogue is supported; each turn may contain multiple images or no image; local paths or URLs may be passed in. This model does not support merge-lora.)
 
 ```json
 [
@@ -159,7 +159,6 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 
 
 ## Inference After Fine-tuning
-
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/internlm-xcomposer2-7b-chat/vx-xxx/checkpoint-xxx \

docs/source/Multi-Modal/qwen-audio最佳实践.md

Lines changed: 36 additions & 1 deletion
@@ -97,6 +97,8 @@ history: [('Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/i
 ## Fine-tuning
 Fine-tuning of multimodal large models usually uses a **custom dataset**. Here is a demo that can be run directly:
 
+LoRA fine-tuning:
+
 (By default, LoRA fine-tuning is applied only to the qkv of the LLM part. If you want to fine-tune all linear layers, including the audio model part, you can specify `--lora_target_modules ALL`.)
 ```shell
 # Experimental environment: A10, 3090, V100...
@@ -106,6 +108,28 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
     --dataset aishell1-mini-zh \
 ```
 
+Full-parameter fine-tuning:
+```shell
+# MP
+# Experimental environment: 2 * A100
+# 2 * 50 GPU memory
+CUDA_VISIBLE_DEVICES=0,1 swift sft \
+    --model_type qwen-audio-chat \
+    --dataset aishell1-mini-zh \
+    --train_dataset_sample -1 \
+    --sft_type full \
+
+# ZeRO2
+# Experimental environment: 4 * A100
+# 2 * 80 GPU memory
+NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
+    --model_type qwen-audio-chat \
+    --dataset aishell1-mini-zh \
+    --train_dataset_sample -1 \
+    --sft_type full \
+    --deepspeed default-zero2
+```
+
 [Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
 
 (Multi-turn dialogue is supported; each turn may contain multiple audio clips or none; local paths or URLs may be passed in.)
@@ -133,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 
 
 ## Inference After Fine-tuning
-
+Direct inference:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
     --load_dataset_config true \
 ```
+
+**merge-lora** and inference:
+```shell
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
+    --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx-merged \
+    --load_dataset_config true
+```

docs/source/Multi-Modal/qwen-vl最佳实践.md

Lines changed: 25 additions & 1 deletion
@@ -129,6 +129,8 @@ road:
 ## Fine-tuning
 Fine-tuning of multimodal large models usually uses a **custom dataset**. Here is a demo that can be run directly:
 
+LoRA fine-tuning:
+
 (By default, LoRA fine-tuning is applied only to the qkv of the LLM part. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.)
 ```shell
 # Experimental environment: 3090
@@ -138,6 +140,17 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
     --dataset coco-mini-en \
 ```
 
+Full-parameter fine-tuning:
+```shell
+# Experimental environment: 2 * A100
+# 2 * 55 GPU memory
+CUDA_VISIBLE_DEVICES=0,1 swift sft \
+    --model_type qwen-vl-chat \
+    --dataset coco-mini-en \
+    --train_dataset_sample -1 \
+    --sft_type full \
+```
+
 [Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support json and jsonl formats. Below is an example of a custom dataset:
 
 (Multi-turn dialogue is supported; each turn may contain multiple images or no image; local paths or URLs may be passed in.)
@@ -165,9 +178,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 
 
 ## Inference After Fine-tuning
-
+Direct inference:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
     --load_dataset_config true \
 ```
+
+**merge-lora** and inference:
+```shell
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
+    --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx-merged \
+    --load_dataset_config true
+```

docs/source/Multi-Modal/yi-vl最佳实践.md

Lines changed: 12 additions & 1 deletion
@@ -157,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
 
 
 ## Inference After Fine-tuning
-
+Direct inference:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift infer \
     --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
     --load_dataset_config true \
 ```
+
+**merge-lora** and inference:
+```shell
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
+    --merge_lora true
+
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx-merged \
+    --load_dataset_config true
+```

swift/llm/utils/argument.py

Lines changed: 2 additions & 1 deletion
@@ -588,7 +588,7 @@ class InferArguments:
     val_dataset_sample: int = 10  # -1: all dataset
     save_result: bool = True
     system: Optional[str] = None
-    max_length: int = 2048  # -1: no limit
+    max_length: int = -1  # -1: no limit
     truncation_strategy: Literal['delete', 'truncation_left'] = 'delete'
     check_dataset_strategy: Literal['none', 'discard', 'error',
                                     'warning'] = 'none'
@@ -958,6 +958,7 @@ def set_model_type(args: Union[SftArguments, InferArguments]) -> None:
     if model_id_or_path_lower not in model_mapping_reversed:
         if (isinstance(args, InferArguments)
                 and 'checkpoint' in model_id_or_path
+                and 'merged' not in model_id_or_path
                 and args.ckpt_dir is None):
             raise ValueError(
                 'Please use `--ckpt_dir vx-xxx/checkpoint-xxx` to use the checkpoint.'
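Note: with this change, a merged directory produced by `swift export --merge_lora true` (e.g. `checkpoint-xxx-merged`) no longer triggers the `--ckpt_dir` error and can be passed directly as a model path. A minimal standalone sketch of the guard (not the library's exact code, and omitting the `InferArguments` type check):

```python
# Hypothetical helper reproducing the condition from the diff.
from typing import Optional


def needs_ckpt_dir_hint(model_id_or_path: str, ckpt_dir: Optional[str] = None) -> bool:
    # True -> raise the "Please use `--ckpt_dir ...`" error; merged dirs are exempt.
    return ('checkpoint' in model_id_or_path
            and 'merged' not in model_id_or_path
            and ckpt_dir is None)


assert needs_ckpt_dir_hint('output/qwen-vl-chat/vx-xxx/checkpoint-100')
assert not needs_ckpt_dir_hint('output/qwen-vl-chat/vx-xxx/checkpoint-100-merged')
```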

swift/llm/utils/model.py

Lines changed: 33 additions & 15 deletions
@@ -15,6 +15,7 @@
 from modelscope import (AutoConfig, AutoModelForCausalLM, AutoTokenizer,
                         BitsAndBytesConfig, GenerationConfig, GPTQConfig,
                         snapshot_download)
+from modelscope.hub.utils.utils import get_cache_dir
 from packaging import version
 from torch import Tensor
 from torch import dtype as Dtype
@@ -1673,11 +1674,17 @@ def get_model_tokenizer_internlm_xcomposer2(model_dir: str,
     return model, tokenizer
 
 
-def _git_clone_github(github_url: str, model_dir: str,
-                      local_repo_name: str) -> str:
-    git_cache_dir = os.path.dirname(model_dir)
+def _git_clone_github(github_url: str,
+                      local_repo_name: Optional[str] = None) -> str:
+    git_cache_dir = os.path.join(get_cache_dir(), '_github')
+    os.makedirs(git_cache_dir, exist_ok=True)
+    if local_repo_name is None:
+        github_url = github_url.rstrip('/')
+        local_repo_name = github_url.rsplit('/', 1)[1]
     local_repo_path = os.path.join(git_cache_dir, local_repo_name)
     if not os.path.exists(local_repo_path):
+        if not github_url.endswith('.git'):
+            github_url = f'{github_url}.git'
         command = f'git -C {git_cache_dir} clone {github_url} {local_repo_name}'
         logger.info(f'Run the command: `{command}`')
         os.system(command)
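Note: a minimal sketch of the cache layout the refactored helper now uses, assuming only what the diff shows: repos land under `<modelscope cache>/_github/<repo name>`, and the repo name is derived from the URL when `local_repo_name` is omitted. The helper name `github_repo_path` below is hypothetical.

```python
# Hypothetical standalone sketch mirroring the diff's new path logic (no actual cloning).
import os


def github_repo_path(github_url, cache_dir, local_repo_name=None):
    git_cache_dir = os.path.join(cache_dir, '_github')  # shared cache, no longer tied to model_dir
    if local_repo_name is None:
        github_url = github_url.rstrip('/')
        local_repo_name = github_url.rsplit('/', 1)[1]  # e.g. 'DeepSeek-VL'
    return os.path.join(git_cache_dir, local_repo_name)


# e.g. '<cache>/_github/DeepSeek-VL'
print(github_repo_path('https://github.com/deepseek-ai/DeepSeek-VL', '<cache>'))
```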
@@ -1718,6 +1725,19 @@ def __prepare_inputs_embeds(
 def _patch_deepseek_vl(model) -> None:
     model.prepare_inputs_embeds = MethodType(__prepare_inputs_embeds, model)
 
+    def get_new_func(func_name: str):
+
+        def new_func(*args, **kwargs):
+            return getattr(model.language_model, func_name)(*args, **kwargs)
+
+        return new_func
+
+    for key in [
+            'generate', 'get_input_embeddings',
+            'gradient_checkpointing_enable', 'forward'
+    ]:
+        setattr(model, key, get_new_func(key))
+
 
 @register_model(
     ModelType.deepseek_vl_7b_chat,
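Note: `_patch_deepseek_vl` now forwards `generate`, `get_input_embeddings`, `gradient_checkpointing_enable` and `forward` from the multimodal wrapper to its inner `language_model`, so the wrapper itself (with the vision encoder reachable for LoRA or full fine-tuning) can be handed around like a plain causal LM. A toy sketch of the delegation pattern, with hypothetical `Inner`/`Wrapper` classes:

```python
# Toy sketch of the closure-based delegation used in _patch_deepseek_vl.
class Inner:

    def generate(self, prompt):
        return f'inner generate: {prompt}'


class Wrapper:

    def __init__(self):
        self.language_model = Inner()


def patch(model, names=('generate', )):

    def get_new_func(func_name):

        def new_func(*args, **kwargs):
            # forward the call to the inner language model
            return getattr(model.language_model, func_name)(*args, **kwargs)

        return new_func

    for name in names:
        setattr(model, name, get_new_func(name))


wrapper = Wrapper()
patch(wrapper)
print(wrapper.generate('hi'))  # -> 'inner generate: hi'
```

The factory (`get_new_func`) binds `func_name` per iteration, avoiding the late-binding pitfall of defining the closure directly inside the loop.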
@@ -1746,8 +1766,7 @@ def get_model_tokenizer_deepseek_vl(model_dir: str,
             setattr(collections, type_name, getattr(collections.abc,
                                                     type_name))
     local_repo_path = _git_clone_github(
-        'https://github.com/deepseek-ai/DeepSeek-VL', model_dir,
-        'deepseek_vl_github')
+        'https://github.com/deepseek-ai/DeepSeek-VL')
     sys.path.append(os.path.join(local_repo_path))
     from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
     vl_chat_processor = VLChatProcessor.from_pretrained(model_dir)
@@ -1772,10 +1791,6 @@
     tokenizer.vl_chat_processor = vl_chat_processor
     if load_model:
         _patch_deepseek_vl(model)
-        multi_modal_model = model
-        model = multi_modal_model.language_model
-        model.multi_modal_model = [multi_modal_model
-                                   ]  # avoid recursion error: use list
     return model, tokenizer
 
 
@@ -2486,8 +2501,7 @@ def get_model_tokenizer_yi_vl(model_dir: str,
                               model_kwargs: Dict[str, Any],
                               load_model: bool = True,
                               **kwargs):
-    local_repo_path = _git_clone_github('https://github.com/01-ai/Yi.git',
-                                        model_dir, 'yi_github')
+    local_repo_path = _git_clone_github('https://github.com/01-ai/Yi')
     sys.path.append(os.path.join(local_repo_path, 'VL'))
     from llava.model import LlavaLlamaForCausalLM, LlavaConfig
     from llava.model.constants import key_info
@@ -2721,10 +2735,14 @@ def get_model_tokenizer(
 
 
 def get_additional_saved_files(model_type: str) -> List[str]:
-    if 'qwen-vl' in model_type:
-        return ['SimSun.ttf']
-    elif 'qwen-audio' in model_type:
-        return ['mel_filters.npz']
+    files_mapping = {
+        'qwen-vl': ['SimSun.ttf'],
+        'qwen-audio': ['mel_filters.npz'],
+        'deepseek-vl': ['preprocessor_config.json']
+    }
+    for key, files_list in files_mapping.items():
+        if key in model_type:
+            return files_list
     return []
 
 
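Note: `get_additional_saved_files` is now table-driven, and `deepseek-vl` checkpoints additionally carry `preprocessor_config.json`. A self-contained copy of the lookup for reference:

```python
# Self-contained restatement of the new lookup; keys are matched as substrings of model_type.
from typing import List


def get_additional_saved_files(model_type: str) -> List[str]:
    files_mapping = {
        'qwen-vl': ['SimSun.ttf'],
        'qwen-audio': ['mel_filters.npz'],
        'deepseek-vl': ['preprocessor_config.json'],
    }
    for key, files_list in files_mapping.items():
        if key in model_type:
            return files_list
    return []


print(get_additional_saved_files('deepseek-vl-7b-chat'))  # ['preprocessor_config.json']
```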
swift/llm/utils/template.py

Lines changed: 1 addition & 1 deletion
@@ -949,7 +949,7 @@ def encode(
             pixel_values=images_outputs.pixel_values,
             num_image_tokens=num_image_tokens)
         batched_output = vl_chat_processor.batchify([output])
-        model = self.model.multi_modal_model[0]
+        model = self.model
         batched_output = batched_output.to(
             device=model.device, dtype=model.dtype)
         inputs_embeds = model.prepare_inputs_embeds(**batched_output)[0]

tests/llm/test_run.py

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ def test_loss_matching(self):
                 '--max_new_tokens', '100', '--use_flash_attn', 'true',
                 '--lora_target_modules', 'ALL', '--seed', '0',
                 '--lora_bias_trainable', 'all', '--lora_modules_to_save',
-                'wte', 'ln_1', 'ln_2', 'ln_f', 'lm_head'
+                'EMBEDDING', 'LN', 'lm_head'
             ])
         best_model_checkpoint = output['best_model_checkpoint']
         print(f'best_model_checkpoint: {best_model_checkpoint}')
