support deepseek vl finetune vision encoder (modelscope#547)
Jintao-Huang authored Mar 13, 2024
1 parent a943bd0 commit b1a6895
Showing 11 changed files with 137 additions and 27 deletions.
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
Original file line number Diff line number Diff line change
@@ -175,7 +175,7 @@ The DPO parameters inherit from the SFT parameters; in addition, the following parameters are added:
- `--dataset_test_ratio`: Default is `0.01`. See the `sft.sh command-line arguments` section for a detailed description.
- `--val_dataset_sample`: The number of validation samples to evaluate and display. Default is `10`.
- `--system`: Default is `None`. See the `sft.sh command-line arguments` section for a detailed description.
- `--max_length`: Default is `2048`. See the `sft.sh command-line arguments` section for a detailed description.
- `--max_length`: Default is `-1`. See the `sft.sh command-line arguments` section for a detailed description.
- `--truncation_strategy`: Default is `'delete'`. See the `sft.sh command-line arguments` section for a detailed description.
- `--check_dataset_strategy`: Default is `'none'`. See the `sft.sh command-line arguments` section for a detailed description.
- `--custom_train_dataset_path`: Default is `[]`. See the `Custom Datasets` section of README.md for details.
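To make the inference parameters listed above concrete, here is a hedged sketch of a `swift infer` call that sets several of them explicitly. The checkpoint path is a placeholder and the flag combination is illustrative, not taken from this commit:

```shell
# Placeholder checkpoint directory; point --ckpt_dir at your own training output
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/model-xxx/vx-xxx/checkpoint-xxx \
    --load_dataset_config true \
    --val_dataset_sample 10 \
    --max_length -1 \
    --truncation_strategy delete
```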
13 changes: 12 additions & 1 deletion docs/source/Multi-Modal/cogvlm最佳实践.md
Original file line number Diff line number Diff line change
@@ -147,9 +147,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
15 changes: 13 additions & 2 deletions docs/source/Multi-Modal/deepseek-vl最佳实践.md
Original file line number Diff line number Diff line change
@@ -136,7 +136,7 @@ road:
## Fine-Tuning
Multimodal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:

(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers of the LLM part, you can specify `--lora_target_modules ALL`. Fine-tuning the vision model part is not yet supported for this model.)
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100
# 20GB GPU memory
@@ -157,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
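
As the note above says, `--lora_target_modules ALL` now also covers the vision encoder for deepseek-vl, which is the change this commit introduces. A hedged sketch of such a run; the dataset name is borrowed from the other best-practice docs and may need to be adapted:

```shell
# Illustrative only; assumes the coco-mini-en dataset used elsewhere in these docs
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type deepseek-vl-7b-chat \
    --dataset coco-mini-en \
    --lora_target_modules ALL
```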


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
3 changes: 1 addition & 2 deletions docs/source/Multi-Modal/internlm-xcomposer2最佳实践.md
Original file line number Diff line number Diff line change
@@ -134,7 +134,7 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support the json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths or URLs can be passed in.)
(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths or URLs can be passed in. This model does not support merge-lora.)
```json
[
@@ -159,7 +159,6 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
## Inference After Fine-Tuning
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/internlm-xcomposer2-7b-chat/vx-xxx/checkpoint-xxx \
37 changes: 36 additions & 1 deletion docs/source/Multi-Modal/qwen-audio最佳实践.md
Original file line number Diff line number Diff line change
@@ -97,6 +97,8 @@ history: [('Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/i
## Fine-Tuning
Multimodal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:

LoRA fine-tuning:

(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including the audio model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
@@ -106,6 +108,28 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
--dataset aishell1-mini-zh \
```
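
A hedged variant of the LoRA demo above that enables the `--lora_target_modules ALL` setting mentioned in the note, so the audio tower's linear layers are trained as well (illustrative, not part of this commit):

```shell
# Illustrative only; same dataset as the LoRA demo above
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-audio-chat \
    --dataset aishell1-mini-zh \
    --lora_target_modules ALL
```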

Full-parameter fine-tuning:
```shell
# MP
# Experimental environment: 2 * A100
# 2 * 50GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--train_dataset_sample -1 \
--sft_type full \

# ZeRO2
# Experimental environment: 4 * A100
# 2 * 80GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--train_dataset_sample -1 \
--sft_type full \
--deepspeed default-zero2
```

[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support the json and jsonl formats. Below is an example of a custom dataset:

(Multi-turn conversations are supported; each turn may contain multiple audio clips or none; local paths or URLs can be passed in.)
@@ -133,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
26 changes: 25 additions & 1 deletion docs/source/Multi-Modal/qwen-vl最佳实践.md
Original file line number Diff line number Diff line change
@@ -129,6 +129,8 @@ road:
## Fine-Tuning
Multimodal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:

LoRA fine-tuning:

(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: 3090
@@ -138,6 +140,17 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
--dataset coco-mini-en \
```

Full-parameter fine-tuning:
```shell
# Experimental environment: 2 * A100
# 2 * 55GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-vl-chat \
--dataset coco-mini-en \
--train_dataset_sample -1 \
--sft_type full \
```
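
For reference, a hedged sketch of how this full-parameter run could instead be launched with DeepSpeed ZeRO-2, mirroring the qwen-audio example earlier in this commit; the GPU count is an assumption and memory usage was not measured:

```shell
# Illustrative ZeRO-2 variant; mirrors the qwen-audio full-parameter example above
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen-vl-chat \
    --dataset coco-mini-en \
    --train_dataset_sample -1 \
    --sft_type full \
    --deepspeed default-zero2
```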

[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support the json and jsonl formats. Below is an example of a custom dataset:

(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths or URLs can be passed in.)
@@ -165,9 +178,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
13 changes: 12 additions & 1 deletion docs/source/Multi-Modal/yi-vl最佳实践.md
Original file line number Diff line number Diff line change
@@ -157,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
3 changes: 2 additions & 1 deletion swift/llm/utils/argument.py
Original file line number Diff line number Diff line change
@@ -588,7 +588,7 @@ class InferArguments:
val_dataset_sample: int = 10 # -1: all dataset
save_result: bool = True
system: Optional[str] = None
max_length: int = 2048 # -1: no limit
max_length: int = -1 # -1: no limit
truncation_strategy: Literal['delete', 'truncation_left'] = 'delete'
check_dataset_strategy: Literal['none', 'discard', 'error',
'warning'] = 'none'
@@ -958,6 +958,7 @@ def set_model_type(args: Union[SftArguments, InferArguments]) -> None:
if model_id_or_path_lower not in model_mapping_reversed:
if (isinstance(args, InferArguments)
and 'checkpoint' in model_id_or_path
and 'merged' not in model_id_or_path
and args.ckpt_dir is None):
raise ValueError(
'Please use `--ckpt_dir vx-xxx/checkpoint-xxx` to use the checkpoint.'
48 changes: 33 additions & 15 deletions swift/llm/utils/model.py
Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@
from modelscope import (AutoConfig, AutoModelForCausalLM, AutoTokenizer,
BitsAndBytesConfig, GenerationConfig, GPTQConfig,
snapshot_download)
from modelscope.hub.utils.utils import get_cache_dir
from packaging import version
from torch import Tensor
from torch import dtype as Dtype
@@ -1673,11 +1674,17 @@ def get_model_tokenizer_internlm_xcomposer2(model_dir: str,
return model, tokenizer


def _git_clone_github(github_url: str, model_dir: str,
local_repo_name: str) -> str:
git_cache_dir = os.path.dirname(model_dir)
def _git_clone_github(github_url: str,
local_repo_name: Optional[str] = None) -> str:
git_cache_dir = os.path.join(get_cache_dir(), '_github')
os.makedirs(git_cache_dir, exist_ok=True)
if local_repo_name is None:
github_url = github_url.rstrip('/')
local_repo_name = github_url.rsplit('/', 1)[1]
local_repo_path = os.path.join(git_cache_dir, local_repo_name)
if not os.path.exists(local_repo_path):
if not github_url.endswith('.git'):
github_url = f'{github_url}.git'
command = f'git -C {git_cache_dir} clone {github_url} {local_repo_name}'
logger.info(f'Run the command: `{command}`')
os.system(command)
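
For orientation, the refactored helper now shells out to a plain `git clone` into a shared cache directory instead of cloning next to the model directory. A hedged sketch of the effective command for the DeepSeek-VL case; the cache root comes from modelscope's `get_cache_dir()` and depends on the `MODELSCOPE_CACHE` environment variable, so the path shown is only an assumed default:

```shell
# Assumed cache root; get_cache_dir() may resolve to a different location on your machine
git -C ~/.cache/modelscope/hub/_github clone https://github.com/deepseek-ai/DeepSeek-VL.git DeepSeek-VL
```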
@@ -1718,6 +1725,19 @@ def __prepare_inputs_embeds(
def _patch_deepseek_vl(model) -> None:
model.prepare_inputs_embeds = MethodType(__prepare_inputs_embeds, model)

def get_new_func(func_name: str):

def new_func(*args, **kwargs):
return getattr(model.language_model, func_name)(*args, **kwargs)

return new_func

for key in [
'generate', 'get_input_embeddings',
'gradient_checkpointing_enable', 'forward'
]:
setattr(model, key, get_new_func(key))


@register_model(
ModelType.deepseek_vl_7b_chat,
@@ -1746,8 +1766,7 @@ def get_model_tokenizer_deepseek_vl(model_dir: str,
setattr(collections, type_name, getattr(collections.abc,
type_name))
local_repo_path = _git_clone_github(
'https://github.com/deepseek-ai/DeepSeek-VL', model_dir,
'deepseek_vl_github')
'https://github.com/deepseek-ai/DeepSeek-VL')
sys.path.append(os.path.join(local_repo_path))
from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
vl_chat_processor = VLChatProcessor.from_pretrained(model_dir)
@@ -1772,10 +1791,6 @@ def get_model_tokenizer_deepseek_vl(model_dir: str,
tokenizer.vl_chat_processor = vl_chat_processor
if load_model:
_patch_deepseek_vl(model)
multi_modal_model = model
model = multi_modal_model.language_model
model.multi_modal_model = [multi_modal_model
] # avoid recursion error: use list
return model, tokenizer


Expand Down Expand Up @@ -2486,8 +2501,7 @@ def get_model_tokenizer_yi_vl(model_dir: str,
model_kwargs: Dict[str, Any],
load_model: bool = True,
**kwargs):
local_repo_path = _git_clone_github('https://github.com/01-ai/Yi.git',
model_dir, 'yi_github')
local_repo_path = _git_clone_github('https://github.com/01-ai/Yi')
sys.path.append(os.path.join(local_repo_path, 'VL'))
from llava.model import LlavaLlamaForCausalLM, LlavaConfig
from llava.model.constants import key_info
@@ -2721,10 +2735,14 @@ def get_model_tokenizer(


def get_additional_saved_files(model_type: str) -> List[str]:
if 'qwen-vl' in model_type:
return ['SimSun.ttf']
elif 'qwen-audio' in model_type:
return ['mel_filters.npz']
files_mapping = {
'qwen-vl': ['SimSun.ttf'],
'qwen-audio': ['mel_filters.npz'],
'deepseek-vl': ['preprocessor_config.json']
}
for key, files_list in files_mapping.items():
if key in model_type:
return files_list
return []


2 changes: 1 addition & 1 deletion swift/llm/utils/template.py
Original file line number Diff line number Diff line change
@@ -949,7 +949,7 @@ def encode(
pixel_values=images_outputs.pixel_values,
num_image_tokens=num_image_tokens)
batched_output = vl_chat_processor.batchify([output])
model = self.model.multi_modal_model[0]
model = self.model
batched_output = batched_output.to(
device=model.device, dtype=model.dtype)
inputs_embeds = model.prepare_inputs_embeds(**batched_output)[0]
2 changes: 1 addition & 1 deletion tests/llm/test_run.py
Original file line number Diff line number Diff line change
@@ -108,7 +108,7 @@ def test_loss_matching(self):
'--max_new_tokens', '100', '--use_flash_attn', 'true',
'--lora_target_modules', 'ALL', '--seed', '0',
'--lora_bias_trainable', 'all', '--lora_modules_to_save',
'wte', 'ln_1', 'ln_2', 'ln_f', 'lm_head'
'EMBEDDING', 'LN', 'lm_head'
])
best_model_checkpoint = output['best_model_checkpoint']
print(f'best_model_checkpoint: {best_model_checkpoint}')
