support deepseek vl finetune vision encoder (modelscope#547)
Jintao-Huang authored Mar 13, 2024
1 parent a943bd0 commit b1a6895
Showing 11 changed files with 137 additions and 27 deletions.
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
Original file line number Diff line number Diff line change
@@ -175,7 +175,7 @@ The DPO parameters inherit from the SFT parameters; in addition, the following parameters are added:
- `--dataset_test_ratio`: Default is `0.01`. See the `sft.sh command-line arguments` section for a detailed description.
- `--val_dataset_sample`: The number of validation samples to evaluate and display. Default is `10`.
- `--system`: Default is `None`. See the `sft.sh command-line arguments` section for a detailed description.
- `--max_length`: Default is `2048`. See the `sft.sh command-line arguments` section for a detailed description.
- `--max_length`: Default is `-1`. See the `sft.sh command-line arguments` section for a detailed description.
- `--truncation_strategy`: Default is `'delete'`. See the `sft.sh command-line arguments` section for a detailed description.
- `--check_dataset_strategy`: Default is `'none'`. See the `sft.sh command-line arguments` section for a detailed description.
- `--custom_train_dataset_path`: Default is `[]`. See the `Custom Datasets` section of README.md for details.
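To make the inference parameters listed above concrete, here is a hedged sketch of a `swift infer` call that sets several of them explicitly. The checkpoint path is a placeholder and the flag combination is illustrative, not taken from this commit:

```shell
# Placeholder checkpoint directory; point --ckpt_dir at your own training output
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/model-xxx/vx-xxx/checkpoint-xxx \
    --load_dataset_config true \
    --val_dataset_sample 10 \
    --max_length -1 \
    --truncation_strategy delete
```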
13 changes: 12 additions & 1 deletion docs/source/Multi-Modal/cogvlm最佳实践.md
Original file line number Diff line number Diff line change
@@ -147,9 +147,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/cogvlm-17b-instruct/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
15 changes: 13 additions & 2 deletions docs/source/Multi-Modal/deepseek-vl最佳实践.md
Original file line number Diff line number Diff line change
@@ -136,7 +136,7 @@ road:
## Fine-Tuning
Multimodal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:

(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers of the LLM part, you can specify `--lora_target_modules ALL`. Fine-tuning the vision model part is not yet supported for this model.)
(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100
# 20GB GPU memory
@@ -157,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
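
As the note above says, `--lora_target_modules ALL` now also covers the vision encoder for deepseek-vl, which is the change this commit introduces. A hedged sketch of such a run; the dataset name is borrowed from the other best-practice docs and may need to be adapted:

```shell
# Illustrative only; assumes the coco-mini-en dataset used elsewhere in these docs
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type deepseek-vl-7b-chat \
    --dataset coco-mini-en \
    --lora_target_modules ALL
```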


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/deepseek-vl-7b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
3 changes: 1 addition & 2 deletions docs/source/Multi-Modal/internlm-xcomposer2最佳实践.md
Original file line number Diff line number Diff line change
@@ -134,7 +134,7 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support the json and jsonl formats. Below is an example of a custom dataset:
(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths or URLs can be passed in.)
(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths or URLs can be passed in. This model does not support merge-lora.)
```json
[
@@ -159,7 +159,6 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
## Inference After Fine-Tuning
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/internlm-xcomposer2-7b-chat/vx-xxx/checkpoint-xxx \
37 changes: 36 additions & 1 deletion docs/source/Multi-Modal/qwen-audio最佳实践.md
Original file line number Diff line number Diff line change
@@ -97,6 +97,8 @@ history: [('Audio 1:<audio>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/i
## Fine-Tuning
Multimodal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:

LoRA fine-tuning:

(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including the audio model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: A10, 3090, V100...
@@ -106,6 +108,28 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
--dataset aishell1-mini-zh \
```
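
A hedged variant of the LoRA demo above that enables the `--lora_target_modules ALL` setting mentioned in the note, so the audio tower's linear layers are trained as well (illustrative, not part of this commit):

```shell
# Illustrative only; same dataset as the LoRA demo above
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen-audio-chat \
    --dataset aishell1-mini-zh \
    --lora_target_modules ALL
```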

Full-parameter fine-tuning:
```shell
# MP
# Experimental environment: 2 * A100
# 2 * 50GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--train_dataset_sample -1 \
--sft_type full \

# ZeRO2
# Experimental environment: 4 * A100
# 2 * 80GB GPU memory
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen-audio-chat \
--dataset aishell1-mini-zh \
--train_dataset_sample -1 \
--sft_type full \
--deepspeed default-zero2
```

[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support the json and jsonl formats. Below is an example of a custom dataset:

(Multi-turn conversations are supported; each turn may contain multiple audio clips or none; local paths or URLs can be passed in.)
@@ -133,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-audio-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
26 changes: 25 additions & 1 deletion docs/source/Multi-Modal/qwen-vl最佳实践.md
Original file line number Diff line number Diff line change
@@ -129,6 +129,8 @@ road:
## Fine-Tuning
Multimodal large models are usually fine-tuned with a **custom dataset**. Here is a demo that can be run directly:

LoRA fine-tuning:

(By default, LoRA fine-tuning is applied only to the qkv projections of the LLM part. If you want to fine-tune all linear layers, including the vision model part, you can specify `--lora_target_modules ALL`.)
```shell
# Experimental environment: 3090
@@ -138,6 +140,17 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
--dataset coco-mini-en \
```

Full-parameter fine-tuning:
```shell
# Experimental environment: 2 * A100
# 2 * 55GB GPU memory
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type qwen-vl-chat \
--dataset coco-mini-en \
--train_dataset_sample -1 \
--sft_type full \
```
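
For reference, a hedged sketch of how this full-parameter run could instead be launched with DeepSpeed ZeRO-2, mirroring the qwen-audio example earlier in this commit; the GPU count is an assumption and memory usage was not measured:

```shell
# Illustrative ZeRO-2 variant; mirrors the qwen-audio full-parameter example above
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen-vl-chat \
    --dataset coco-mini-en \
    --train_dataset_sample -1 \
    --sft_type full \
    --deepspeed default-zero2
```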

[Custom datasets](../LLM/自定义与拓展.md#-推荐命令行参数的形式) support the json and jsonl formats. Below is an example of a custom dataset:

(Multi-turn conversations are supported; each turn may contain multiple images or none; local paths or URLs can be passed in.)
@@ -165,9 +178,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
13 changes: 12 additions & 1 deletion docs/source/Multi-Modal/yi-vl最佳实践.md
Original file line number Diff line number Diff line change
@@ -157,9 +157,20 @@ CUDA_VISIBLE_DEVICES=0 swift sft \


## Inference After Fine-Tuning

Direct inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
--load_dataset_config true \
```

**merge-lora** and then infer:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx \
--merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true
```
3 changes: 2 additions & 1 deletion swift/llm/utils/argument.py
Original file line number Diff line number Diff line change
@@ -588,7 +588,7 @@ class InferArguments:
val_dataset_sample: int = 10 # -1: all dataset
save_result: bool = True
system: Optional[str] = None
max_length: int = 2048 # -1: no limit
max_length: int = -1 # -1: no limit
truncation_strategy: Literal['delete', 'truncation_left'] = 'delete'
check_dataset_strategy: Literal['none', 'discard', 'error',
'warning'] = 'none'
@@ -958,6 +958,7 @@ def set_model_type(args: Union[SftArguments, InferArguments]) -> None:
if model_id_or_path_lower not in model_mapping_reversed:
if (isinstance(args, InferArguments)
and 'checkpoint' in model_id_or_path
and 'merged' not in model_id_or_path
and args.ckpt_dir is None):
raise ValueError(
'Please use `--ckpt_dir vx-xxx/checkpoint-xxx` to use the checkpoint.'
48 changes: 33 additions & 15 deletions swift/llm/utils/model.py
Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@
from modelscope import (AutoConfig, AutoModelForCausalLM, AutoTokenizer,
BitsAndBytesConfig, GenerationConfig, GPTQConfig,
snapshot_download)
from modelscope.hub.utils.utils import get_cache_dir
from packaging import version
from torch import Tensor
from torch import dtype as Dtype
@@ -1673,11 +1674,17 @@ def get_model_tokenizer_internlm_xcomposer2(model_dir: str,
return model, tokenizer


def _git_clone_github(github_url: str, model_dir: str,
local_repo_name: str) -> str:
git_cache_dir = os.path.dirname(model_dir)
def _git_clone_github(github_url: str,
local_repo_name: Optional[str] = None) -> str:
git_cache_dir = os.path.join(get_cache_dir(), '_github')
os.makedirs(git_cache_dir, exist_ok=True)
if local_repo_name is None:
github_url = github_url.rstrip('/')
local_repo_name = github_url.rsplit('/', 1)[1]
local_repo_path = os.path.join(git_cache_dir, local_repo_name)
if not os.path.exists(local_repo_path):
if not github_url.endswith('.git'):
github_url = f'{github_url}.git'
command = f'git -C {git_cache_dir} clone {github_url} {local_repo_name}'
logger.info(f'Run the command: `{command}`')
os.system(command)
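
For orientation, the refactored helper now shells out to a plain `git clone` into a shared cache directory instead of cloning next to the model directory. A hedged sketch of the effective command for the DeepSeek-VL case; the cache root comes from modelscope's `get_cache_dir()` and depends on the `MODELSCOPE_CACHE` environment variable, so the path shown is only an assumed default:

```shell
# Assumed cache root; get_cache_dir() may resolve to a different location on your machine
git -C ~/.cache/modelscope/hub/_github clone https://github.com/deepseek-ai/DeepSeek-VL.git DeepSeek-VL
```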
@@ -1718,6 +1725,19 @@ def __prepare_inputs_embeds(
def _patch_deepseek_vl(model) -> None:
model.prepare_inputs_embeds = MethodType(__prepare_inputs_embeds, model)

def get_new_func(func_name: str):

def new_func(*args, **kwargs):
return getattr(model.language_model, func_name)(*args, **kwargs)

return new_func

for key in [
'generate', 'get_input_embeddings',
'gradient_checkpointing_enable', 'forward'
]:
setattr(model, key, get_new_func(key))


@register_model(
ModelType.deepseek_vl_7b_chat,
@@ -1746,8 +1766,7 @@ def get_model_tokenizer_deepseek_vl(model_dir: str,
setattr(collections, type_name, getattr(collections.abc,
type_name))
local_repo_path = _git_clone_github(
'https://github.com/deepseek-ai/DeepSeek-VL', model_dir,
'deepseek_vl_github')
'https://github.com/deepseek-ai/DeepSeek-VL')
sys.path.append(os.path.join(local_repo_path))
from deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM
vl_chat_processor = VLChatProcessor.from_pretrained(model_dir)
@@ -1772,10 +1791,6 @@ def get_model_tokenizer_deepseek_vl(model_dir: str,
tokenizer.vl_chat_processor = vl_chat_processor
if load_model:
_patch_deepseek_vl(model)
multi_modal_model = model
model = multi_modal_model.language_model
model.multi_modal_model = [multi_modal_model
] # avoid recursion error: use list
return model, tokenizer


Expand Down Expand Up @@ -2486,8 +2501,7 @@ def get_model_tokenizer_yi_vl(model_dir: str,
model_kwargs: Dict[str, Any],
load_model: bool = True,
**kwargs):
local_repo_path = _git_clone_github('https://github.com/01-ai/Yi.git',
model_dir, 'yi_github')
local_repo_path = _git_clone_github('https://github.com/01-ai/Yi')
sys.path.append(os.path.join(local_repo_path, 'VL'))
from llava.model import LlavaLlamaForCausalLM, LlavaConfig
from llava.model.constants import key_info
@@ -2721,10 +2735,14 @@ def get_model_tokenizer(


def get_additional_saved_files(model_type: str) -> List[str]:
if 'qwen-vl' in model_type:
return ['SimSun.ttf']
elif 'qwen-audio' in model_type:
return ['mel_filters.npz']
files_mapping = {
'qwen-vl': ['SimSun.ttf'],
'qwen-audio': ['mel_filters.npz'],
'deepseek-vl': ['preprocessor_config.json']
}
for key, files_list in files_mapping.items():
if key in model_type:
return files_list
return []


2 changes: 1 addition & 1 deletion swift/llm/utils/template.py
Original file line number Diff line number Diff line change
@@ -949,7 +949,7 @@ def encode(
pixel_values=images_outputs.pixel_values,
num_image_tokens=num_image_tokens)
batched_output = vl_chat_processor.batchify([output])
model = self.model.multi_modal_model[0]
model = self.model
batched_output = batched_output.to(
device=model.device, dtype=model.dtype)
inputs_embeds = model.prepare_inputs_embeds(**batched_output)[0]
2 changes: 1 addition & 1 deletion tests/llm/test_run.py
Original file line number Diff line number Diff line change
@@ -108,7 +108,7 @@ def test_loss_matching(self):
'--max_new_tokens', '100', '--use_flash_attn', 'true',
'--lora_target_modules', 'ALL', '--seed', '0',
'--lora_bias_trainable', 'all', '--lora_modules_to_save',
'wte', 'ln_1', 'ln_2', 'ln_f', 'lm_head'
'EMBEDDING', 'LN', 'lm_head'
])
best_model_checkpoint = output['best_model_checkpoint']
print(f'best_model_checkpoint: {best_model_checkpoint}')
