tastelikefeet
diff --git a/‎README.md
Lines changed: 12 additions & 4 deletions b/‎README.md
Lines changed: 12 additions & 4 deletions
diff --git a/‎README_CN.md
Lines changed: 10 additions & 3 deletions b/‎README_CN.md
Lines changed: 10 additions & 3 deletions
diff --git a/‎docs/resources/web-ui-en.jpg
208 KB b/‎docs/resources/web-ui-en.jpg
208 KB
diff --git a/‎docs/source/LLM/命令行参数.md
Lines changed: 7 additions & 6 deletions b/‎docs/source/LLM/命令行参数.md
Lines changed: 7 additions & 6 deletions
diff --git a/‎docs/source/LLM/支持的模型和数据集.md
Lines changed: 4 additions & 0 deletions b/‎docs/source/LLM/支持的模型和数据集.md
Lines changed: 4 additions & 0 deletions
@@ -33,7 +33,7 @@
 - [Documentation](#-documentation)
 - [License](#-License)
 - [Citation](#-citation)
-- [Contact Us](#-contact-us)
+- [WeChat Group](#-Wechat-Group)
 
 ## 📝 Introduction
 SWIFT supports training, inference, evaluation and deployment of nearly **200 LLMs and MLLMs** (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts.
@@ -44,7 +44,11 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 
 SWIFT has rich documentations for users, please check [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM).
 
+SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!
+
 ## 🎉 News
+- 2024.05.22: Supports DeepSeek-V2-Lite series models, model_type are `deepseek-v2-lite` and `deepseek-v2-lite-chat`
+- 2024.05.22: Supports TeleChat-12B-v2 model with quantized version, model_type are `telechat-12b-v2` and `telechat-12b-v2-gptq-int4`
 - 🔥2024.05.21: Inference and fine-tuning support for MiniCPM-Llama3-V-2_5 are now available. For more details, please refer to [minicpm-v-2.5 Best Practice](docs/source/Multi-Modal/minicpm-v-2.5最佳实践.md).
 - 🔥2024.05.20: Support for inferencing and fine-tuning cogvlm2-llama3-chinese-chat-19B, cogvlm2-llama3-chat-19B. you can refer to [cogvlm2 Best Practice](docs/source_en/Multi-Modal/cogvlm2-best-practice.md).
 - 🔥2024.05.17: Support peft=0.11.0. Meanwhile support 3 new tuners: `BOFT`, `Vera` and `Pissa`. use `--sft_type boft/vera` to use BOFT or Vera, use `--init_lora_weights pissa` with `--sft_type lora` to use Pissa.
@@ -198,10 +202,14 @@ This section introduces basic usage, see the [Documentation](#-documentation) se
 
 ### Web-UI
 
+Web-UI is a gradio-based interface for **zero-threshold** training and deployment. It is easy to use and perfectly supports multi-GPU training and deployment:
+
 ```shell
-swift web-ui
+SWIFT_UI_LANG=en swift web-ui
 ```
 
+![image.png](./docs/resources/web-ui-en.jpg)
+
 ### Training
 
 #### Training Scripts
@@ -482,7 +490,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 | Mistral<br>Mixtral                            | [Mistral series models](https://github.com/mistralai/mistral-src)       | English            | 7B-22B     | base model<br>instruct model<br>MoE model                     |
 | Yi<br>Yi1.5                                      | [01AI's YI series models](https://github.com/01-ai)                     | Chinese<br>English    | 6B-34B<br>including quantized             | base model<br>chat model<br>long text model            |
 | InternLM<br>InternLM2<br>InternLM2-Math              | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B                            | base model<br>chat model<br>math model            |
-| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math          | [DeepSeek series models](https://github.com/deepseek-ai)       | Chinese<br>English    | 1.3B-236B                               | base model<br>chat model<br>MoE model<br>code model<br>math model |
+| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2          | [DeepSeek series models](https://github.com/deepseek-ai)       | Chinese<br>English    | 1.3B-236B                               | base model<br>chat model<br>MoE model<br>code model<br>math model |
 | MAMBA                                          | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English          | 130M-2.8B                              | base model                                 |
 | Gemma                                          | [Google Gemma series models](https://github.com/google/gemma_pytorch)   | English            | 2B-7B                                  | base model<br>instruct model                       |
 | MiniCPM                                        | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM)     | Chinese<br>English    | 2B-3B                                  | chat model<br>MoE model                                 |
@@ -656,7 +664,7 @@ This framework is licensed under the [Apache License (Version 2.0)](https://gith
 }
 ```
 
-## ☎ Contact Us
+## ☎ Wechat Group
 
 You can contact us and communicate with us by adding our WeChat group:
 
 
@@ -34,7 +34,7 @@
 - [文档](#-文档)
 - [License](#-license)
 - [引用](#-引用)
-- [联系我们](#-联系我们)
+- [微信用户群](#-微信用户群)
 
 ## 📝 简介
 SWIFT支持近**200种LLM和MLLM**（多模态大模型）的训练、推理、评测和部署。开发者可以直接将我们的框架应用到自己的Research和生产环境中，实现模型训练评测到应用的完整链路。我们除支持了[PEFT](https://github.com/huggingface/peft)提供的轻量训练方案外，也提供了一个完整的**Adapters库**以支持最新的训练技术，如NEFTune、LoRA+、LLaMA-PRO等，这个适配器库可以脱离训练脚本直接使用在自己的自定流程中。
@@ -45,7 +45,11 @@ SWIFT支持近**200种LLM和MLLM**（多模态大模型）的训练、推理、
 
 SWIFT具有丰富的文档体系，如有使用问题请请查看[这里](https://github.com/modelscope/swift/tree/main/docs/source/LLM).
 
+可以在[Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) 和 [ModelScope创空间](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary) 中体验SWIFT web-ui功能了。
+
 ## 🎉 新闻
+- 2024.05.22: 支持DeepSeek-V2-lite系列模型, model_type为 `deepseek-v2-lite`和`deekseek-v2-lite-chat`
+- 2024.05.22: 支持TeleChat-12b-v2模型和量化版本, model_type为 `telechat-12b-v2`和`telechat-12b-v2-gptq-int4`
 - 🔥2024.05.21: 支持 MiniCPM-Llama3-V-2_5 的推理与微调, 可以查看[minicpm-v-2.5最佳实践](docs/source/Multi-Modal/minicpm-v-2.5最佳实践.md).
 - 🔥2024.05.20: 支持 cogvlm2-llama3-chinese-chat-19B, cogvlm2-llama3-chat-19B 的推理与微调, 可以查看[cogvlm2最佳实践](docs/source/Multi-Modal/cogvlm2最佳实践.md).
 - 🔥2024.05.17: 支持peft=0.11.0. 同时支持了三个新的tuner方法： `BOFT`, `Vera` 和 `Pissa`. 使用 `--sft_type boft/vera` 开启BOFT或者Vera, 使用 `--init_lora_weights pissa` 以及 `--sft_type lora` 来使用 Pissa.
@@ -200,9 +204,12 @@ docker pull registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.
 
 ### Web-UI
 
+Web-UI是基于gradio界面技术的**零门槛**训练部署界面方案。Web-UI配置简单，且完美支持多卡训练和部署：
+
 ```shell
 swift web-ui
 ```
+![image.png](./docs/resources/web-ui.png)
 
 ### 训练
 
@@ -481,7 +488,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | Mistral<br>Mixtral                                 | [Mistral系列模型](https://github.com/mistralai/mistral-src)  | 英文       | 7B-8x22B | base模型<br>instruct模型<br>MoE模型             |
 | Yi<br>Yi1.5                                    | [01AI的YI系列模型](https://github.com/01-ai)                 | 中文<br>英文 | 6B-34B<br>包含量化版本          | base模型<br>chat模型<br>长文本模型                 |
 | InternLM<br>InternLM2<br>InternLM2-Math                   | [浦江实验室书生浦语系列模型](https://github.com/InternLM/InternLM) | 中文<br>英文 | 1.8B-20B                  | base模型<br>chat模型<br>数学模型                  |
-| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math               | [幻方系列模型](https://github.com/deepseek-ai)               | 中文<br>英文 | 1.3B-236B                  | base模型<br>chat模型<br>MoE模型<br>代码模型<br>数学模型 |
+| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2       | [幻方系列模型](https://github.com/deepseek-ai)               | 中文<br>英文 | 1.3B-236B                  | base模型<br>chat模型<br>MoE模型<br>代码模型<br>数学模型 |
 | MAMBA                                               | [MAMBA时序卷积模型](https://github.com/state-spaces/mamba)   | 英文       | 130M-2.8B                 | base模型                                    |
 | Gemma                                               | [Google Gemma系列模型](https://github.com/google/gemma_pytorch) | 英文       | 2B-7B                     | base模型<br>instruct模型                      |
 | MiniCPM                                             | [OpenBmB MiniCPM系列模型](https://github.com/OpenBMB/MiniCPM) | 中文<br>英文 | 2B-3B                     | chat模型<br>MoE模型                                    |
@@ -657,7 +664,7 @@ make docs
 }
 ```
 
-## ☎ 联系我们
+## ☎ 微信用户群
 
 您可以通过加我们的微信群, 来和我们联系和交流:
 
 
@@ -37,17 +37,17 @@
    - 对subsets更细粒度的控制: 默认使用注册时指定的subsets(注册时未指定则使用'default'). e.g. 'sharegpt-gpt4'. 如果指定subsets则使用对应子集的数据集. e.g. 'sharegpt-gpt4:default/V3_format#2000'. 使用'/'进行分隔.
    - dataset_id的支持. e.g. 'AI-ModelScope/alpaca-gpt4-data-zh#2000', 'HF::llm-wizard/alpaca-gpt4-data-zh#2000', 'hurner/alpaca-gpt4-data-zh#2000', 'HF::shibing624/alpaca-zh#2000'. 如果dataset_id已经注册，则会使用注册时的预处理函数、subsets、split等. 否则使用`SmartPreprocessor`, 支持4种数据集格式, 并使用'default'的subsets, split设置为'train'. 支持的数据集格式可以查看[数据集的自定义与拓展文档](自定义与拓展.md#自定义数据集).
    - dataset_path的支持. e.g. '1.jsonl#5000'. (如果是相对路径，则为相对于运行目录的相对路径).
-- `--val_dataset`: 用于指定单独的验证集, 格式和`dataset`参数相同, 如果使用本参数, 则`dataset_test_ratio`不再生效.
+- `--val_dataset`: 用于指定单独的验证集, 格式和`dataset`参数相同, 默认为`[]`. 如果使用本参数, 则`dataset_test_ratio`不再生效.
 - `--dataset_seed`: 用于指定数据集处理的seed, 默认为`42`. 以random_state形式存在, 不影响全局seed.
-- `--dataset_test_ratio`: 用于指定子数据集切分成训练集和验证集的比例, 默认为`0.01`.
+- `--dataset_test_ratio`: 用于指定子数据集切分成训练集和验证集的比例, 默认为`0.01`. 若设置了`--val_dataset`, 则该参数失效.
 - `--train_dataset_sample`: 对训练集的采样数, 默认是`-1`, 即使用完整的训练集进行训练. 该参数已废弃, 请使用`--dataset {dataset_name}#{dataset_sample}`
-- `--val_dataset_sample`: 对验证集进行采样, 默认是`None`, 自动选取合适数量的数据集数量进行验证. 如果你指定为`-1`, 则使用完整的验证集进行验证. 该参数已废弃, 验证集数量完全由`dataset_test_ratio`控制.
+- `--val_dataset_sample`: 对验证集进行采样, 默认是`None`, 自动选取合适数量的数据集数量进行验证. 如果你指定为`-1`, 则使用完整的验证集进行验证. 该参数已废弃, 验证集数量由`--dataset_test_ratio`或者`--val_dataset {dataset_name}#{dataset_sample}`控制.
 - `--system`: 对话模板中使用的system, 默认为`None`, 即使用模型默认的system. 如果指定为'', 则不使用system.
 - `--max_length`: token的最大长度, 默认为`2048`. 可以避免个别过长的数据样本造成OOM的问题. 当指定`--truncation_strategy delete`时, 如果某数据样本长度超过max_length, 我们会删除该数据样本. 如果指定`--truncation_strategy truncation_left`时, 我们会切除最前面的token: `input_ids[-max_length:]`. 如果设置为-1, 则无限制.
 - `--truncation_strategy`: 默认是`'delete'`表示把超过max_length的句子从数据集中删除. `'truncation_left'`表示会将超过文本的左边给切除掉, 这可能会切到special token, 会影响性能, 并不推荐.
 - `--check_dataset_strategy`: 默认值为`'none'`, 即不做检查. 如果你训练的模型是LLM, 则推荐使用`'warning'`作为数据检查的策略. 如果你的训练目标为句子分类等任务, 则建议设置为'`none`'.
 - `--custom_train_dataset_path`: 默认值为`[]`. 该参数已废弃, 请使用`--dataset {dataset_path}`.
-- `--custom_val_dataset_path`: 默认值为`[]`. 该参数已废弃, 不再区分训练集和验证集, 使用`dataset_test_ratio`统一进行切分. 请使用`--dataset {dataset_path}`.
+- `--custom_val_dataset_path`: 默认值为`[]`. 该参数已废弃, 该参数已废弃. 请使用`--val_dataset {dataset_path}`.
 - `--self_cognition_sample`: 自我认知数据集的采样数. 默认为`0`. 你该值设置为>0时, 需要同时指定`--model_name`, `--model_author`. 该参数已废弃, 请使用`--dataset self-cognition#{self_cognition_sample}`.
 - `--model_name`: 默认为`[None, None]`. 如果开启了自我认知数据集的采样(即指定`--dataset self-cognition`或者self_cognition_sample>0), 你需要传入两个值, 分别代表模型的中文名和英文名. 例如: `--model_name 小黄 'Xiao Huang'`. 如果你想了解更多, 可以查看[自我认知微调最佳实践](自我认知微调最佳实践.md).
 - `--model_author`: 默认为`[None, None]`. 如果开启了自我认知数据集的采样, 你需要传入两个值, 分别代表作者的中文名和英文名. 例如: `--model_author 魔搭 ModelScope`.
@@ -240,15 +240,16 @@ dpo参数继承了sft参数, 除此之外增加了以下参数:
 - `--seed`: 默认值为`42`, 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--dtype`: 默认值为`'AUTO`, 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--dataset`: 默认值为`[]`, 具体的参数介绍可以在`sft.sh命令行参数`中查看.
+- `--val_dataset`: 默认为`[]`, 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--dataset_seed`: 默认值为`42`, 具体的参数介绍可以在`sft.sh命令行参数`中查看.
-- `--dataset_test_ratio`: 默认值为`None`, 如果`--load_dataset_config true`则使用训练时的dataset_test_ratio, 否则设置为1. 具体的参数介绍可以在`sft.sh命令行参数`中查看.
+- `--dataset_test_ratio`: 默认值为`0.01`. 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--show_dataset_sample`: 表示想要评估和展示的验证集的数量, 默认值为`10`.
 - `--system`: 默认值为`None`. 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--max_length`: 默认值为`-1`. 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--truncation_strategy`: 默认是`'delete'`. 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--check_dataset_strategy`: 默认值为`'none'`, 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--custom_train_dataset_path`: 默认值为`[]`. 该参数已废弃, 请使用`--dataset {dataset_path}`.
-- `--custom_val_dataset_path`: 默认值为`[]`. 该参数已废弃, 不再区分训练集和验证集, 使用`dataset_test_ratio`统一进行切分. 请使用`--dataset {dataset_path}`.
+- `--custom_val_dataset_path`: 默认值为`[]`. 该参数已废弃. 请使用`--val_dataset {dataset_path}`.
 - `--quantization_bit`: 默认值为0. 具体的参数介绍可以在`sft.sh命令行参数`中查看.
 - `--quant_method`: 量化方法, 默认为`None`. 你可以选择为'bnb', 'hqq', 'eetq'.
 - `--hqq_axis`: hqq量化参数，表示执行分组的所沿的轴，默认为`0`, 可选值包括`0`,`1`
 
@@ -194,6 +194,8 @@
 |deepseek-vl-1_3b-chat|[deepseek-ai/deepseek-vl-1.3b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-1.3b-chat/summary)|q_proj, k_proj, v_proj|deepseek-vl|&#x2714;|&#x2718;|attrdict|multi-modal, vision|[deepseek-ai/deepseek-vl-1.3b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-chat)|
 |deepseek-vl-7b-chat|[deepseek-ai/deepseek-vl-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek-vl|&#x2714;|&#x2718;|attrdict|multi-modal, vision|[deepseek-ai/deepseek-vl-7b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)|
 |deepseek-v2-chat|[deepseek-ai/DeepSeek-V2-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)|
+|deepseek-v2-lite|[deepseek-ai/DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)|
+|deepseek-v2-lite-chat|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat)|
 |gemma-2b|[AI-ModelScope/gemma-2b](https://modelscope.cn/models/AI-ModelScope/gemma-2b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-2b](https://huggingface.co/google/gemma-2b)|
 |gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
 |gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
@@ -284,6 +286,8 @@
 |mamba-2.8b|[AI-ModelScope/mamba-2.8b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-2.8b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|&#x2718;|&#x2718;|transformers>=4.39.0|-|[state-spaces/mamba-2.8b-hf](https://huggingface.co/state-spaces/mamba-2.8b-hf)|
 |telechat-7b|[TeleAI/TeleChat-7B](https://modelscope.cn/models/TeleAI/TeleChat-7B/summary)|key_value, query|telechat|&#x2714;|&#x2718;||-|[Tele-AI/telechat-7B](https://huggingface.co/Tele-AI/telechat-7B)|
 |telechat-12b|[TeleAI/TeleChat-12B](https://modelscope.cn/models/TeleAI/TeleChat-12B/summary)|key_value, query|telechat|&#x2714;|&#x2718;||-|[Tele-AI/TeleChat-12B](https://huggingface.co/Tele-AI/TeleChat-12B)|
+|telechat-12b-v2|[TeleAI/TeleChat-12B-v2](https://modelscope.cn/models/TeleAI/TeleChat-12B-v2/summary)|key_value, query|telechat-v2|&#x2714;|&#x2718;||-|[Tele-AI/TeleChat-12B-v2](https://huggingface.co/Tele-AI/TeleChat-12B-v2)|
+|telechat-12b-v2-gptq-int4|[swift/TeleChat-12B-V2-GPTQ-Int4](https://modelscope.cn/models/swift/TeleChat-12B-V2-GPTQ-Int4/summary)|key_value, query|telechat-v2|&#x2714;|&#x2718;|auto_gptq>=0.5|-|-|
 |grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|&#x2718;|&#x2718;||-|[hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1)|
 |dbrx-instruct|[AI-ModelScope/dbrx-instruct](https://modelscope.cn/models/AI-ModelScope/dbrx-instruct/summary)|attn.Wqkv|dbrx|&#x2714;|&#x2714;|transformers>=4.36|-|[databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct)|
 |dbrx-base|[AI-ModelScope/dbrx-base](https://modelscope.cn/models/AI-ModelScope/dbrx-base/summary)|attn.Wqkv|dbrx|&#x2714;|&#x2714;|transformers>=4.36|-|[databricks/dbrx-base](https://huggingface.co/databricks/dbrx-base)|