Releases: modelscope/ms-swift

v3.4.0

30 Apr 15:45

New Features

  1. Supports Megatron training (CPT/SFT) for Qwen3, Qwen2-MoE, and Qwen3-MoE; on MoE models, training is nearly 10 times faster than with the Transformers implementation. For Qwen3-MoE training best practices, see #4030.

New Models

  1. Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B series
  2. Qwen/Qwen2.5-Omni-3B

What's Changed

New Contributors

Full Changelog: v3.3.1...v3.4.0

v3.3.1

26 Apr 08:57

New Features

  1. The agent training and deployment module introduces agent templates, with more than 10 types including hermes, glm4_0414, and llama4. The templates keep an agent dataset compatible across different models, so training can switch between models. See the documentation here.
  2. GRPO training now supports calling an external vLLM server, allowing more flexible GPU memory allocation between training and deployment. See the training script here.
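
Agent templates matter because different model families serialize the same tool call differently. As a hedged illustration only, the sketch below renders and parses a tool call in the hermes style (a JSON object wrapped in <tool_call> tags, as used by e.g. the Qwen2.5 chat templates). The helper names render_hermes_tool_call and parse_hermes_tool_calls are hypothetical and are not part of ms-swift's API; ms-swift's own agent template implementation may differ.

```python
import json


def render_hermes_tool_call(name: str, arguments: dict) -> str:
    """Serialize one tool call in hermes style (hypothetical helper, not ms-swift's API)."""
    payload = json.dumps({"name": name, "arguments": arguments}, ensure_ascii=False)
    return f"<tool_call>\n{payload}\n</tool_call>"


def parse_hermes_tool_calls(text: str) -> list:
    """Extract every hermes-style tool call (as a dict) from model output."""
    start_tag, end_tag = "<tool_call>", "</tool_call>"
    calls, pos = [], 0
    while (start := text.find(start_tag, pos)) != -1:
        end = text.find(end_tag, start)
        if end == -1:
            break  # unterminated tag: ignore the rest of the text
        calls.append(json.loads(text[start + len(start_tag):end]))
        pos = end + len(end_tag)
    return calls


# Usage: another model family would need a different template for the same call.
rendered = render_hermes_tool_call("get_weather", {"city": "Hangzhou"})
print(rendered)
print(parse_hermes_tool_calls("Let me check.\n" + rendered))
```

An agent dataset stored in one neutral format can then be rendered by whichever template matches the model being trained.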

New Models

  1. OpenGVLab/InternVL3-1B series
  2. moonshotai/Kimi-VL-A3B-Instruct series
  3. ZhipuAI/GLM-4-9B-0414, ZhipuAI/GLM-Z1-9B-0414 series

What's Changed

New Contributors

Full Changelog: v3.3.0...v3.3.1

v3.3.0.post1

15 Apr 09:43

What's Changed

New Contributors

Full Changelog: v3.3.0...v3.3.0.post1

v3.3.0

11 Apr 06:36

New Features

  1. Supports the DAPO algorithm; training documentation can be found here: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#dapo
  2. Supports sequence packing for multimodal models, including qwen2-vl, qwen2.5-vl, qwen2.5-omni, and the internvl2.5 series, with a 100% increase in training speed. Training scripts can be found here: https://github.com/modelscope/ms-swift/tree/main/examples/train/packing
  3. Added SWIFT and Megatron-SWIFT Docker images; see details here: https://swift.readthedocs.io/en/latest/GetStarted/SWIFT-installation.html#mirror
  4. Enhanced quantization support for multimodal, Omni, and MoE models; quantization scripts can be found here: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize
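
Sequence packing concatenates several short samples into one training sequence so that less compute is spent on padding. A minimal sketch of the general idea, assuming a greedy first-fit-decreasing strategy over sample lengths; it is not ms-swift's packing implementation, and max_length=8 below is an arbitrary example value.

```python
def pack_sequences(lengths, max_length):
    """Group sample indices into packs whose total length is <= max_length.

    First-fit decreasing: place each sample (longest first) into the first
    pack with enough free room, opening a new pack if none fits.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs = []      # each pack is a list of sample indices
    remaining = []  # free capacity left in each pack
    for i in order:
        if lengths[i] > max_length:
            raise ValueError(f"sample {i} (length {lengths[i]}) exceeds max_length")
        for p, free in enumerate(remaining):
            if lengths[i] <= free:
                packs[p].append(i)
                remaining[p] -= lengths[i]
                break
        else:
            packs.append([i])
            remaining.append(max_length - lengths[i])
    return packs


def padding_ratio(lengths, packs, max_length):
    """Fraction of token slots that are still padding after packing."""
    return 1 - sum(lengths) / (len(packs) * max_length)


lengths = [5, 3, 2, 6, 1, 4]
packs = pack_sequences(lengths, max_length=8)
print(packs)                             # [[3, 2], [0, 1], [5, 4]]
print(padding_ratio(lengths, packs, 8))  # 0.125 (vs. 0.5625 if each sample were padded to 8)
```

In a real trainer, packed samples also need per-sample position ids or attention masking so that tokens do not attend across sample boundaries; that part is omitted here.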

New Models

  1. Qwen/Qwen2.5-Omni-7B
  2. LLM-Research/Llama-4-Scout-17B-16E-Instruct series
  3. cognitivecomputations/DeepSeek-V3-0324-AWQ

What's Changed

New Contributors

Full Changelog: v3.2.2...v3.3.0

v3.2.2

26 Mar 02:59

New Features

  1. Release of Megatron-SWIFT: Megatron-SWIFT supports parallelism techniques such as TP (Tensor Parallelism), PP (Pipeline Parallelism), SP (Sequence Parallelism), and CP (Context Parallelism) for pre-training and fine-tuning 100+ models, including the Qwen series, Llama series, and DeepSeek-R1 distilled series. It also supports streaming datasets and sequence packing, making ultra-large datasets manageable while improving training efficiency. For more details, refer to the Megatron-SWIFT Training Documentation.
  2. Support for Multi-turn GRPO Training: Supports multi-turn GRPO training for scenarios such as Deep Search that involve multi-turn agent tool calls. Example code can be found here.
  3. Embedding Training for Multimodal Models: Supports embedding training for multimodal models such as iic/gme-Qwen2-VL-2B-Instruct. For more information, refer to the Embedding Model Training Documentation.
  4. Multi-label Classification and Regression Tasks for Large Models and Multimodal Large Models: Supports end-to-end training and deployment of large models and multimodal large models for multi-label classification and regression tasks. Example scripts can be found here.
  5. Model Evaluation with EvalScope During Training: Supports model evaluation using EvalScope during training to monitor training performance in real time. Example scripts can be found in the Evaluation Documentation.
  6. Custom External Plugin for LoRA + ViT Training: Provides an external plugin to support LoRA training for LLMs (Large Language Models) while performing full-parameter training for ViTs (Vision Transformers) with different learning rates. This avoids precision errors caused by merging LoRA into the ViT portion. Example code can be found here.
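
In multi-label classification each label is an independent yes/no decision, so a per-label sigmoid with binary cross-entropy is the usual choice instead of a softmax over labels. A minimal pure-Python sketch of that loss and of thresholded prediction; the 0.5 threshold and the function names are illustrative assumptions, not ms-swift's implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy with logits, one independent term per label.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = sum(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
                for z, y in zip(logits, targets))
    return total / len(logits)


def predict_labels(logits, threshold=0.5):
    """Indices of every label whose probability reaches the threshold (zero or more)."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


print(predict_labels([2.0, -1.0, 0.0]))  # [0, 2]
```

For the regression task type, this loss would typically be replaced by a regression loss such as mean squared error.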

New Models

  1. iic/gme-Qwen2-VL-2B-Instruct series
  2. Qwen/Qwen2.5-VL-32B-Instruct
  3. LLM-Research/gemma-3-4b-it series
  4. deepseek-ai/DeepSeek-V3-0324
  5. mistralai/Mistral-Small-3.1-24B-Instruct-2503 series

What's Changed

v3.2.1

14 Mar 07:07

New Features

  1. GRPO supports the tensor parallel mode of vLLM. Examples can be found here.
  2. GRPO supports co-located mode and offloading of both the optimizer and the model, as well as loading weights in batches and merging LoRA. These features save GPU memory and enable training a 72B model on four A100 GPUs. Examples can be found here.
  3. GRPO supports code ORM. Best practices can be found here.
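
A code ORM (outcome reward model) scores a completion by the outcome of running it rather than with a learned model. A minimal sketch, assuming the completion contains a Python code block and the tests are plain assert statements: the reward is 1.0 if the code plus tests exits cleanly and 0.0 otherwise. This is not ms-swift's code ORM; extract_code and code_reward are hypothetical names, and running the code in a subprocess with a timeout is not a real sandbox.

```python
import re
import subprocess
import sys

# Three backticks, built at runtime so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> str:
    """Return the first fenced code block in the completion, or the whole text if none."""
    match = CODE_BLOCK.search(completion)
    return match.group(1) if match else completion


def code_reward(completion: str, tests: str, timeout: float = 5.0) -> float:
    """Binary outcome reward: 1.0 if the extracted code plus tests exits with status 0."""
    program = extract_code(completion) + "\n\n" + tests
    try:
        result = subprocess.run([sys.executable, "-c", program],
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return 0.0  # infinite loops and very slow solutions get no reward
    return 1.0 if result.returncode == 0 else 0.0


good = f"Here is my solution:\n{FENCE}python\ndef add(a, b):\n    return a + b\n{FENCE}"
print(code_reward(good, "assert add(2, 3) == 5"))                           # 1.0
print(code_reward(good.replace("a + b", "a - b"), "assert add(2, 3) == 5"))  # 0.0
```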

New Models

  1. Qwen/QwQ-32B series
  2. inclusionAI/Ling-lite series

What's Changed

New Contributors

Full Changelog: v3.2.0...v3.2.1

v3.2.0

04 Mar 15:48

New Features

  1. GRPO supports data-parallel sampling across multiple vLLM/lmdeploy instances, as well as asynchronous sampling. For more information, see here. Records of multimodal GRPO experiments can be found here.
  2. swift deploy supports dynamic batching when infer_backend is pt; the streaming inference interface has changed (breaking change).
  3. swift infer supports data parallelism when infer_backend is vllm or lmdeploy. See here.
  4. Supports the muon optimizer. For more information, refer to here.
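
Dynamic batching lets a server merge concurrent requests into one batched forward pass instead of running them one by one. A minimal asyncio sketch of the idea, assuming a flush rule of "batch is full, or max_wait_s has elapsed since the first request arrived"; the DynamicBatcher class, its parameters, and the toy batch_fn are hypothetical and do not describe swift deploy's internals.

```python
import asyncio


class DynamicBatcher:
    """Group concurrent requests into batches (hypothetical sketch, not swift deploy's code)."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn          # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()
        self.batch_sizes = []             # size of every executed batch, for inspection

    async def submit(self, item):
        """Enqueue one request and wait for its result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self):
        """Worker loop: wait for a first request, then fill the batch until full or timed out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            outputs = self.batch_fn([item for item, _ in batch])
            self.batch_sizes.append(len(batch))
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


async def main():
    batcher = DynamicBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=4)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(10)))
    worker.cancel()
    return results, batcher.batch_sizes


results, batch_sizes = asyncio.run(main())
print(results)      # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(batch_sizes)  # exact split depends on timing; each batch holds at most 4 requests
```

The max_wait_s bound trades a little latency for a single request against higher throughput under load.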

New Models

  1. moonshotai/Moonlight-16B-A3B-Instruct
  2. LLM-Research/Phi-4-mini-instruct, LLM-Research/Phi-4-multimodal-instruct
  3. DeepSeek-V3-awq, deepseek-r1-awq
  4. Baichuan-M1-14B-Instruct

New Datasets

  1. Multi-modal GRPO:
    • lmms-lab/multimodal-open-r1-8k-verified
    • okwinds/clevr_cogen_a_train

What's Changed

New Contributors

Full Changelog: v3.1.1...v3.2.0

v3.1.1

20 Feb 06:31

New Features

  1. Supports GRPO training for large models, multimodal models, and agents, including multi-node training. Refer to this documentation.
  2. Support for Embedding model training. Refer to this script.
  3. swift sample supports MCTS and distillation data sampling, as well as multimodal model sampling.
  4. Support for custom dataset evaluation. Refer to this documentation.
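
For context on GRPO training: GRPO samples a group of completions per prompt and scores each one relative to its own group, so no learned critic is required. A minimal sketch of that group-relative advantage, assuming rewards have already been computed; the eps value and the use of the population standard deviation are illustrative choices (implementations differ on these details), and this is not ms-swift's code.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize one prompt's group of rewards: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for the same prompt, scored 1.0 (correct) or 0.0 (wrong).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct completions get positive advantages, wrong ones negative
```

If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.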

New Models

  1. AIDC-AI/Ovis2-2B series
  2. Qwen/Qwen2.5-VL-72B-Instruct-AWQ series
  3. stepfun-ai/GOT-OCR-2.0-hf
  4. stepfun-ai/Step-Audio-Chat
  5. mistralai/Mistral-Small-24B-Instruct-2501

New Datasets

  1. Related to GRPO
    • AI-ModelScope/MATH-lighteval
    • LLM-Research/xlam-function-calling-60k
    • AI-MO/NuminaMath-TIR
  2. Related to R1
    • liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT
    • modelscope/MathR, modelscope/MathR-32B-Distill

What's Changed

New Contributors

Full Changelog: v3.1.0...v3.1.1

v3.1.0

07 Feb 12:38

New Features

  1. Supports the swift sample command for data sampling; refer to here.
  2. Supports reinforcement fine-tuning training, with current support for rejection sampling fine-tuning; refer to here.
  3. The custom data format for grounding tasks has been restructured; refer to here.
  4. swift infer supports outputting inference speed and ACC/ROUGE/BLEU metrics.
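
Rejection sampling fine-tuning alternates between sampling and supervised training: sample several completions per prompt, keep only those a verifier accepts, and fine-tune on the survivors. A minimal sketch of the filtering step; generate and is_accepted are placeholder callables (here a toy noisy adder and an exact-match check) and are not swift sample options.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],
    is_accepted: Callable[[str, str], bool],
    num_samples: int = 4,
) -> List[Tuple[str, str]]:
    """Keep the distinct accepted completions among num_samples draws per prompt."""
    dataset = []
    for prompt in prompts:
        seen = set()
        for _ in range(num_samples):
            completion = generate(prompt)
            if completion not in seen and is_accepted(prompt, completion):
                seen.add(completion)
                dataset.append((prompt, completion))
    return dataset


# Toy stand-ins: a noisy "model" that is off by one some of the time.
rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    a, b = map(int, prompt.split("+"))
    return str(a + b + rng.choice([-1, 0, 0, 1]))


def toy_verify(prompt: str, completion: str) -> bool:
    a, b = map(int, prompt.split("+"))
    return completion == str(a + b)


sft_pairs = rejection_sample(["2+3", "10+4"], toy_generate, toy_verify, num_samples=8)
print(sft_pairs)  # every kept pair has the correct sum
```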

New Models

  1. Qwen/Qwen2.5-VL-3B-Instruct series
  2. Qwen/Qwen2.5-7B-Instruct-1M series
  3. deepseek-ai/Janus-Pro-1B series
  4. bytedance-research/UI-TARS-2B-SFT series

New Datasets

  1. ServiceNow-AI/R1-Distill-SFT
  2. bespokelabs/Bespoke-Stratos-17k
  3. open-thoughts/OpenThoughts-114k

What's Changed

New Contributors

Full Changelog: v3.0.3...v3.1.0

v3.0.3

22 Jan 15:27

New Features

  1. Supports a SequenceClassification architecture for multimodal large models, enabling multimodal classification tasks; see here.
  2. Supports training multimodal reward models.
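
Reward models are commonly trained on preference pairs with a Bradley-Terry style loss, -log sigmoid(r_chosen - r_rejected), which pushes the score of the preferred response above the rejected one. A minimal sketch of that loss over precomputed scalar scores; the loss ms-swift uses for multimodal reward models may differ or add extra terms, and the scores below are made-up inputs. The loss itself does not depend on whether the scored inputs contained text or images.

```python
import math


def pairwise_reward_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Computed as softplus(-margin) = max(-margin, 0) + log(1 + exp(-|margin|)),
    which stays finite even for very large margins.
    """
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need equally sized, non-empty score lists")
    total = 0.0
    for r_chosen, r_rejected in zip(chosen_scores, rejected_scores):
        margin = r_chosen - r_rejected
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_scores)


print(pairwise_reward_loss([2.0], [0.0]))  # small: the chosen response already scores higher
print(pairwise_reward_loss([0.0], [2.0]))  # larger: the ranking is wrong
```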

New Models

  1. Shanghai_AI_Laboratory/internlm3-8b-instruct
  2. OpenBMB/MiniCPM-o-2_6
  3. deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B series
  4. bytedance-research/Valley-Eagle-7B
  5. LLM-Research/phi-4
  6. Qwen/Qwen2.5-Math-PRM-7B, Qwen/Qwen2.5-Math-PRM-72B
  7. MiniMaxAI/MiniMax-Text-01, MiniMaxAI/MiniMax-VL-01

What's Changed

Full Changelog: v3.0.2...v3.0.3