v3.9.0

@Jintao-Huang released this 13 Oct 17:50 · 13 commits to main since this release

New Features

  1. Megatron-SWIFT
    a. Support for more model architectures: Qwen3-VL, Qwen3-Omni, Qwen3-Next, Kimi-VL, InternVL3.5-HF, and others. For the complete list of supported models, refer to the Supported Models documentation: https://swift.readthedocs.io/en/latest/Instruction/Supported-models-and-datasets.html
    b. KTO training support, including full-parameter, LoRA, MoE, multimodal, and Packing training techniques. Special thanks to @kevssim from China Merchants Bank’s technical team for their contribution. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/kto
    c. Reward Model training support, including full-parameter, LoRA, MoE, multimodal, and Packing training techniques. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/rm
    d. Sequence classification model architecture support, covering three task types: regression, single_label_classification, and multi_label_classification. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/seq_cls
    e. Support for VPP (Virtual Pipeline Parallelism): reduces pipeline bubbles in PP (Pipeline Parallelism), improving GPU utilization at the cost of slightly increased communication overhead. Supports heterogeneous PP via pipeline_model_parallel_layout for custom PP/VPP pipeline layouts.
    f. In RLHF techniques such as DPO, the ref_model no longer initializes main_grad, reducing GPU memory consumption.
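The interleaved (VPP) schedule in item (e) can be pictured as a layer-to-stage mapping: the model is cut into pp_size × vpp_size chunks, and consecutive chunks are dealt out across pipeline ranks so each rank holds several non-adjacent slices. The sketch below is a conceptual illustration only, not Megatron's actual implementation; the function name is invented for this example.

```python
def interleaved_layout(num_layers, pp_size, vpp_size):
    """Map each layer to (pp_rank, virtual_stage) under an interleaved
    (VPP) schedule: the model is split into pp_size * vpp_size chunks,
    and chunk i lands on pipeline rank i % pp_size."""
    chunks = pp_size * vpp_size
    assert num_layers % chunks == 0, "layers must divide evenly into chunks"
    per_chunk = num_layers // chunks
    layout = {}
    for layer in range(num_layers):
        chunk = layer // per_chunk
        layout[layer] = (chunk % pp_size, chunk // pp_size)
    return layout

# 8 layers, 2 pipeline ranks, 2 virtual stages:
# layers 0-1 -> rank 0 stage 0, layers 2-3 -> rank 1 stage 0,
# layers 4-5 -> rank 0 stage 1, layers 6-7 -> rank 1 stage 1.
layout = interleaved_layout(8, pp_size=2, vpp_size=2)
```

Because each rank now processes smaller chunks, pipeline bubbles shrink, at the cost of more inter-stage communication, which matches the trade-off described above.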
  2. Training
    a. Sequence parallelism optimization: Ulysses and Ring Attention can now be used together, enabling processing of even longer sequences. Supports SFT/DPO/GRPO training for both text-only and multimodal models. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/sequence_parallel/sequence_parallel.sh
    b. Padding-free training is now supported for embedding, reranker, and sequence classification tasks on both text-only and multimodal models, saving GPU memory and accelerating training.
    c. Restructured dataset formats for embedding and reranker training. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html, https://swift.readthedocs.io/en/latest/BestPractices/Reranker.html
    d. Agent templates support more models: deepseek_v3_1, qwen3_coder. (Thanks to contributions from @gakkiri and @ray075hl)
    e. Default value of load_from_cache_file changed from True to False to avoid unexpected issues caused by caching.
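The padding-free technique mentioned in item (b) can be sketched in a few lines: instead of padding every sequence to the batch maximum, sequences are concatenated into one flat token stream plus cumulative boundary offsets (the cu_seqlens convention used by FlashAttention-style kernels). This is a toy illustration of the idea, not ms-swift's implementation.

```python
def pack_without_padding(sequences):
    """Concatenate variable-length token sequences into one flat batch.

    Returns the flat token stream and cumulative sequence boundaries,
    so attention kernels can respect sequence edges without any pad
    tokens being computed."""
    flat, cu_seqlens = [], [0]
    for seq in sequences:
        flat.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return flat, cu_seqlens

flat, cu = pack_without_padding([[1, 2, 3], [4, 5], [6]])
# flat == [1, 2, 3, 4, 5, 6]; cu == [0, 3, 5, 6]
```

With padding, the same batch would occupy 3 × 3 = 9 token slots; packed, it occupies 6, which is where the memory savings and speedup come from.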
  3. RLHF
    a. GRPO now supports the CHORD algorithm, enabling mixed SFT training during GRPO. Documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/CHORD.html
    b. KTO supports padding-free and packing, reducing memory usage and accelerating training.
    c. Padding-free implementation in GRPO has been refactored for better multimodal model support.
    d. GRPO with vLLM now supports the environment variable PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" to reduce GPU memory fragmentation.
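CHORD's core idea of mixing an SFT objective into GRPO training can be sketched as a weighted blend of the two losses. The snippet below is a minimal illustration assuming a fixed mixing weight mu; the actual algorithm (see the linked documentation) also schedules mu over training and applies finer-grained token weighting, which is omitted here.

```python
def chord_style_loss(rl_loss, sft_loss, mu):
    """Blend an on-policy RL (GRPO) loss with an SFT loss.

    mu = 0 recovers pure GRPO; mu = 1 recovers pure SFT. A simplified
    sketch of the CHORD-style objective, not the full algorithm."""
    assert 0.0 <= mu <= 1.0
    return (1.0 - mu) * rl_loss + mu * sft_loss

loss = chord_style_loss(rl_loss=0.8, sft_loss=2.0, mu=0.25)
# 0.75 * 0.8 + 0.25 * 2.0 = 1.1
```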
  4. Inference
    a. Inference and deployment support for reranker and sequence classification tasks (PyTorch/vLLM backends). Example scripts: https://github.com/modelscope/ms-swift/tree/main/examples/deploy/reranker, https://github.com/modelscope/ms-swift/tree/main/examples/deploy/seq_cls
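The three sequence classification task types mentioned above differ mainly in how raw logits become predictions. The sketch below follows the Hugging Face transformers `problem_type` naming convention (regression / single_label_classification / multi_label_classification) for illustration; it is not ms-swift's exact inference code, and the function name is invented for this example.

```python
import math

def postprocess(logits, problem_type):
    """Turn raw classifier logits into predictions per task type."""
    if problem_type == "regression":
        return logits  # raw scores are the prediction
    if problem_type == "single_label_classification":
        # softmax + argmax: exactly one class wins
        exps = [math.exp(x) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        return probs.index(max(probs))
    if problem_type == "multi_label_classification":
        # independent sigmoid per label, thresholded at 0.5
        return [1.0 / (1.0 + math.exp(-x)) > 0.5 for x in logits]
    raise ValueError(f"unknown problem_type: {problem_type}")

single = postprocess([0.2, 1.5, -0.3], "single_label_classification")
multi = postprocess([0.2, 1.5, -0.3], "multi_label_classification")
```

The same logits give class index 1 for single-label (softmax picks one winner) but mark two labels active for multi-label (each sigmoid is thresholded independently), which is why the task type matters at both training and inference time.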

New Models

  1. Text-only Models
    a. Qwen/Qwen3-Next-80B-A3B-Instruct series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_next
    b. ZhipuAI/GLM-4.6
    c. inclusionAI/Ling-mini-2.0; inclusionAI/Ring-mini-2.0 series
    d. iic/Tongyi-DeepResearch-30B-A3B
    e. ByteDance-Seed/Seed-OSS-36B-Instruct series (Thanks to @hpsun1109 for the contribution)
    f. deepseek-ai/DeepSeek-V3.1-Terminus
    g. PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking
    h. google/embeddinggemma-300m (embedding model)
  2. Multimodal Models
    a. Qwen/Qwen3-VL-30B-A3B-Instruct series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
    b. Qwen/Qwen3-Omni-30B-A3B-Instruct series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_omni
    c. Kwai-Keye/Keye-VL-1_5-8B (Thanks to @hellopahe for the contribution)
    d. OpenGVLab/InternVL3_5-1B-HF series
    e. BytedanceDouyinContent/SAIL-VL2-2B series
    f. stepfun-ai/Step-Audio-2-mini (Thanks to @CJack812 for the contribution)

Full Changelog: v3.8.0...v3.9.0