Releases: modelscope/ms-swift
v3.4.0
English Version
New Features
- Support for Megatron training (CPT/SFT) of Qwen3/Qwen2-MoE/Qwen3-MoE, with training speeds nearly 10 times faster on MoE models compared to the Transformers implementation. For best practices on Qwen3-MoE training, refer to: #4030
New Models
- Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B series
- Qwen/Qwen2.5-Omni-3B
What's Changed
- 🐛 fix: fix reward model train seq_cls by @gaohongkui in #3921
- Support vllm quantization by @tastelikefeet in #4003
- [megatron] Support Qwen3 by @Jintao-Huang in #3995
- Fix merge sentence transformers by @tastelikefeet in #4011
- Fix gte training and compatible with ds3 by @tastelikefeet in #4022
- fix truncation_strategy by @Jintao-Huang in #4025
- [Megatron] support MoE (Qwen2-Moe & Qwen3-MoE) by @Jintao-Huang in #4012
- Support Qwen3 series by @Jintao-Huang in #4029
- fix bugs by @Jintao-Huang in #4031
- fix grpo resume_from_checkpoint by @Jintao-Huang in #4035
- support qwen3_self_cognition by @Jintao-Huang in #4039
- Update readme & fix generate by @Jintao-Huang in #4041
- update wechat by @tastelikefeet in #4047
- support Qwen2.5-Omni-3B by @Jintao-Huang in #4052
- updates GRPOTrainer compatible with trl 0.17 by @hjh0119 in #3969
- fix rollout by @hjh0119 in #4055
New Contributors
- @gaohongkui made their first contribution in #3921
Full Changelog: v3.3.1...v3.4.0
v3.3.1
English Version
New Features
- The Agent training and deployment module introduces agent templates, with more than 10 available, including hermes, glm4_0414, and llama4. With these templates, the same agent dataset can be reused when switching between different models for training. For documentation, refer to here.
- GRPO training now supports calling an external vLLM server, allowing for more flexible allocation of GPU memory during training and deployment. For the training script, refer to here.
New Models
- OpenGVLab/InternVL3-1B series
- moonshotai/Kimi-VL-A3B-Instruct series
- ZhipuAI/GLM-4-9B-0414, ZhipuAI/GLM-Z1-9B-0414 series
What's Changed
- Fix sampling and rft by @tastelikefeet in #3847
- Fix incorrect retry count check in LazyLLMDataset.getitem by @IamLihua in #3845
- support internvl3 by @hjh0119 in #3842
- fix grpo filter overlong by @hjh0119 in #3844
- dapo-bug by @Evilxya in #3846
- support agent packing by @Jintao-Huang in #3853
- Fix internvl2.5/3 deepspeed packing by @Jintao-Huang in #3855
- fix multimodal target_modules by @Jintao-Huang in #3856
- Fix multimodal target modules by @Jintao-Huang in #3858
- Update FAQ by @slin000111 in #3841
- fix grpo completion length equal zero by @hjh0119 in #3857
- support val_dataset_shuffle by @Jintao-Huang in #3860
- Update swift docker by @Jintao-Huang in #3866
- fix citest & minimax link by @Jintao-Huang in #3868
- fix grpo save checkpoint by @hjh0119 in #3865
- support glm4-z1 by @hjh0119 in #3862
- add paper link by @tastelikefeet in #3886
- refactor mm target_regex (compat peft/vllm) by @Jintao-Huang in #3879
- Support kimi-vl by @Jintao-Huang in #3884
- Fix glm4 z1 by @Jintao-Huang in #3889
- fix bugs by @Jintao-Huang in #3893
- fix typealias to be compatible with Python 3.9 by @hjh0119 in #3895
- Fix ui by @tastelikefeet in #3903
- Fix fp16 bf16 by @Jintao-Huang in #3909
- add rm center_rewards_coefficient argument by @hjh0119 in #3917
- revert swift_from_pretrained by @Jintao-Huang in #3914
- fix grpo doc by @hjh0119 in #3920
- update qwen2_5_omni by @Jintao-Huang in #3908
- Support qwen3 by @Jintao-Huang in #3945
- Decouple vLLM engine and GRPOTrainer. by @hjh0119 in #3911
- Refactor Agent Template by @Jintao-Huang in #3918
- update docs by @Jintao-Huang in #3961
- fix bugs by @Jintao-Huang in #3962
- Support hermes loss_scale by @Jintao-Huang in #3963
- fix parse tools by @Jintao-Huang in #3975
- Update unsloth compatibility by @tastelikefeet in #3970
- Fix qwen2.5-omni use_audio_in_video by @Jintao-Huang in #3987
- Fix web-ui by @tastelikefeet in #3997
- fix get_toolcall & fix ci by @Jintao-Huang in #3999
- fix bugs by @Jintao-Huang in #4001
- fix seq_cls by @Jintao-Huang in #4002
Full Changelog: v3.3.0...v3.3.1
v3.3.0.post1
What's Changed
- Fix sampling and rft by @tastelikefeet in #3847
- Fix incorrect retry count check in LazyLLMDataset.getitem by @IamLihua in #3845
- support internvl3 by @hjh0119 in #3842
- fix grpo filter overlong by @hjh0119 in #3844
- dapo-bug by @Evilxya in #3846
- support agent packing by @Jintao-Huang in #3853
- Fix internvl2.5/3 deepspeed packing by @Jintao-Huang in #3855
- fix multimodal target_modules by @Jintao-Huang in #3856
- Fix multimodal target modules by @Jintao-Huang in #3858
- Update FAQ by @slin000111 in #3841
- fix grpo completion length equal zero by @hjh0119 in #3857
Full Changelog: v3.3.0...v3.3.0.post1
v3.3.0
English Version
New Features
- Supports the DAPO algorithm; training documentation can be found here: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#dapo
- Supports sequence packing for multimodal models, including qwen2-vl, qwen2.5-vl, qwen2.5-omni, and the internvl2.5 series, with a 100% increase in training speed. Training scripts can be found here: https://github.com/modelscope/ms-swift/tree/main/examples/train/packing
- Added SWIFT and Megatron-SWIFT Docker images; see details here: https://swift.readthedocs.io/en/latest/GetStarted/SWIFT-installation.html#mirror
- Enhanced quantization capabilities for Multimodal/Omni/Moe models, shell scripts can be found here: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize
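Sequence packing, as listed above, concatenates several short samples into one sequence of at most max_length tokens so that less of each batch is spent on padding. The following is a minimal sketch of the idea, assuming a greedy first-fit-decreasing strategy. The function name pack_sequences and the strategy itself are illustrative assumptions, not ms-swift's actual implementation.

```python
from typing import List


def pack_sequences(lengths: List[int], max_length: int) -> List[List[int]]:
    """Group sample indices into packs whose total token count fits max_length.

    Hypothetical helper: greedy first-fit-decreasing bin packing.
    """
    # Visit the longest samples first; they are the hardest to place.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []   # sample indices in each pack
    remaining: List[int] = []     # free token budget of each pack
    for idx in order:
        n = lengths[idx]
        if n > max_length:
            raise ValueError(f"sample {idx} has {n} tokens, more than max_length")
        for p, free in enumerate(remaining):
            if n <= free:
                # Put the sample into the first pack that still has room.
                packs[p].append(idx)
                remaining[p] -= n
                break
        else:
            # No existing pack fits, so open a new one.
            packs.append([idx])
            remaining.append(max_length - n)
    return packs


lengths = [700, 300, 512, 200, 100, 900]
packs = pack_sequences(lengths, max_length=1024)
print(packs)  # [[5, 4], [0, 1], [2, 3]]
```

In this toy case six samples fit into three packed sequences, so the same tokens are processed in half as many rows. A real implementation also has to keep attention from crossing sample boundaries inside a pack, which this sketch does not cover.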
New Models
- Qwen/Qwen2.5-Omni-7B
- LLM-Research/Llama-4-Scout-17B-16E-Instruct series
- cognitivecomputations/DeepSeek-V3-0324-AWQ
What's Changed
- fix shell by @Jintao-Huang in #3675
- support Qwen/Qwen2.5-Omni-7B (sft/dpo/grpo) by @Jintao-Huang in #3613
- fix grpo rank by @hjh0119 in #3687
- Grpo vl72b script by @hjh0119 in #3692
- fix import error by @Jintao-Huang in #3700
- [megatron] fix val_dataset streaming by @Jintao-Huang in #3699
- fix grpo qwen2_5_omni by @Jintao-Huang in #3701
- fix grpo vl by @Jintao-Huang in #3704
- update warning_once by @Jintao-Huang in #3706
- fix grpo template copy by @Jintao-Huang in #3708
- fix adalora by @tastelikefeet in #3714
- fix qwen2_5-omni by @Jintao-Huang in #3716
- Fix grpo dora by @hjh0119 in #3709
- support qwen2_5_vl packing by @Jintao-Huang in #3694
- fix qwen2_5 omni by @Jintao-Huang in #3734
- fix grpo train dataloader by @Jintao-Huang in #3736
- support internvl2.5 packing by @Jintao-Huang in #3735
- Support qwen2 5-vl awq quant & update shell by @Jintao-Huang in #3743
- support moe quant by @Jintao-Huang in #3772
- update liger kernel by @Jintao-Huang in #3775
- support llama4 by @Jintao-Huang in #3777
- support DAPO by @hjh0119 in #3725
- [Gemma] Fixing the ndarray cast warning by @Reichenbachian in #3791
- add swift docker by @Jintao-Huang in #3796
- support streaming shuffle by @Jintao-Huang in #3782
- grpo lmdeploy warn by @hjh0119 in #3800
- fix math accuracy by @hjh0119 in #3795
- fix grounding dataset concat by @Jintao-Huang in #3802
- fix omni max_model_len by @Jintao-Huang in #3803
- fix get_config_attrs by @Jintao-Huang in #3807
- Fix grpo ovis2 by @Jintao-Huang in #3808
- more grpo log by @hjh0119 in #3801
- fix reward_template by @Jintao-Huang in #3813
- [GRPO] fix template copy (async generate) by @Jintao-Huang in #3814
- update docs by @Jintao-Huang in #3815
- optimize zero3 rlhf code by @Jintao-Huang in #3816
- fix grpo zero3 inflight params by @hjh0119 in #3818
- fix grpo log_completions by @Jintao-Huang in #3819
- vLLM 0.8.3 support for GRPO colocate mode by @hjh0119 in #3820
- fix web-ui by @Jintao-Huang in #3822
- fix telechat by @hjh0119 in #3825
- fix omni zero3 by @Jintao-Huang in #3826
- feat: grpo async generate thread-safe queue production by @hjh0119 in #3821
- fix grpo async generate by @hjh0119 in #3829
- update docs grpo vllm by @Jintao-Huang in #3831
- support omni vllm by @Jintao-Huang in #3832
- remove sequence_parallel_size by @Jintao-Huang in #3835
- update grpo type annotations by @hjh0119 in #3834
- fix grpo multi turn tp by @hjh0119 in #3837
- [docs] fix seq_parallel by @Jintao-Huang in #3838
New Contributors
- @Reichenbachian made their first contribution in #3791
Full Changelog: v3.2.2...v3.3.0
v3.2.2
English Version
New Features
- Release of Megatron-SWIFT: Supports parallelism techniques such as TP (Tensor Parallelism), PP (Pipeline Parallelism), SP (Sequence Parallelism), and CP (Context Parallelism) for pre-training and fine-tuning more than 100 models, including the Qwen series, Llama series, and Deepseek-R1 distillation series. It also supports streaming datasets and sequence packing, which make it possible to handle very large datasets and improve training efficiency. For more details, refer to the Megatron-SWIFT Training Documentation.
- Support for Multi-turn GRPO Training: Supports multi-turn GRPO training for multi-turn agent tool-calling scenarios such as Deep Search. Example code can be found here.
- Supports mini-batch training to reduce GPU memory consumption during training. Refer to the GRPO Training Documentation.
- Embedding Training for Multimodal Models: Supports embedding training for multimodal models such as iic/gme-Qwen2-VL-2B-Instruct. For more information, refer to the Embedding Model Training Documentation.
- Multi-label Classification and Regression Tasks: Supports the full workflow from training to deployment for multi-label classification and regression tasks on large models and multimodal large models. Example scripts can be found here.
- Model Evaluation with EvalScope During Training: Supports model evaluation using EvalScope during training to monitor training performance in real time. Example scripts can be found in the Evaluation Documentation.
- Custom External Plugin for LoRA + ViT Training: Provides an external plugin to support LoRA training for LLMs (Large Language Models) while performing full-parameter training for ViTs (Vision Transformers) with different learning rates. This avoids precision errors caused by merging LoRA into the ViT portion. Example code can be found here.
New Models
- iic/gme-Qwen2-VL-2B-Instruct series
- Qwen/Qwen2.5-VL-32B-Instruct
- LLM-Research/gemma-3-4b-it series
- deepseek-ai/DeepSeek-V3-0324
- mistralai/Mistral-Small-3.1-24B-Instruct-2503 series
What's Changed
- update code doc by @hjh0119 in #3498
- fix readme by @Jintao-Huang in #3499
- feat: swanlab config add ms-swift by @Zeyi-Lin in #3500
- Support GME models by @tastelikefeet in #3513
- fix docs by @tastelikefeet in #3514
- Fix docs links by @tastelikefeet in #3516
- fix vllm memory leak by @hjh0119 in #3515
- [Docs] Easy .[all] install from git by @xihuai18 in #3518
- Fix bugs by @tastelikefeet in #3520
- support megatron by @Jintao-Huang in #2885
- fix megatron by @Jintao-Huang in #3527
- support gemma3 by @hjh0119 in #3492
- fix megatron pipeline parallel by @Jintao-Huang in #3529
- fix megatron tie_weight by @Jintao-Huang in #3530
- support megatron llama by @Jintao-Huang in #3532
- Support megatron llama3.1 3.2 by @Jintao-Huang in #3537
- Update LlavaHfTemplate to adapt to the changes in image-token handling for LLaVA and LLaVA-Next models in transformers versions above 4.47 by @zsxm1998 in #3521
- refactor llava-hf by @Jintao-Huang in #3538
- fix docs by @Jintao-Huang in #3539
- refactor get_megatron_model_meta by @Jintao-Huang in #3542
- Gather infonce loss and support hard negative samples by @tastelikefeet in #3548
- fix docs by @tastelikefeet in #3553
- fix unsloth by @tastelikefeet in #3554
- fix grpo mllm split modules by @hjh0119 in #3552
- grpo embedding layer lora by @hjh0119 in #3531
- update arguments by @Jintao-Huang in #3556
- update doc by @hjh0119 in #3557
- Support all models' embedding and mask fake negative by @tastelikefeet in #3563
- skip grpo first wake up by @hjh0119 in #3562
- move grpovllmengine import by @hjh0119 in #3568
- fix bugs & support dataset_name by @Jintao-Huang in #3565
- fix wrap by @tastelikefeet in #3572
- Feature: add train-eval loop by @Yunnglin in #3569
- compat vllm>=0.8 by @Jintao-Huang in #3574
- [grpo] Fix Incorrect Placement of Data in eval_queue During async_generate by @hjh0119 in #3573
- Fix lmdeploy 0.7.3 by @tastelikefeet in #3584
- support vit full llm lora by @Jintao-Huang in #3575
- support Mistral3.1-2503 by @hjh0119 in #3588
- Support megatron packing by @Jintao-Huang in #3595
- [megatron] support streaming by @Jintao-Huang in #3609
- fix rft by @lxline in #3602
- [template] refactor replace media tokens by @Jintao-Huang in #3614
- fix top_logprobs by @Jintao-Huang in #3616
- Fix bugs by @Jintao-Huang in #3619
- Support multi turn grpo by @tastelikefeet in #3615
- fix grpo npu context by @hjh0119 in #3597
- support regression multi-label by @Jintao-Huang in #3621
- Support peft 0.15 by @tastelikefeet in #3623
- update grpo warning by @hjh0119 in #3598
- fix grpo rm zero3 by @hjh0119 in #3626
- GRPO mini batch by @hjh0119 in #3205
- fix grpo warning with pt backend by @hjh0119 in #3629
- compat transformers 4.50 by @Jintao-Huang in #3625
- support train_sampler_random by @Jintao-Huang in #3631
- fix grpo multi turn by @tastelikefeet in #3632
- update docs by @Jintao-Huang in #3633
- Support deepseek v3 0324 by @Jintao-Huang in #3637
- fix grpo cosine reward by @hjh0119 in #3638
- fix grpo lora split module by @hjh0119 in #3635
- fix reward model by @Jintao-Huang in #3641
- support qwen2_5_vl_32b by @Jintao-Huang in #3642
- fix grpo warning by @hjh0119 in #3630
- grpo reset prefix cache by @...
v3.2.1
New Features
- GRPO supports the tensor parallel mode of vLLM. Examples can be found here.
- GRPO supports co-locate mode and offloading of the optimizer and the model, and supports loading weights and merging LoRA in batches to save GPU memory, which enables training a 72B model on four A100 GPUs. Examples can be found here.
- GRPO supports code ORM. Best practices can be found here.
New Models
- Qwen/QwQ-32B series
- inclusionAI/Ling-lite series
What's Changed
- Support vllm LLMEngine by @Jintao-Huang in #3370
- update publish workflows by @Jintao-Huang in #3374
- support ling by @Jintao-Huang in #3379
- Support mp mode and hybrid mode of GRPO by @tastelikefeet in #3381
- fix name by @tastelikefeet in #3382
- fix web-ui infer by @Jintao-Huang in #3384
- fix bugs by @tastelikefeet in #3385
- fix bugs by @Jintao-Huang in #3386
- support Qwen/QwQ-32B by @Jintao-Huang in #3388
- support qwq-awq by @Jintao-Huang in #3391
- support lmdeploy qwen2_5_vl by @Jintao-Huang in #3394
- update infer_save by @Jintao-Huang in #3400
- update requirements by @Jintao-Huang in #3403
- fix ollama export by @Jintao-Huang in #3406
- Fix grpo engine by @tastelikefeet in #3412
- fix infer_stream by @Jintao-Huang in #3413
- Fix some comments, add dlc script by @tastelikefeet in #3419
- add comments and docs by @tastelikefeet in #3424
- fix issue 1663 by @Jintao-Huang in #3417
- Support GRPO model and optimizer offload, and split loading model by @tastelikefeet in #3427
- update wechat by @tastelikefeet in #3430
- Fix vllm random by @tastelikefeet in #3437
- fix seed by @Jintao-Huang in #3438
- fix_base_deploy by @Jintao-Huang in #3442
- fix GRPO device mismatch by @hjh0119 in #3440
- compat vllm==0.5.1 by @Jintao-Huang in #3444
- fix grpo multimodal doc by @mi804 in #3449
- support grpo code orm by @hjh0119 in #3431
- fix GRPO seed by @Jintao-Huang in #3458
- fix grpo multi nodes by @hjh0119 in #3462
- Fix tensor parallel hang by @tastelikefeet in #3464
- fix grpo trainer zero3 always gather parameters by @tcye in #3467
- fix grpo temperature inconsistency by @hjh0119 in #3468
- fix grad_norm nan by @Jintao-Huang in #3465
- fix grad_norm by @Jintao-Huang in #3469
- update minimax by @Jintao-Huang in #3471
- Support 72b script with 4 gpus by @tastelikefeet in #3472
- refactor packing by @Jintao-Huang in #3457
- Fix some docs by @tastelikefeet in #3475
- fix grpo ddp hang by @hjh0119 in #3476
- fix moe quant by @Jintao-Huang in #3478
- Delete duplicate parameters in train_72b_4gpu.sh by @Marquis03 in #3479
- fix image by @tastelikefeet in #3480
- fix infer gptq internvl2 by @Jintao-Huang in #3481
- Resume sample by @BC-A in #3460
- fix qwen2_vl flash_attn deepspeed by @Jintao-Huang in #3484
- Fix seed of tp=1 by @tastelikefeet in #3486
- fix use_cache by @Jintao-Huang in #3487
- Fix qwen2 5 vl grounding by @Jintao-Huang in #3491
- fix ovis2 device_map by @Jintao-Huang in #3496
- fix template.decode by @Jintao-Huang in #3497
New Contributors
- @tcye made their first contribution in #3467
- @Marquis03 made their first contribution in #3479
- @BC-A made their first contribution in #3460
Full Changelog: v3.2.0...v3.2.1
v3.2.0
New Features
- GRPO supports multi-vLLM/lmdeploy data parallel sampling and asynchronous sampling. For more information, refer to here. Records of multi-modal GRPO experiments can be found here.
- swift deploy supports dynamic batching when infer_backend is set to pt; the streaming inference interface has been modified (breaking change).
- swift infer supports data parallelism when infer_backend is set to vllm/lmdeploy. Refer to here.
- Supports the muon optimizer. For more information, refer to here.
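Data-parallel sampling and inference, as mentioned above, split a batch of prompts across several engine replicas and then put the outputs back in the original order. The following is a minimal sketch of that bookkeeping, assuming a round-robin split. The helper names shard and gather are hypothetical and do not come from ms-swift, vLLM, or lmdeploy.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard(items: Sequence[T], world_size: int) -> List[List[T]]:
    """Round-robin split: replica r receives items r, r + world_size, ..."""
    return [list(items[r::world_size]) for r in range(world_size)]


def gather(shards: List[List[T]]) -> List[T]:
    """Inverse of shard: interleave per-replica results into the original order."""
    world_size = len(shards)
    total = sum(len(s) for s in shards)
    merged: List[T] = [None] * total  # type: ignore[list-item]
    for r, part in enumerate(shards):
        # Slot r, r + world_size, ... belongs to replica r.
        merged[r::world_size] = part
    return merged


prompts = [f"prompt-{i}" for i in range(7)]
parts = shard(prompts, world_size=3)
# Each replica would run generation on its own part; here outputs are upper-cased.
outputs = gather([[p.upper() for p in part] for part in parts])
print(outputs[0], outputs[-1])  # PROMPT-0 PROMPT-6
```

Round-robin keeps the shard sizes within one item of each other, which matters when every replica has to finish before the results can be merged.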
New Models
- moonshotai/Moonlight-16B-A3B-Instruct
- LLM-Research/Phi-4-mini-instruct, LLM-Research/Phi-4-multimodal-instruct
- DeepSeek-V3-awq, deepseek-r1-awq
- Baichuan-M1-14B-Instruct
New Datasets
- Multi-modal GRPO:
- lmms-lab/multimodal-open-r1-8k-verified
- okwinds/clevr_cogen_a_train
What's Changed
- fix setup.py by @Jintao-Huang in #3198
- support vllm dp by @Jintao-Huang in #3201
- update dataset & fix bugs by @Jintao-Huang in #3203
- Support multiple vllms by @tastelikefeet in #3202
- update distill docs by @tastelikefeet in #3216
- compatible with trl0.16 by @hjh0119 in #3209
- support r1 awq by @Jintao-Huang in #3206
- fix grpo old_per_token_logps by @hjh0119 in #3220
- Support the generation of JanusPro models by @DaozeZhang in #3218
- Update the JanusPro-generation by @DaozeZhang in #3221
- fix load args by @Jintao-Huang in #3226
- update docs by @Jintao-Huang in #3230
- Speed up GRPO by @tastelikefeet in #3229
- fix docs zh by @Jintao-Huang in #3231
- fix deepseek_vl2 by @Jintao-Huang in #3233
- support moonlight by @Jintao-Huang in #3232
- support muon optimizer by @Jintao-Huang in #3234
- update docs by @Jintao-Huang in #3243
- fix grpo npu vllm by @hjh0119 in #3242
- fix grpo single card by @tastelikefeet in #3246
- save val_dataset by @Jintao-Huang in #3248
- fix grpo compat transformers==4.47.* by @Jintao-Huang in #3252
- grpo_countdown & fix format reward by @mi804 in #3269
- Support the base64 format of generated images for JanusPro by @DaozeZhang in #3265
- Fix typos by @co63oc in #3266
- compat lmdeploy 0.7 by @Jintao-Huang in #3256
- fix lmdeploy by @Jintao-Huang in #3274
- GRPO+LMDeploy 0.7 by @tastelikefeet in #3277
- Support max memory by @Jintao-Huang in #3282
- add lmdeploy dp shell by @Jintao-Huang in #3284
- Support Baichuan-M1-14B-Instruct by @DaozeZhang in #3271
- fix grpo top_k by @Jintao-Huang in #3293
- fix lmdeploy mllm in grpo by @tastelikefeet in #3296
- Update FAQ by @slin000111 in #3289
- fix: error when uploading model to huggingface by @xavier-h-10 in #3297
- add multimodal clevr exp by @mi804 in #3301
- update docs by @Jintao-Huang in #3304
- [refactor] patch_vllm by @Jintao-Huang in #3306
- GRPO mllm script by @hjh0119 in #3305
- [refactor & feat] support pt dynamic batch by @Jintao-Huang in #3278
- Support ZeRO++ by @tastelikefeet in #3315
- Revert pt engine batch infer by @Jintao-Huang in #3316
- optimize model_type by @Jintao-Huang in #3318
- Fix bugs & Update docs/datasets by @Jintao-Huang in #3322
- fix grpo zero3 by @hjh0119 in #3324
- fix grpo zero3 by @hjh0119 in #3326
- compat vllm>=0.5.1 lmdeploy>=0.5.0 by @Jintao-Huang in #3332
- update external plugins by @Jintao-Huang in #3334
- fix generation_config by @Jintao-Huang in #3335
- fix check_model error by @Jintao-Huang in #3336
- update get_model_tokenizer_with_flash_attn by @Jintao-Huang in #3337
- add geoqa grpo experiment by @mi804 in #3344
- fix max_memory by @Jintao-Huang in #3347
- support phi4-multimodal by @Jintao-Huang in #3350
- fix:fix bugs in cosine reward of GRPO by @youyc22 in #3358
- Remove entry including invalid ROADMAP link from English & Chinese documentation by @3manifold in #3357
- update docs by @Jintao-Huang in #3349
- Support the
- update docs by @Jintao-Huang in #3365
- add grpo openr1 multimodal experiment by @mi804 in #3368
- fix swift app format by @Jintao-Huang in #3367
New Contributors
- @xavier-h-10 made their first contribution in #3297
- @youyc22 made their first contribution in #3358
- @3manifold made their first contribution in #3357
Full Changelog: v3.1.1...v3.2.0
v3.1.1
New Features
- Support for large models, multimodal models, Agents, and multi-node GRPO training. Refer to this documentation.
- Support for Embedding model training. Refer to this script.
- swift sample supports MCTS and distillation data sampling, as well as multimodal model sampling.
- Support for custom dataset evaluation. Refer to this documentation.
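GRPO, which this release adds, scores several sampled completions for the same prompt and uses each completion's reward relative to its group as the advantage. The following is a minimal sketch of that normalization, assuming the population standard deviation and a small epsilon for numerical stability. Both choices and the function name are illustrative assumptions, not ms-swift's or trl's exact code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize the rewards of one prompt's completions against the group.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, two judged correct (1.0) and two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])

# If every completion gets the same reward, all advantages are zero,
# so the group contributes no learning signal.
print(group_relative_advantages([1.0, 1.0, 1.0]))
```

Because the baseline is the group mean, no separate value model is needed, which is the main practical difference from PPO-style training.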
New Models
- AIDC-AI/Ovis2-2B series
- Qwen/Qwen2.5-VL-72B-Instruct-AWQ series
- stepfun-ai/GOT-OCR-2.0-hf
- stepfun-ai/Step-Audio-Chat
- mistralai/Mistral-Small-24B-Instruct-2501
New Datasets
- Related to GRPO
- AI-ModelScope/MATH-lighteval
- LLM-Research/xlam-function-calling-60k
- AI-MO/NuminaMath-TIR
- Related to R1
- liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT
- modelscope/MathR, modelscope/MathR-32B-Distill
What's Changed
- Add evalscope native backend by @Yunnglin in #2981
- support mistralai/Mistral-Small-24B-Instruct-2501 by @Jintao-Huang in #3030
- MCTS Sampler by @lxline in #2967
- fix windows url by @Jintao-Huang in #3041
- Support sample multi modal models by @tastelikefeet in #3048
- Support sft embedding model by @tastelikefeet in #3039
- support GRPO by @hjh0119 in #3022
- fix grpo by @hjh0119 in #3050
- fix grpo by @Jintao-Huang in #3051
- update docs (fine-tuning) by @Jintao-Huang in #3052
- bump version by @Jintao-Huang in #3053
- fix grpo model_type by @Jintao-Huang in #3057
- update rlhf documents by @hjh0119 in #3055
- add grpo multinode scripts by @hjh0119 in #3059
- Fix orm env by @tastelikefeet in #3065
- Support external plugins by @tastelikefeet in #3066
- update docs by @Jintao-Huang in #3070
- fix grpo nan by @Jintao-Huang in #3075
- fix grpo metric_for_best_model by @Jintao-Huang in #3077
- register MathR by @mi804 in #3078
- fix accuracy reward by @hjh0119 in #3080
- fix SwiftModel by @Jintao-Huang in #3071
- Fix grpo vlm (internvl2.5) by @Jintao-Huang in #3081
- Refactor orm prm by @Jintao-Huang in #3085
- fix competition math by @tastelikefeet in #3086
- support cuda operations to npu by @tastelikefeet in #3087
- fix grpo temperature 0.7->0.9 by @Jintao-Huang in #3091
- support grpo vllm lora by @Jintao-Huang in #3095
- Feat: Eval custom dataset by @Yunnglin in #3093
- cosine and repetition reward for GRPO by @hjh0119 in #3079
- fix get_device by @Jintao-Huang in #3097
- Fix/grpo by @MrToy in #3101
- fix unsloth by @tastelikefeet in #3100
- support grpo npu by @Jintao-Huang in #3102
- fix grpo zero3 by @Jintao-Huang in #3104
- support log completions by @Jintao-Huang in #3110
- Fix typos by @co63oc in #3111
- update trl version by @Jintao-Huang in #3117
- fix eval docs by @Jintao-Huang in #3118
- Support llamapro for grpo by @tastelikefeet in #3119
- fix grpo trainer by @Jintao-Huang in #3120
- fix cleanup error by @Jintao-Huang in #3121
- Fix typos by @co63oc in #3123
- refactor patcher by @Jintao-Huang in #3124
- Support lmdeploy in GRPO by @tastelikefeet in #3126
- support stepfun-ai/Step-Audio-Chat by @Jintao-Huang in #3127
- update docs by @Jintao-Huang in #3131
- fix grpo pt infer generation_config by @Jintao-Huang in #3135
- support_local_path by @Jintao-Huang in #3140
- Support swanlab by @tastelikefeet in #3142
- fix grpo sample by @MrToy in #3144
- fix grpo vllm lora by @Jintao-Huang in #3134
- fix create_repo by @tastelikefeet in #3147
- fix grpo zero3 by @Jintao-Huang in #3149
- docs: report_to add swanlab by @Zeyi-Lin in #3158
- Support Ovis2 models by @DaozeZhang in #3163
- support grpo metric_for_best_model by @Jintao-Huang in #3155
- Fix ovis2 by @Jintao-Huang in #3169
- Support Agent GRPO by @tastelikefeet in #3170
- fix max_length error by @Jintao-Huang in #3173
- fix streaming by @Jintao-Huang in #3176
- Fix/agent grpo by @tastelikefeet in #3172
- Fix lmdeploy branch by @tastelikefeet in #3145
- fix internvl-4b by @Jintao-Huang in #3178
- refactor cosine orm by @Jintao-Huang in #3179
- fix sampler reaches max_length by @tastelikefeet in #3180
- Fix prm in sampler by @tastelikefeet in #3184
- Support GOT_OCR2_hf by @DaozeZhang in #3182
- Knowledge Distillation sampling by @mi804 in #3185
- compat vllm==0.7.2 by @Jintao-Huang in #3083
- support r1 dataset by @Jintao-Huang in #3191
- Refactor grpo dataset by @Jintao-Huang in #3192
- Add links to agent grpo by @tastelikefeet in #3193
New Contributors
- @MrToy made their first contribution in #3101
- @co63oc made their first contribution in #3111
- @Zeyi-Lin made their first contribution in #3158
Full Changelog: v3.1.0...v3.1.1
v3.1.0
English Version
New Features
- Supports the swift sample command for data sampling; refer to here.
- Supports reinforcement fine-tuning training, with current support for rejection sampling fine-tuning; refer to here.
- Grounding task custom data format restructuring; refer to here.
- swift infer supports outputting inference speed and ACC/ROUGE/BLEU metrics.
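Rejection sampling fine-tuning, as mentioned above, draws several candidate responses per prompt, keeps only those that a reward function rates highly, and fine-tunes on the kept samples. The following is a minimal sketch of the selection step. The function names, the callback signatures, and the threshold are hypothetical, and the toy generator stands in for a real model.

```python
from typing import Callable, Optional


def rejection_sample(
    prompt: str,
    generate: Callable[[str, int], str],
    reward: Callable[[str, str], float],
    num_samples: int = 4,
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the best-scoring candidate, or None if none reaches the threshold."""
    candidates = [generate(prompt, seed) for seed in range(num_samples)]
    scored = [(reward(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None


def toy_generate(prompt: str, seed: int) -> str:
    # Stand-in for sampling from a model: a different guess for each seed.
    return str(4 + seed)


def toy_reward(prompt: str, response: str) -> float:
    # Stand-in for a reward model or rule: 1.0 only for the correct answer.
    return 1.0 if response == "5" else 0.0


print(rejection_sample("What is 2 + 3?", toy_generate, toy_reward))  # 5
```

Prompts for which the function returns None are dropped, so the resulting fine-tuning set contains only responses that passed the reward check.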
New Models
- Qwen/Qwen2.5-VL-3B-Instruct Series
- Qwen/Qwen2.5-7B-Instruct-1M Series
- deepseek-ai/Janus-Pro-1B Series
- bytedance-research/UI-TARS-2B-SFT Series
New Datasets
- ServiceNow-AI/R1-Distill-SFT
- bespokelabs/Bespoke-Stratos-17k
- open-thoughts/OpenThoughts-114k
What's Changed
- add "enable_prefix_caching" args for vllm engine. by @Leoyzen in #2939
- Fix vllm docs link & fix web-ui by @Jintao-Huang in #2970
- Fix sample by @tastelikefeet in #2971
- support merge-lora & quant by @Jintao-Huang in #2973
- support create_checkpoint_symlink by @Jintao-Huang in #2975
- Sampling and RFT by @tastelikefeet in #2977
- support auto dataset mapping by @Jintao-Huang in #2976
- support qwen2_5 long by @Jintao-Huang in #2982
- sys_prompt from file by @lxline in #2980
- support bytedance-research/UI-TARS-2B-SFT series by @Jintao-Huang in #2987
- support Qwen/Qwen2.5-VL-3B-Instruct series model by @Jintao-Huang in #2996
- fix qwen2_5-vl by @Jintao-Huang in #2998
- support Qwen/Qwen2.5-VL-72B-Instruct by @Jintao-Huang in #2999
- refactor grounding by @Jintao-Huang in #3000
- compatible with trl v0.13 by @hjh0119 in #2992
- update R1 dataset by @Jintao-Huang in #3005
- fix qwen2.5-vl grounding (refactor) by @Jintao-Huang in #2979
- fix deploy by @Jintao-Huang in #3007
- support infer metric: acc/rouge or bleu by @Jintao-Huang in #3008
- support deepseek janus pro by @Jintao-Huang in #3009
- update readme by @Jintao-Huang in #3011
- fix parse_dict by @Jintao-Huang in #3012
- update docs by @Jintao-Huang in #3015
- Fix readme & update docs by @Jintao-Huang in #3018
- fix push to hub by @tastelikefeet in #3024
- Fix bugs by @Jintao-Huang in #3025
- fix bugs by @Jintao-Huang in #3026
- Fix qwen tool template to official format by @Leoyzen in #2988
- fix message merging strategy when multi-turn tool calling. by @Leoyzen in #2986
Full Changelog: v3.0.3...v3.1.0
v3.0.3
English Version
New Features
- Support multi-modal large model SequenceClassification architecture for multi-modal classification tasks, see here.
- Support training of multi-modal reward model.
New Models
- Shanghai_AI_Laboratory/internlm3-8b-instruct
- OpenBMB/MiniCPM-o-2_6
- deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B series
- bytedance-research/Valley-Eagle-7B
- LLM-Research/phi-4
- Qwen/Qwen2.5-Math-PRM-7B, Qwen/Qwen2.5-Math-PRM-72B
- MiniMaxAI/MiniMax-Text-01, MiniMaxAI/MiniMax-VL-01
What's Changed
- update qlora shell by @Jintao-Huang in #2880
- fix docs by @Jintao-Huang in #2882
- support multi round dpo by @tastelikefeet in #2884
- Support infer n parameter by @tastelikefeet in #2893
- Fix qwen vl eval by @Jintao-Huang in #2892
- fix infer engine by @Jintao-Huang in #2898
- Add phi4 by @tastelikefeet in #2895
- fix link & bug by @Jintao-Huang in #2902
- update video infer examples by @Jintao-Huang in #2840
- Sampler by @tastelikefeet in #2905
- Fix a bug when lint code by @tastelikefeet in #2906
- Fix bugs by @Jintao-Huang in #2907
- update plugin doc by @tastelikefeet in #2908
- fix vllm tp stuck by @Jintao-Huang in #2909
- fix replace_video2image by @Jintao-Huang in #2913
- Fix read file mode by @tastelikefeet in #2915
- fix inspect init by @Jintao-Huang in #2916
- Update rm by @tastelikefeet in #2919
- Add internlm3 dense by @HIT-cwh in #2920
- internlm3 lint pass by @Jintao-Huang in #2923
- Fix web ui log by @tastelikefeet in #2924
- Support Valley by @lxline in #2921
- support minicpm-o by @Jintao-Huang in #2918
- fix vllm tp block by @Jintao-Huang in #2927
- update docs by @Jintao-Huang in #2929
- Support first prms by @tastelikefeet in #2926
- fix Valley by @lxline in #2931
- Support mllm seq_cls/rm by @Jintao-Huang in #2934
- fix bugs by @Jintao-Huang in #2938
- support deepseek-ai/DeepSeek-R1 by @Jintao-Huang in #2940
- Fix quant template by @Jintao-Huang in #2942
- Support minimax by @tastelikefeet in #2943
- Fix mllm seq cls by @Jintao-Huang in #2945
- support deepseek_r1_distill by @Jintao-Huang in #2946
- fix demo_hf by @Jintao-Huang in #2951
- fix infer_stream by @Jintao-Huang in #2952
- fix citest by @Jintao-Huang in #2953
- fix bugs by @Jintao-Huang in #2954
- update requirements by @Jintao-Huang in #2957
- update web-ui images by @tastelikefeet in #2958
- update quant_mllm shell by @Jintao-Huang in #2959
- fix max_length error print by @Jintao-Huang in #2960
- fix seq_cls patcher by @Jintao-Huang in #2963
- ppo compat transformers>=4.47.* by @Jintao-Huang in #2964
Full Changelog: v3.0.2...v3.0.3