
Commit db02ec7

Merge branch 'main' into fix/ppu-1107
2 parents: f98b3f5 + 10d9096

18 files changed: +247 -17 lines changed

docs/source/Megatron-SWIFT/Command-line-parameters.md

Lines changed: 3 additions & 1 deletion

@@ -185,10 +185,12 @@
 - moe_ffn_hidden_size: Hidden layer size of each expert's feed-forward network (ffn). Defaults to None and is read automatically from config.json; if not found there and `num_experts` is not None, it is set to ffn_hidden_size.
 - moe_shared_expert_intermediate_size: Total FFN hidden layer size of the shared experts. If there are multiple shared experts, it should equal `num_shared_experts * ffn_size_of_each_shared_expert`. Defaults to None; read automatically from config.json.
 - moe_router_topk: Number of experts each token is routed to. Defaults to None; read automatically from config.json.
+- moe_router_num_groups: Number of groups the experts are divided into, for group-limited routing; see DeepSeek-V2 and DeepSeek-V3. Defaults to None; read automatically from config.json.
+- moe_router_group_topk: Number of groups selected in group-limited routing. Defaults to None; read automatically from config.json.
 - moe_router_pre_softmax: Enable pre-softmax routing for MoE, meaning softmax is applied before the top-k selection. Defaults to None; read automatically from config.json.
 - 🔥moe_router_dtype: Data type used for routing computation and the weighted averaging of expert outputs. Options are 'none', 'fp32', and 'fp64'; this improves numerical stability, especially when the number of experts is large. Used together with `moe_permute_fusion`, the performance impact is negligible. Defaults to 'fp32'; 'none' means the data type is left unchanged.
 - moe_router_score_function: Scoring function for MoE top-k routing. Either "softmax" or "sigmoid". Defaults to None; read from config.json.
-- moe_router_bias_update_rate: Update rate of the expert bias in the auxiliary-loss-free load-balancing strategy. The expert bias is updated according to the number of tokens each expert is assigned in the global batch: the bias increases for experts assigned fewer tokens and decreases for experts assigned more. Defaults to 1e-3, the same value as used in DeepSeek-V3.
+- moe_router_bias_update_rate: Update rate of the expert bias in the auxiliary-loss-free load-balancing strategy. The expert bias is updated according to the number of tokens each expert is assigned in the global batch: the bias increases for experts assigned fewer tokens and decreases for experts assigned more. Defaults to None; read from config.json.
 - moe_router_enable_expert_bias: Top-k routing with a dynamic expert bias under the auxiliary-loss-free load-balancing strategy. Routing decisions are based on the sum of the routing scores and the expert bias. See https://arxiv.org/abs/2408.15664 for details. Defaults to None; read automatically from config.json.
 - moe_router_topk_scaling_factor: Defaults to None; read from config.json.
 - moe_router_load_balancing_type: Determines the router's load-balancing strategy. Options are "aux_loss", "seq_aux_loss", "sinkhorn", and "none". Defaults to None; read from config.json.

docs/source_en/Megatron-SWIFT/Command-line-parameters.md

Lines changed: 3 additions & 1 deletion

@@ -197,10 +197,12 @@ For guidance on selecting parallelization strategies, please refer to the [Train
 - moe_ffn_hidden_size: Hidden layer size of the feedforward network (ffn) for each expert. Default is None and will be automatically read from config.json. If not found and `num_experts` is not None, it will be set to ffn_hidden_size.
 - moe_shared_expert_intermediate_size: The total FFN hidden layer size for shared experts. If there are multiple shared experts, it should equal `num_shared_experts * ffn_size_of_each_shared_expert`. Default is None. Automatically read from config.json.
 - moe_router_topk: The number of experts each token is routed to. Default is None. Automatically read from config.json.
+- moe_router_num_groups: Number of groups to divide experts into for group-limited routing. Refers to DeepSeek-V2 and DeepSeek-V3. Default is None. Automatically read from config.json.
+- moe_router_group_topk: Number of selected groups for group-limited routing. Default is None. Automatically read from config.json.
 - moe_router_pre_softmax: Enable pre-softmax routing for MoE, meaning that softmax will be applied before top-k selection. Default is None. Automatically read from config.json.
 - 🔥moe_router_dtype: Data type used for routing computation and expert output weighted averaging. Options are 'none', 'fp32', and 'fp64', which enhance numerical stability, especially when the number of experts is large. When used together with `moe_permute_fusion`, the performance impact is negligible. Default is 'fp32'. 'none' means no change to data type.
 - moe_router_score_function: Scoring function for MoE TopK routing. Can be "softmax" or "sigmoid". Default is None and is read from config.json.
-- moe_router_bias_update_rate: Update rate of expert bias in the auxiliary-loss-free load balancing strategy. Expert bias is updated based on the number of tokens each expert is assigned in the global batch: bias increases for experts assigned fewer tokens, and decreases for those assigned more tokens. Default is 1e-3, same as used in DeepSeekV3.
+- moe_router_bias_update_rate: Update rate of expert bias in the auxiliary-loss-free load balancing strategy. Expert bias is updated based on the number of tokens each expert is assigned in the global batch: bias increases for experts assigned fewer tokens, and decreases for those assigned more tokens. Default is None and is read from config.json.
 - moe_router_enable_expert_bias: TopK routing with dynamic expert bias in the auxiliary-loss-free load balancing strategy. Routing decisions are based on the sum of routing scores and expert bias. See details at: https://arxiv.org/abs/2408.15664. Default is None and is automatically read from config.json.
 - moe_router_topk_scaling_factor: Default is None. This parameter is read from config.json.
 - moe_router_load_balancing_type: Determines the router's load balancing strategy. Options are "aux_loss", "seq_aux_loss", "sinkhorn", and "none". Default is None and is read from config.json.
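
The two new parameters implement DeepSeek-style group-limited routing: experts are partitioned into `moe_router_num_groups` groups, only the best `moe_router_group_topk` groups are kept per token, and the usual top-k selection then runs inside those groups. Below is a minimal PyTorch sketch of the idea, not Megatron-SWIFT's implementation; the group score is simplified to a per-group max, whereas DeepSeek-V3 sums each group's top-2 expert scores.

import torch

def group_limited_topk(scores: torch.Tensor, num_groups: int, group_topk: int, topk: int):
    """scores: [num_tokens, num_experts] router scores; experts grouped contiguously."""
    num_tokens, num_experts = scores.shape
    # Score each group by its best expert (simplified group metric).
    group_scores = scores.view(num_tokens, num_groups, -1).max(dim=-1).values
    # Keep only the top `group_topk` groups per token.
    group_idx = group_scores.topk(group_topk, dim=-1).indices
    group_mask = torch.zeros_like(group_scores).scatter_(1, group_idx, 1.0)
    # Mask out experts belonging to non-selected groups, then do the ordinary top-k.
    expert_mask = group_mask.repeat_interleave(num_experts // num_groups, dim=-1)
    masked = scores.masked_fill(expert_mask == 0, float('-inf'))
    return masked.topk(topk, dim=-1)

# Example: 64 experts in 8 groups; each token routes into 4 groups, top-8 experts overall.
values, expert_idx = group_limited_topk(torch.randn(4, 64), num_groups=8, group_topk=4, topk=8)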
examples/train/multi-gpu/fsdp2_lora/fsdp2.json

Lines changed: 25 additions & 0 deletions

@@ -0,0 +1,25 @@
+{
+    "compute_environment": "LOCAL_MACHINE",
+    "debug": false,
+    "distributed_type": "FSDP",
+    "downcast_bf16": "no",
+    "fsdp_config": {
+        "fsdp_auto_wrap_policy": "TRANSFORMER_BASED_WRAP",
+        "fsdp_cpu_ram_efficient_loading": true,
+        "fsdp_reshard_after_forward": true,
+        "fsdp_state_dict_type": "FULL_STATE_DICT",
+        "fsdp_activation_checkpointing": true,
+        "fsdp_version": 2
+    },
+    "machine_rank": 0,
+    "main_training_function": "main",
+    "mixed_precision": "bf16",
+    "num_machines": 1,
+    "num_processes": 2,
+    "rdzv_backend": "static",
+    "same_network": true,
+    "tpu_env": [],
+    "tpu_use_cluster": false,
+    "tpu_use_sudo": false,
+    "use_cpu": false
+}
Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
+# 14.7GiB * 2
+nproc_per_node=2
+
+CUDA_VISIBLE_DEVICES=0,1 \
+accelerate launch --config_file "./examples/train/multi-gpu/fsdp2_lora/fsdp2.json" \
+    swift/cli/sft.py \
+    --model Qwen/Qwen2.5-7B-Instruct \
+    --train_type lora \
+    --dataset 'swift/self-cognition#1000' \
+    --torch_dtype bfloat16 \
+    --num_train_epochs 1 \
+    --per_device_train_batch_size 1 \
+    --per_device_eval_batch_size 1 \
+    --learning_rate 1e-4 \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --gradient_checkpointing false \
+    --weight_decay 0.1 \
+    --target_modules all-linear \
+    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 5 \
+    --max_length 2048 \
+    --output_dir output \
+    --system 'You are a helpful assistant.' \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --model_author swift \
+    --model_name swift-robot
Lines changed: 34 additions & 0 deletions

@@ -0,0 +1,34 @@
+import os
+from typing import List
+
+from swift.llm import BaseArguments, InferRequest, PtEngine, get_template
+
+os.environ['IMAGE_MAX_TOKEN_NUM'] = '1024'
+os.environ['VIDEO_MAX_TOKEN_NUM'] = '128'
+os.environ['FPS_MAX_FRAMES'] = '16'
+
+infer_request = InferRequest(
+    messages=[{
+        'role':
+        'user',
+        'content':
+        "多标签分类,类别包括:['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', "
+        "'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', "
+        "'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']"
+    }],
+    images=['xxx.jpg'])
+adapter_path = 'output/vx-xxx/checkpoint-xxx'
+args = BaseArguments.from_pretrained(adapter_path)
+
+engine = PtEngine(
+    args.model,
+    adapters=[adapter_path],
+    task_type='seq_cls',
+    num_labels=args.num_labels,
+    problem_type=args.problem_type)
+template = get_template(args.template, engine.processor, args.system, use_chat_template=args.use_chat_template)
+engine.default_template = template
+
+resp_list = engine.infer([infer_request])
+response: List[int] = resp_list[0].choices[0].message.content
+print(f'response: {response}')
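
The Chinese prompt string above reads "Multi-label classification; the categories include: [...]" and matches the query built by the dataset preprocessor (see the mllm.py diff below). Since `response` is a list of class indices, a small follow-up sketch can map it back to names; `CLASS_NAMES` here is hypothetical, copied from the VOC list in the prompt:

# Map predicted label indices back to the VOC class names used in the prompt.
CLASS_NAMES = ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
               'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
               'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
predicted = [CLASS_NAMES[i] for i in response]  # e.g. [6, 14] -> ['car', 'person']
print(predicted)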
Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+CUDA_VISIBLE_DEVICES=0 \
+IMAGE_MAX_TOKEN_NUM=1024 \
+VIDEO_MAX_TOKEN_NUM=128 \
+FPS_MAX_FRAMES=16 \
+swift infer \
+    --adapters output/vx-xxx/checkpoint-xxx \
+    --load_data_args true
Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
+CUDA_VISIBLE_DEVICES=0 \
+IMAGE_MAX_TOKEN_NUM=1024 \
+VIDEO_MAX_TOKEN_NUM=128 \
+FPS_MAX_FRAMES=16 \
+swift sft \
+    --model Qwen/Qwen3-VL-4B-Instruct \
+    --train_type lora \
+    --dataset 'clip-benchmark/wds_voc2007_multilabel' \
+    --load_from_cache_file true \
+    --split_dataset_ratio 0.01 \
+    --torch_dtype bfloat16 \
+    --num_train_epochs 2 \
+    --per_device_train_batch_size 16 \
+    --per_device_eval_batch_size 16 \
+    --learning_rate 1e-4 \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --target_modules all-linear \
+    --gradient_accumulation_steps 1 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 5 \
+    --max_length 2048 \
+    --output_dir output \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --dataset_num_proc 4 \
+    --num_labels 20 \
+    --task_type seq_cls \
+    --problem_type multi_label_classification
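
With `--task_type seq_cls --problem_type multi_label_classification --num_labels 20`, the model gets a 20-way classification head whose per-class logits are trained independently with binary cross-entropy, the convention transformers uses for this problem_type. A minimal sketch of that loss in plain PyTorch (not ms-swift internals):

import torch
import torch.nn.functional as F

num_labels = 20                      # matches --num_labels above
logits = torch.randn(4, num_labels)  # [batch, num_labels] from the seq_cls head
# Multi-hot targets: 1.0 for every class present in the image.
targets = torch.zeros(4, num_labels)
targets[0, [6, 14]] = 1.0            # e.g. sample 0 contains 'car' and 'person'
loss = F.binary_cross_entropy_with_logits(logits, targets)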

swift/llm/dataset/dataset/mllm.py

Lines changed: 19 additions & 0 deletions

@@ -1306,3 +1306,22 @@ def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:
         hf_dataset_id='leonardPKU/clevr_cogen_a_train',
         preprocess_func=ClevrPreprocessor(),
         tags=['qa', 'math', 'vision', 'grpo']))
+
+
+class Voc2007MultilabelPreprocessor(ResponsePreprocessor):
+    CLASS_NAME = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable',
+                  'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor')
+
+    def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:
+        row['query'] = f'多标签分类,类别包括:{list(self.CLASS_NAME)}'
+        row['label'] = [i for i, x in enumerate(row['npy']) if x == 1]
+        return super().preprocess(row)
+
+
+register_dataset(
+    DatasetMeta(
+        ms_dataset_id='clip-benchmark/wds_voc2007_multilabel',
+        hf_dataset_id='clip-benchmark/wds_voc2007_multilabel',
+        preprocess_func=Voc2007MultilabelPreprocessor(columns={'webp': 'images'}),
+        tags=['multilabel', 'multi-modal'],
+    ))
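
To see what the preprocessor does, here is a toy walk-through with made-up row data: `npy` holds the dataset's multi-hot label vector, and the indices of its 1-entries become the integer labels.

# Hypothetical row: only 'car' (index 6) and 'person' (index 14) are present.
row = {'npy': [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]}
label = [i for i, x in enumerate(row['npy']) if x == 1]
assert label == [6, 14]  # indices follow the CLASS_NAME order above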

swift/llm/infer/deploy.py

Lines changed: 15 additions & 1 deletion

@@ -13,7 +13,7 @@
 import uvicorn
 from aiohttp import ClientConnectorError
 from fastapi import FastAPI, Request
-from fastapi.responses import JSONResponse, StreamingResponse
+from fastapi.responses import JSONResponse, Response, StreamingResponse

 from swift.llm import AdapterRequest, DeployArguments, InferArguments
 from swift.llm.infer.protocol import EmbeddingRequest, MultiModalRequestMixin
@@ -42,6 +42,9 @@ def get_infer_engine(args: InferArguments, template=None, **kwargs):
         return SwiftInfer.get_infer_engine(args, template, **kwargs)

     def _register_app(self):
+        self.app.get('/health')(self.health)
+        self.app.get('/ping')(self.ping)
+        self.app.post('/ping')(self.ping)
         self.app.get('/v1/models')(self.get_available_models)
         self.app.post('/v1/chat/completions')(self.create_chat_completion)
         self.app.post('/v1/completions')(self.create_completion)
@@ -85,6 +88,17 @@ def _get_model_list(self):
             model_list += [name for name in args.adapter_mapping.keys()]
         return model_list

+    async def health(self) -> Response:
+        """Health check endpoint."""
+        if self.infer_engine is not None:
+            return Response(status_code=200)
+        else:
+            return Response(status_code=503)
+
+    async def ping(self) -> Response:
+        """Ping check endpoint. Required for SageMaker compatibility."""
+        return await self.health()
+
     async def get_available_models(self):
         model_list = self._get_model_list()
         data = [Model(id=model_id, owned_by=self.args.owned_by) for model_id in model_list]
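
A quick way to exercise the new endpoints, assuming a server started with `swift deploy` and reachable at 127.0.0.1:8000 (the host and port are assumptions; adjust to your deployment):

import requests  # third-party HTTP client: pip install requests

# 200 once the infer engine is initialized, 503 otherwise (see health() above).
print(requests.get('http://127.0.0.1:8000/health').status_code)
# /ping mirrors /health and also accepts POST, as SageMaker requires.
print(requests.post('http://127.0.0.1:8000/ping').status_code)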

swift/llm/train/tuner.py

Lines changed: 3 additions & 0 deletions

@@ -93,6 +93,7 @@ def get_multimodal_target_regex(
        freeze_vit: bool = True,
        freeze_aligner: bool = True,
        include_embedding: bool = False,
+       exclude_router: bool = False,
 ) -> str:
     model_arch = model.model_meta.model_arch
     modules = []
@@ -117,6 +118,8 @@ def get_multimodal_target_regex(

         sub_module = deep_getattr(model, module)
         target_modules = find_all_linears(sub_module, model_arch, extra_layers)
+        if exclude_router and model.model_info.is_moe_model:
+            target_modules = [tm for tm in target_modules if tm not in {'gate'}]
         if not target_modules:
             continue
         target_modules = [tm for tm in target_modules if tm]
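
The effect of the new `exclude_router` flag, in isolation: for MoE models it drops the router's `gate` linear from the LoRA target list so routing weights stay frozen. A toy illustration with made-up module names; note that only the exact string 'gate' is filtered, so MLP projections like 'gate_proj' are kept.

# Hypothetical target list as returned by find_all_linears for one submodule.
target_modules = ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate', 'gate_proj', 'up_proj', 'down_proj']
exclude_router, is_moe_model = True, True
if exclude_router and is_moe_model:
    target_modules = [tm for tm in target_modules if tm not in {'gate'}]
print(target_modules)  # 'gate' removed; 'gate_proj' kept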
