modelscope · Jintao-Huang · Oct 14, 2025
diff --git a/docs/source/Instruction/命令行参数.md b/docs/source/Instruction/命令行参数.md
@@ -356,7 +356,7 @@ Vera使用`target_modules`、`target_regex`、`modules_to_save`三个参数，
   - 注意：该参数在"ms-swift<3.7"的参数名为`gpu_memory_utilization`。下面的`vllm_`参数同理。若出现参数不匹配问题，请查看[ms-swift3.6文档](https://swift.readthedocs.io/zh-cn/v3.6/Instruction/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0.html#vllm)。
 - 🔥vllm_tensor_parallel_size: tp并行数，默认为`1`。
 - vllm_pipeline_parallel_size: pp并行数，默认为`1`。
-- vllm_data_parallel_size: dp并行数，默认为`1`，在`rollout`命令中生效。
+- vllm_data_parallel_size: dp并行数，默认为`1`，在`swift deploy/rollout`命令中生效。
   - 若在`swift infer`中，使用`NPROC_PER_NODE`来设置dp并行数。参考这里的[例子](https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/mllm_ddp.sh)。
 - vllm_enable_expert_parallel: 开启专家并行，默认为False。
 - vllm_max_num_seqs: 单次迭代中处理的最大序列数，默认为`256`。

diff --git a/docs/source_en/Instruction/Command-line-parameters.md b/docs/source_en/Instruction/Command-line-parameters.md
@@ -362,7 +362,7 @@ Parameter meanings can be found in the [vllm documentation](https://docs.vllm.ai
   - Note: For ms-swift versions earlier than 3.7, this parameter is named `gpu_memory_utilization`. The same applies to the following `vllm_` parameters. If you encounter parameter mismatch issues, please refer to the [ms-swift 3.6 documentation](https://swift.readthedocs.io/en/v3.6/Instruction/Command-line-parameters.html#vllm-arguments).
 - 🔥vllm_tensor_parallel_size: Tensor parallelism size. Default is `1`.
 - vllm_pipeline_parallel_size: Pipeline parallelism size. Default is `1`.
-- vllm_data_parallel_size: Data parallelism size, default is 1, effective in the infer and rollout commands.
+- vllm_data_parallel_size: Data parallelism size, default is `1`, effective in the `swift deploy/rollout` command.
   - In `swift infer`, use `NPROC_PER_NODE` to set the data parallelism (DP) degree. See the example [here](https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/mllm_ddp.sh).
 - vllm_enable_expert_parallel: Enable expert parallelism. Default is False.
 - vllm_max_num_seqs: Maximum number of sequences to be processed in a single iteration. Default is `256`.

diff --git a/examples/deploy/server/README.md b/examples/deploy/README.md
diff --git a/examples/deploy/server/sglang.sh b/examples/deploy/sglang.sh
diff --git a/examples/deploy/server/vllm.sh b/examples/deploy/vllm.sh
diff --git a/examples/deploy/vllm_dp.sh b/examples/deploy/vllm_dp.sh
@@ -0,0 +1,22 @@
+CUDA_VISIBLE_DEVICES=0,1 swift deploy \
+    --model Qwen/Qwen2.5-VL-7B-Instruct \
+    --infer_backend vllm \
+    --served_model_name Qwen2.5-VL-7B-Instruct \
+    --vllm_max_model_len 8192 \
+    --vllm_gpu_memory_utilization 0.9 \
+    --vllm_data_parallel_size 2
+
+# After the server-side deployment above is successful, use the command below to perform a client call test.
+
+# curl http://localhost:8000/v1/chat/completions \
+# -H "Content-Type: application/json" \
+# -d '{
+# "model": "Qwen2.5-VL-7B-Instruct",
+# "messages": [{"role": "user", "content": [
+#     {"type": "image", "image": "http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png"},
+#     {"type": "image", "image": "http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png"},
+#     {"type": "text", "text": "What is the difference between the two images?"}
+# ]}],
+# "max_tokens": 256,
+# "temperature": 0
+# }'
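The commented-out curl call in `vllm_dp.sh` can also be issued from Python. The following is a minimal sketch, not part of this PR: it assumes the server started by the script is listening on `localhost:8000` (the address used in the curl example), and the helper names `build_payload` and `chat` are hypothetical. Only the standard library is used.

```python
import json
import urllib.request

# Assumption: same host, port, and path as the commented curl command in vllm_dp.sh.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(image_urls, question, model="Qwen2.5-VL-7B-Instruct", max_tokens=256):
    """Build the same JSON body that the curl example sends (hypothetical helper)."""
    content = [{"type": "image", "image": url} for url in image_urls]
    content.append({"type": "text", "text": question})
    return {
        "model": model,  # must match --served_model_name
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": 0,
    }


def chat(payload, url=BASE_URL, timeout=60):
    """POST the payload as JSON and return the decoded response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_payload(
    [
        "http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png",
        "http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png",
    ],
    "What is the difference between the two images?",
)
```

With the server running, `chat(payload)` sends the request. If the response follows the usual OpenAI-style chat completion layout (an assumption here, suggested by the `/v1/chat/completions` path), the reply text would be under `["choices"][0]["message"]["content"]`.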