Commit e53831c

tastelikefeet authored
Fix qwen3 vl sp (modelscope#6514)
Co-authored-by: tastelikefeet <[email protected]>
1 parent 2a07f0a commit e53831c

3 files changed: +3 -3 lines changed

docs/source/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion
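@@ -843,4 +843,4 @@ In addition to the model-specific parameters of qwen2_5_vl and qwen2_audio, qwen2_5_omni also
 - VLLM_USE_V1: Used to switch vLLM between the V0 and V1 versions.
 - SWIFT_TIMEOUT: (ms-swift>=3.10) If the multimodal dataset contains image URLs, this parameter controls the timeout for fetching images; the default is 20s.
 - ROOT_IMAGE_DIR: (ms-swift>=3.8) The root directory for image (multimodal) resources. Setting this parameter allows the dataset to use paths relative to `ROOT_IMAGE_DIR`. By default, paths are relative to the running directory.
-- SWIFT_SINGLE_DEVICE_MODE: (ms-swift>=3.10) Single-device mode. In this mode, all processes can only see one device. Currently used for compatibility with PPU devices
+- SWIFT_SINGLE_DEVICE_MODE: (ms-swift>=3.10) Single-device mode. In this mode, each process can only see one device. Currently used for compatibility with PPU devices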
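The `ROOT_IMAGE_DIR` entry in this hunk describes a path-resolution rule that is easier to see in code. The following is a minimal sketch of that rule, not the ms-swift implementation: `resolve_image_path` is a hypothetical helper, and treating URLs and absolute paths as pass-through is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_image_path(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper (not part of ms-swift) illustrating ROOT_IMAGE_DIR."""
    env = os.environ if env is None else env
    # Assumption: URLs and absolute paths are never rewritten.
    if path.startswith(("http://", "https://")) or os.path.isabs(path):
        return path
    root = env.get("ROOT_IMAGE_DIR")
    if not root:
        # Documented default: the path stays relative to the running directory.
        return path
    # With ROOT_IMAGE_DIR set, dataset paths are taken relative to it.
    return os.path.join(root, path)
```

Passing the environment as a mapping keeps the sketch easy to exercise without mutating `os.environ`.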

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion
@@ -868,4 +868,4 @@ The meanings of the following parameters can be found in the example code [here]
 - VLLM_USE_V1: Used to switch between V0 and V1 versions of vLLM.
 - SWIFT_TIMEOUT: (ms-swift >= 3.10) If the multimodal dataset contains image URLs, this parameter controls the timeout for fetching images, defaulting to 20 seconds.
 - ROOT_IMAGE_DIR: (ms-swift>=3.8) The root directory for image (multimodal) resources. By setting this parameter, relative paths in the dataset can be interpreted relative to `ROOT_IMAGE_DIR`. By default, paths are relative to the current working directory.
-- SWIFT_SINGLE_DEVICE_MODE: (ms-swift>=3.10) Single device mode. In this mode, all processes can only see one device. Currently used for compatibility with PPU devices.
+- SWIFT_SINGLE_DEVICE_MODE: (ms-swift>=3.10) Single device mode. In this mode, each process can only see one device. Currently used for compatibility with PPU devices.
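The reworded sentence ("each process" rather than "all processes") has a practical consequence: if every process sees exactly one device, that device is always local index 0, whatever the process's local rank. A minimal sketch of that reasoning follows. Both helper names are hypothetical, and the `"1"` value used to enable the mode is an assumption, not something this diff documents.

```python
import os
from typing import Mapping, Optional


def single_device_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check; the accepted value "1" is an assumption."""
    env = os.environ if env is None else env
    return env.get("SWIFT_SINGLE_DEVICE_MODE", "0") == "1"


def local_device_index(local_rank: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: the device index a process would bind to."""
    if single_device_mode(env):
        # Each process sees only one device, so it is always index 0.
        return 0
    # Otherwise every process sees all devices and picks its own by rank.
    return local_rank
```

Under these assumptions, rank 3 binds to index 3 in the default mode and to index 0 in single device mode.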

swift/llm/model/model/qwen.py

Lines changed: 1 addition & 1 deletion
@@ -936,7 +936,7 @@ def _patch_deepstack_process(model):
     def _deepstack_process(self, hidden_states: torch.Tensor, visual_pos_masks: torch.Tensor,
                            visual_embeds: torch.Tensor):
         from swift.trainers.sequence_parallel import sequence_parallel
-        if sequence_parallel.world_size:
+        if sequence_parallel.world_size and visual_pos_masks is not None:
             visual_pos_masks, visual_embeds = sequence_parallel.pad_and_split_mm_tokens(visual_pos_masks, visual_embeds)
         if visual_pos_masks is None:
             return hidden_states + visual_embeds.mean() * 0
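The effect of the added `visual_pos_masks is not None` check can be illustrated without PyTorch. In the sketch below, `split_stub` is a hypothetical stand-in for `sequence_parallel.pad_and_split_mm_tokens`: it simply keeps the first half of the mask and the matching embeddings, and like any helper that indexes its mask, it fails when given `None`. How the real helper behaves on a missing mask is not shown in the diff, so treat this as an illustration of the guard only.

```python
from typing import List, Optional, Tuple

Mask = Optional[List[bool]]


def split_stub(mask: List[bool], embeds: List[float]) -> Tuple[List[bool], List[float]]:
    """Hypothetical stand-in for pad_and_split_mm_tokens; requires a real mask."""
    half = len(mask) // 2
    local_mask = mask[:half]
    # One embedding per True position, so keep as many as the local mask selects.
    return local_mask, embeds[:sum(local_mask)]


def guard_before(world_size: int, mask: Mask, embeds: List[float]) -> Tuple[Mask, List[float]]:
    # Pre-commit condition: only checks that sequence parallelism is active.
    if world_size:
        mask, embeds = split_stub(mask, embeds)
    return mask, embeds


def guard_after(world_size: int, mask: Mask, embeds: List[float]) -> Tuple[Mask, List[float]]:
    # Post-commit condition: also skips the split when there is no visual mask,
    # leaving the later `mask is None` branch to handle the batch.
    if world_size and mask is not None:
        mask, embeds = split_stub(mask, embeds)
    return mask, embeds
```

With a missing mask and sequence parallelism enabled, `guard_before` reaches the stub and raises, while `guard_after` passes the inputs through unchanged so that the existing `visual_pos_masks is None` branch in `_deepstack_process` can run.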
