Commit 30ae6b4

mswiniarsk authored and michalkuligowski committed
Disable delayed sampling for fake_hpu mode
1 parent 84ed307 commit 30ae6b4

1 file changed, +1 -1 lines changed

vllm/worker/hpu_model_runner.py

Lines changed: 1 addition & 1 deletion
@@ -2462,7 +2462,7 @@ def execute_model(
     ) -> Optional[Union[List[SamplerOutput], IntermediateTensors]]:
         # Delayed sampling is only supported for single step scheduling
         use_delayed_sampling = VLLM_DELAYED_SAMPLING and not warmup_mode \
-            and self.is_single_step
+            and self.is_single_step and not is_fake_hpu()
         assert model_input.input_tokens is not None
         if use_delayed_sampling and not model_input.is_prompt and \
                 self.is_driver_worker:
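
The changed condition in the hunk above can be read as a four-way boolean gate. The following is a minimal standalone sketch of that gate, not the code in `hpu_model_runner.py`: the helper name `should_use_delayed_sampling` and its parameters are hypothetical, and the values that the real code reads from `VLLM_DELAYED_SAMPLING`, `warmup_mode`, `self.is_single_step`, and `is_fake_hpu()` are passed in here as plain booleans.

```python
def should_use_delayed_sampling(
    delayed_sampling_enabled: bool,  # stands in for VLLM_DELAYED_SAMPLING
    warmup_mode: bool,               # stands in for warmup_mode
    is_single_step: bool,            # stands in for self.is_single_step
    fake_hpu: bool,                  # stands in for is_fake_hpu()
) -> bool:
    """Hypothetical helper mirroring the boolean gate after this commit."""
    return (
        delayed_sampling_enabled
        and not warmup_mode
        and is_single_step
        and not fake_hpu
    )


if __name__ == "__main__":
    # Before this commit, this combination would have enabled delayed
    # sampling; with the extra `not fake_hpu` term it is now disabled.
    print(should_use_delayed_sampling(True, False, True, fake_hpu=True))
    # On a real device with the same settings it stays enabled.
    print(should_use_delayed_sampling(True, False, True, fake_hpu=False))
```

Under these assumptions, fake_hpu mode turns delayed sampling off regardless of the other three inputs, while behavior outside fake_hpu mode is unchanged.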
