
[WhisperPipeline] Batch Size Mismatch Error When Using Beam Search (with iGPU) #2746

@e950280

Description

Problem Description

When running speech recognition with the OpenVINO GenAI WhisperPipeline on the iGPU (device="GPU"), setting num_beams=2 or higher causes a batch size mismatch error.

Error Message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Exception from src/core/src/runtime/tensor.cpp:121:
Exception from src/inference/dev_api/openvino/runtime/iremote_tensor.hpp:32:
Not Implemented

Code

from librosa import load
from openvino_genai import WhisperPipeline

audio, _ = load("./recording.wav", sr=16000)

pipe = WhisperPipeline("./whisper_model/whisper-medium/", device="GPU")

results = pipe.generate(
    audio,
    language="<|en|>",
    num_beams=2,  # Setting this value to 2 or higher causes the error
)

Expected Behavior

Speech recognition should work normally on the iGPU when Beam Search is enabled (num_beams = 2 or higher).

Additional Observations

  1. Using pipe = WhisperPipeline("./whisper_model/whisper-medium/", device="CPU") works
     (related issue: #2069, "[WhisperPipeline] Batch Size Mismatch Error When Using Beam Search").
  2. Using the GPU without beam search (num_beams=1) works (a sketch of both working configurations follows this list).
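
For comparison, here is a minimal sketch of the two configurations that do work, assuming the same model directory and 16 kHz recording as in the repro code above; printing the result object directly follows the OpenVINO GenAI Whisper samples.

from librosa import load
from openvino_genai import WhisperPipeline

# Same 16 kHz recording as in the failing repro above.
audio, _ = load("./recording.wav", sr=16000)

# Works: beam search (num_beams=2) on CPU.
cpu_pipe = WhisperPipeline("./whisper_model/whisper-medium/", device="CPU")
cpu_results = cpu_pipe.generate(audio, language="<|en|>", num_beams=2)

# Works: GPU (iGPU) with beam search disabled, i.e. greedy decoding.
gpu_pipe = WhisperPipeline("./whisper_model/whisper-medium/", device="GPU")
gpu_results = gpu_pipe.generate(audio, language="<|en|>", num_beams=1)

print(cpu_results)
print(gpu_results)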

CPU Information

Intel Core Ultra 5 125H

OS Information

PRETTY_NAME="Ubuntu 24.04.1 LTS"

Python Version

Python 3.12.3
