
suppress_tokens=[] should be legal as some older whisper models rely on this #36341

Open
Lewington-pitsos opened this issue Feb 22, 2025 · 0 comments · May be fixed by #36344
System Info

  • transformers version: 4.46.0
  • Platform: macOS-15.2-arm64-arm-64bit
  • Python version: 3.11.8
  • Huggingface_hub version: 0.25.2
  • Safetensors version: 0.4.2
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.1 (False)
  • Tensorflow version (GPU?): 2.16.1 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Who can help?

I'm intending to fix this myself; a PR is linked below.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run this code

```python
import torch
from transformers import pipeline

# path to the audio file to be transcribed
audio = "/path/to/audio.format"

device = "cuda:0" if torch.cuda.is_available() else "cpu"

transcribe = pipeline(
    task="automatic-speech-recognition",
    model="vasista22/whisper-tamil-large-v2",
    chunk_length_s=30,
    device=device,
)
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(
    language="ta", task="transcribe"
)

print("Transcription: ", transcribe(audio)["text"])
```

on any machine

Expected behavior

The model produces a prediction and no error is thrown.

What actually happens is that I get:

```
/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/models/whisper/generation_whisper.py:573: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/plato/code/translation-station/pad.py", line 11, in <module>
    print('Transcription: ', transcribe(audio)["text"])
                             ^^^^^^^^^^^^^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1360, in __call__
    return next(
           ^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/pipelines/base.py", line 1275, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 521, in _forward
    tokens = self.model.generate(
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/models/whisper/generation_whisper.py", line 739, in generate
    decoder_input_ids, kwargs = self._prepare_decoder_input_ids(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/plato/code/translation-station/.venv/lib/python3.11/site-packages/transformers/models/whisper/generation_whisper.py", line 1782, in _prepare_decoder_input_ids
    prev_start_of_text = suppress_tokens[-2] if suppress_tokens is not None else None
                         ~~~~~~~~~~~~~~~^^^^
IndexError: index -2 is out of bounds for dimension 0 with size 0
```
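The root cause is visible in the last frame: the guard checks only whether `suppress_tokens` is `None`, not whether it is empty, so a checkpoint that ships `suppress_tokens=[]` still gets indexed with `[-2]`. In the pipeline the value is a tensor (hence the "dimension 0 with size 0" wording), but a plain list shows the same failure mode:

```python
# Minimal illustration of the failing check in
# generation_whisper.py::_prepare_decoder_input_ids: the guard tests only
# for None, so an empty suppress_tokens still gets indexed with [-2].
suppress_tokens = []  # what these older fine-tuned checkpoints ship

try:
    prev_start_of_text = suppress_tokens[-2] if suppress_tokens is not None else None
except IndexError:
    print("IndexError, as in the traceback above")

# A truthiness check instead of a None check avoids the crash (one possible
# shape of a fix; not necessarily what the linked PR does):
prev_start_of_text = suppress_tokens[-2] if suppress_tokens else None
print(prev_start_of_text)  # None
```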

This exact same error was reported on other models posted to Hugging Face by https://huggingface.co/vasista22 around two years ago, for example https://huggingface.co/vasista22/whisper-tamil-large-v2/discussions/4 and https://huggingface.co/vasista22/whisper-hindi-small/discussions/7.
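Until a fix lands, one possible interim workaround is to normalize the empty `suppress_tokens` before calling the pipeline. This is an untested sketch; `normalize_suppress_tokens` is a hypothetical helper, and it assumes the empty list lives on `model.generation_config` (it may instead come from `model.config`):

```python
def normalize_suppress_tokens(generation_config):
    """If a fine-tuned checkpoint ships suppress_tokens=[], replace it with
    None, which _prepare_decoder_input_ids handles explicitly instead of
    indexing into an empty list/tensor. (Hypothetical helper, for
    illustration only.)"""
    tokens = getattr(generation_config, "suppress_tokens", None)
    if tokens is not None and len(tokens) == 0:
        generation_config.suppress_tokens = None
    return generation_config

# Against the repro above, this would be applied before transcribing:
# normalize_suppress_tokens(transcribe.model.generation_config)
```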

@Lewington-pitsos Lewington-pitsos linked a pull request Feb 22, 2025 that will close this issue