
audio-to-text pipeline fails on return_timestamps=word #390

Open
ad-astra-video opened this issue Jan 3, 2025 · 1 comment
Labels
bug Something isn't working


ad-astra-video commented Jan 3, 2025

Describe the bug

The audio-to-text pipeline is not returning word-level timestamps.

@RUFFY-369 is there a way to switch to sdpa when word-level timestamps are requested, without reloading the pipeline onto the GPU?


Reproduction steps

  1. Download new audio-to-text pipeline with flash attention 2 enabled
  2. Send request to pipeline including return_timestamps=word
    curl -X POST http://172.17.0.1:6666/audio-to-text -F "[email protected]" -F "model_id=openai/whisper-large-v3" -F "return_timestamps=word"
  3. See error returned
    {"error":{"message":": Error during model execution: WhisperFlashAttention2 attention does not support output_attentions."}}
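The failure happens because Whisper's word-level timestamps are derived from cross-attention weights (output_attentions), which the FlashAttention2 kernels do not return. One way to surface this earlier is a request guard in the route handler that rejects the combination with a clear message instead of failing deep inside model execution. This is a hypothetical sketch (the function name and signature are not from the codebase):

```python
from typing import Optional

def validate_timestamp_request(
    attn_implementation: str, return_timestamps: Optional[str]
) -> Optional[str]:
    """Return an error message if the request cannot be served, else None.

    Word-level timestamps require output_attentions, which the
    flash_attention_2 kernels cannot return; sdpa and eager can.
    """
    if return_timestamps == "word" and attn_implementation == "flash_attention_2":
        return (
            "word-level timestamps require output_attentions, which is not "
            "supported by flash_attention_2; load the model with sdpa or eager"
        )
    return None
```

A guard like this would only paper over the error; the sections below discuss actually serving the request.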

Expected behaviour

Return word level timestamps.

Severity

None

Screenshots / Live demo link

No response

OS

None

Running on

None

AI-worker version

No response

Additional context

No response

@ad-astra-video ad-astra-video added the bug Something isn't working label Jan 3, 2025

RUFFY-369 commented Jan 6, 2025


@ad-astra-video We can't change the attention implementation in __call__ without reinitializing the pipeline, because pipeline initialization initializes the model, and that is where the WhisperEncoderLayer and WhisperDecoderLayer get their self_attn modules initialized. So even if you change attn_implementation in __call__, you would still need to swap out the attention layers in those modules directly, which just means reinitializing the model itself. Also, for testing's sake I tried to change the attention implementation dynamically in __call__ via self.tm.model.config._attn_implementation, and it didn't work.
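Since the attention implementation is fixed when the model is constructed, the decision has to be made once, at load time. A minimal sketch of that decision as a standalone helper (the function name and flags are hypothetical, not from the codebase; the returned string would be what gets passed as the attn_implementation model kwarg when loading the model):

```python
def choose_attn_implementation(need_word_timestamps: bool, has_flash_attn: bool) -> str:
    """Pick the attention implementation once, at model load time.

    sdpa can return attention weights (needed for word-level timestamps);
    flash_attention_2 is faster but cannot return them.
    """
    if need_word_timestamps:
        return "sdpa"
    return "flash_attention_2" if has_flash_attn else "sdpa"
```

The trade-off is baked in at startup: a worker loaded this way either serves word-level timestamps or gets the FlashAttention2 speedup, not both.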

If we really wanted to switch attn_implementation to sdpa at runtime, we would have to replace the attention layers in the encoder's and decoder's attention modules via model.named_modules(), which amounts to almost the same work as reinitializing the model.

An alternative solution would be to keep an extra model instance (loaded with sdpa) for word-level timestamp requests, without reinitializing the main pipeline.
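The dual-instance alternative could be dispatched roughly as below. This is a structural sketch only: the class name is hypothetical, and the loader callables stand in for the real pipeline construction so the routing logic can be shown without loading any model.

```python
from typing import Any, Callable, Optional

class TimestampAwareRouter:
    """Hypothetical dispatcher: keep the FlashAttention2 pipeline for normal
    requests, and lazily build a second sdpa-backed instance only when the
    first word-level timestamp request arrives."""

    def __init__(self, load_flash: Callable[[], Callable], load_sdpa: Callable[[], Callable]):
        self._flash = load_flash()              # primary instance, always loaded
        self._load_sdpa = load_sdpa
        self._sdpa: Optional[Callable] = None   # built on first word-level request

    def __call__(self, audio: Any, return_timestamps: Optional[str] = None):
        if return_timestamps == "word":
            if self._sdpa is None:
                self._sdpa = self._load_sdpa()  # one-time cost, then cached
            return self._sdpa(audio, return_timestamps="word")
        return self._flash(audio, return_timestamps=return_timestamps)
```

The obvious cost is memory: once the sdpa instance is built, both models are resident on the GPU at the same time, so this only works where the card has headroom for two copies of the weights.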
