Describe the bug
The audio-to-text pipeline is not returning word-level timestamps.

@RUFFY-369 is there a way to change to sdpa if word-level timestamps are requested, without reloading the pipeline to the GPU?
Reproduction steps
1. Download the new audio-to-text pipeline with flash attention 2 enabled.
2. Send a request to the pipeline including return_timestamps=word:

```
curl -X POST http://172.17.0.1:6666/audio-to-text -F "[email protected]" -F "model_id=openai/whisper-large-v3" -F "return_timestamps=word"
```

3. See the error returned:

```
{"error":{"message":": Error during model execution: WhisperFlashAttention2 attention does not support output_attentions."}}
```
Expected behaviour
Return word-level timestamps.
Severity
None
Screenshots / Live demo link
No response
OS
None
Running on
None
AI-worker version
No response
Additional context
No response
@ad-astra-video We can't change the attention implementation in __call__ without reinitializing the pipeline, because pipeline initialization instantiates the model, and that is where the WhisperEncoderLayer and WhisperDecoderLayer modules get their self_attn initialized. So even if you set attn_implementation in __call__, you would still have to swap the attention layers inside those modules directly, which effectively means reinitializing the model. For testing's sake I also tried to change the attention implementation dynamically in __call__ via self.tm.model.config._attn_implementation, and it didn't work.

If we really want to switch attn_implementation to sdpa, we would have to replace the attention layers in the encoder's and decoder's attention modules via model.named_modules(), which amounts to almost the same work as reinitializing the model.
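A minimal sketch of why the dynamic switch fails, assuming the standard transformers API for the version this issue was hit on (where WhisperFlashAttention2 is a distinct class, as the error message shows): the attention class is chosen once when the model is instantiated, so mutating the config flag afterwards leaves the already-built modules in place.

```python
import torch
from transformers import WhisperForConditionalGeneration

# Loading with flash attention 2: every encoder/decoder layer
# instantiates a WhisperFlashAttention2 module at this point.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch.float16,  # flash attention 2 requires fp16/bf16
    attn_implementation="flash_attention_2",
).to("cuda")

# Flipping the flag later only mutates the config; the attention
# modules inside the layers are untouched, so requests that need
# output_attentions (word-level timestamps) still fail.
model.config._attn_implementation = "sdpa"
print(type(model.model.encoder.layers[0].self_attn).__name__)
# -> still "WhisperFlashAttention2"
```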
Or an alternate solution could be to keep an extra model instance for word-level timestamps without reinitializing the pipeline, as in the sketch below.
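A hedged sketch of that alternative, built on the transformers automatic-speech-recognition pipeline; the AudioToTextPipeline class, the lazy loading, and the routing logic are illustrative assumptions, not the ai-worker's actual code:

```python
import torch
from transformers import pipeline

class AudioToTextPipeline:
    """Illustrative wrapper: an FA2 model for the fast path, plus an
    sdpa sibling created lazily for word-level timestamp requests."""

    def __init__(self, model_id="openai/whisper-large-v3", device="cuda"):
        self.model_id = model_id
        self.device = device
        self.fa2_pipe = pipeline(
            "automatic-speech-recognition",
            model=model_id,
            torch_dtype=torch.float16,
            model_kwargs={"attn_implementation": "flash_attention_2"},
            device=device,
        )
        self.sdpa_pipe = None  # only built if ever needed

    def __call__(self, audio, return_timestamps=True):
        if return_timestamps == "word":
            # Word-level timestamps need output_attentions, which
            # flash attention 2 does not support, so route to sdpa.
            if self.sdpa_pipe is None:
                self.sdpa_pipe = pipeline(
                    "automatic-speech-recognition",
                    model=self.model_id,
                    torch_dtype=torch.float16,
                    model_kwargs={"attn_implementation": "sdpa"},
                    device=self.device,
                )
            return self.sdpa_pipe(audio, return_timestamps="word")
        return self.fa2_pipe(audio, return_timestamps=return_timestamps)
```

The trade-off is roughly double the GPU memory for the Whisper weights whenever both instances are resident, which is why loading the sdpa copy lazily (or behind a startup flag) may be preferable.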