This PR improves file handling in the `audio-to-text` pipeline by removing the call to ffmpeg that processes the audio and using `pyav` to convert the file to raw audio instead.

In my testing, the Cloud SPE test file sometimes fails with the current conversion implementation, which uses the "mp3" container and format. The processing error came from the internal conversion to raw audio in transformers (links below). Processing the audio file with `pyav` instead allows the Cloud SPE file to process correctly.

Transformers calls the ffmpeg binary from its `preprocess` function:
https://github.com/huggingface/transformers/blob/47c29ccfaf56947d845971a439cbe75a764b63d7/src/transformers/pipelines/automatic_speech_recognition.py#L353
https://github.com/huggingface/transformers/blob/47c29ccfaf56947d845971a439cbe75a764b63d7/src/transformers/pipelines/audio_utils.py#L10
Some marginal speed improvements:
Cloud SPE test file (3s clip)
965ms - bytes of container sent to model
895ms - np ndarray sent to model
Another test file (3m 22s)
5.6s - bytes of container sent to model
5.5s - np ndarray sent to model