feat: update A2T audio conversion #389

ad-astra-video · 2025-01-03T14:00:05Z

This PR improves file handling in the audio-to-text pipeline by removing the call to ffmpeg for audio processing and using PyAV to convert the input to raw audio.

The Cloud SPE test file sometimes fails in my testing with the current conversion implementation, which uses the "mp3" container and format. The processing error came from the internal conversion to raw audio in transformers (links below). Processing the audio file with PyAV instead allows the Cloud SPE file to process correctly.

Transformers calls the ffmpeg binary from its preprocess function:
https://github.com/huggingface/transformers/blob/47c29ccfaf56947d845971a439cbe75a764b63d7/src/transformers/pipelines/automatic_speech_recognition.py#L353
https://github.com/huggingface/transformers/blob/47c29ccfaf56947d845971a439cbe75a764b63d7/src/transformers/pipelines/audio_utils.py#L10

Some marginal speed improvements:
Cloud SPE test file (3s clip)
965ms - bytes of container sent to model
895ms - np ndarray sent to model

Another test file (3m 22s)
5.6s - bytes of container sent to model
5.5s - np ndarray sent to model
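
To put the timings above in relative terms, here is a small sketch that computes the percentage reduction for each file. The helper name `relative_reduction` is hypothetical, and the inputs are only the single measurements quoted above, so treat the percentages as rough indications rather than benchmarks.

```python
def relative_reduction(before_ms: float, after_ms: float) -> float:
    """Return the latency reduction as a percentage of the original time."""
    return (before_ms - after_ms) / before_ms * 100.0


# Timings quoted above, in milliseconds: (container bytes, np ndarray).
timings = {
    "Cloud SPE test file (3s clip)": (965.0, 895.0),
    "Another test file (3m 22s)": (5600.0, 5500.0),
}

for name, (before, after) in timings.items():
    print(f"{name}: {relative_reduction(before, after):.1f}% faster")
```

This works out to roughly 7% on the short clip and under 2% on the longer file, consistent with calling the improvement marginal.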

eliteprox

LGTM! Tested with MP4, plus regression tests with MP3 and FLAC. Left a few nits for cleanup.

runner/app/pipelines/audio_to_text.py

runner/app/pipelines/utils/audio.py

ad-astra-video · 2025-01-10T05:11:07Z

I made some small cleanup updates and removed an unneeded import. I rebuilt the Docker container and tested to confirm that both test files work (one of them is the Cloud SPE test audio file).

update audio conversion

28050c2

ad-astra-video requested review from eliteprox and rickstaa January 3, 2025 14:00

ad-astra-video changed the title ~~update(ai): update A2T audio conversion~~ feat: update A2T audio conversion Jan 3, 2025

eliteprox approved these changes Jan 6, 2025

ad-astra-video added 2 commits January 9, 2025 22:59

fix

a43ef77

remove import not needed

dc2729f

ad-astra-video merged commit 84924f7 into main Jan 10, 2025
11 of 12 checks passed

ad-astra-video deleted the a2t-audio-conversion-update branch January 10, 2025 05:12

ad-astra-video mentioned this pull request Jan 10, 2025

M4A conversion fails in audio-to-text pipeline #218

Closed
