❓ How can I deal with long audio in Speech-to-text? (e.g., 60 - 120 min) #102

WindChimeRan · 2021-10-07T21:35:57Z

Hi,

I am trying to run the speech-to-text model on GPU and CPU for a large audio file, but I get an out-of-memory error on both.

Is there an iterable, lazy dataloader that can feed the audio file in 10-minute pieces?

I have tried some silence-based audio segmentation, but the performance is not at the same level as Silero's.

Answered by snakers4

snakers4 (Maintainer) · 2021-10-08T03:28:57Z

Hi,

We do not have a public built-in streaming interface for our models, for simplicity reasons.

You can try our VAD to split audio into chunks - https://github.com/snakers4/silero-vad

STT works best on 5-15 s audio chunks anyway.

If some chunk is longer, you can use an align method in the decoder: apply it to a fixed-length chunk, split on some word, and run STT a second time on the subchunks.
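As an illustration of the chunking idea above, here is a minimal sketch that packs VAD speech segments into chunks no longer than about 15 s. It assumes the segments are a list of dicts with `start` and `end` given in samples, which is my understanding of what the `get_speech_timestamps` utility in silero-vad returns. The function name `pack_segments` and its parameters are hypothetical, and over-long segments are simply cut at a fixed length rather than split on a word boundary with the decoder's align method.

```python
from typing import Dict, List, Tuple


def pack_segments(
    timestamps: List[Dict[str, int]],
    sampling_rate: int = 16000,
    max_seconds: float = 15.0,
) -> List[Tuple[int, int]]:
    """Merge speech segments into (start, end) sample ranges of at most max_seconds.

    `timestamps` is assumed to be sorted and non-overlapping, with `start`
    and `end` expressed in samples (hypothetical input shape).
    """
    max_len = int(max_seconds * sampling_rate)
    chunks: List[Tuple[int, int]] = []
    for seg in timestamps:
        start, end = seg["start"], seg["end"]
        # A single segment longer than the limit is cut at fixed length.
        # (Simplification: the answer above suggests splitting on a word instead.)
        while end - start > max_len:
            chunks.append((start, start + max_len))
            start += max_len
        # Extend the previous chunk if the merged span, silence included, still fits.
        if chunks and end - chunks[-1][0] <= max_len:
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks
```

Each resulting `(start, end)` pair can then be used to slice the waveform, for example `wav[start:end]`, before passing that slice to the STT model.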

1 reply

ZhengHe-MD May 31, 2022

Feeding VAD results into the STT models works well: the memory footprint dropped from 12 GB to roughly 2 GB on a 17-minute audio file.
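A rough sketch of how such a pipeline could be written as a lazy iterator, so that only one chunk is handed to the model at a time. Everything here is an assumption for illustration: `transcribe_lazily` is a hypothetical helper, `transcribe` stands for whatever callable wraps your STT model and decoder, and the segments are assumed to be dicts with `start` and `end` in samples. The full waveform is still held in memory in this sketch; only the model input is bounded.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence, Tuple


def transcribe_lazily(
    wav: Sequence,
    timestamps: Iterable[Dict[str, int]],
    transcribe: Callable[[Sequence], str],
    sampling_rate: int = 16000,
) -> Iterator[Tuple[float, float, str]]:
    """Yield (start_sec, end_sec, text) one speech segment at a time.

    Because this is a generator, a segment is sliced and transcribed only
    when the caller asks for the next item.
    """
    for seg in timestamps:
        # Slicing works for lists, NumPy arrays, and 1-D torch tensors alike.
        chunk = wav[seg["start"]:seg["end"]]
        text = transcribe(chunk)
        yield seg["start"] / sampling_rate, seg["end"] / sampling_rate, text
```

Iterating with `for start, end, text in transcribe_lazily(wav, timestamps, my_stt): ...` lets you write each piece of the transcript out as it arrives instead of collecting everything first; `my_stt` is a placeholder for your own wrapper.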
