revert back to using PyAV instead of torch audio #961

Merged 2 commits into SYSTRAN:master on Oct 23, 2024

Conversation

@MahmoudAshraf97 (Collaborator)

This PR reverts the torchaudio code that was added in #856 and removes the torchaudio dependency, but keeps torch.

The reason torch wasn't removed in this PR is that feature extraction still depends on it, and I didn't want to include the numpy feature extraction here, both to keep the PR simple and to reduce the number of conflicts to resolve with #936.

This should be merged before #936. After both are merged, a new PR will be created to completely remove the torch dependency.
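
For context, a minimal sketch of what PyAV-based decoding looks like, not the exact code this PR restores (assumes PyAV ≥ 10, where `resample()` returns a list of frames):

```python
import av
import numpy as np

def decode_audio(path, sampling_rate=16000):
    # resample everything to mono 16-bit PCM at the target rate (Whisper expects 16 kHz)
    resampler = av.AudioResampler(format="s16", layout="mono", rate=sampling_rate)
    raw = bytearray()
    with av.open(path, metadata_errors="ignore") as container:
        for frame in container.decode(audio=0):
            frame.pts = None  # let the resampler regenerate timestamps
            for resampled in resampler.resample(frame):
                raw += resampled.to_ndarray().tobytes()
    # PCM16 bytes -> float32 waveform in [-1.0, 1.0]
    return np.frombuffer(bytes(raw), dtype=np.int16).astype(np.float32) / 32768.0
```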

@joiemoie

Will removing torch remove the supposed FFT speedup?

@MahmoudAshraf97 (Collaborator, Author) commented Aug 16, 2024

> Will removing torch remove the supposed FFT speedup?

Yes, but I have a new numpy implementation in progress that is faster than the old one.
These are the performance figures on my machine for a 30s segment:
[benchmark image: current Torch on CPU vs. new numpy vs. old numpy]

I'm still investigating the possibility of GPU acceleration.
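
A hedged sketch of the kind of numpy log-mel extraction being described; the in-progress implementation may differ, and `mel_filters` is assumed to be Whisper's `(n_mels, n_fft // 2 + 1)` filter bank. Framing the padded signal with stride tricks and running one batched `rfft` over all frames is the plausible source of the speedup over a per-frame loop:

```python
import numpy as np

def log_mel_spectrogram(audio, mel_filters, n_fft=400, hop_length=160):
    # periodic Hann window, matching torch.hann_window
    window = np.hanning(n_fft + 1)[:-1].astype(np.float32)
    # reflect-pad so frames are centered, as Whisper's STFT does
    audio = np.pad(audio, n_fft // 2, mode="reflect")
    # overlapping frames as a zero-copy strided view
    n_frames = 1 + (len(audio) - n_fft) // hop_length
    strides = (audio.strides[0] * hop_length, audio.strides[0])
    frames = np.lib.stride_tricks.as_strided(audio, (n_frames, n_fft), strides)
    stft = np.fft.rfft(frames * window, axis=-1)
    magnitudes = np.abs(stft[:-1]) ** 2  # drop the last frame, as Whisper does
    log_spec = np.log10(np.maximum(mel_filters @ magnitudes.T, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0
```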

@joiemoie commented Aug 16, 2024

Great to hear! I work with segments about 10 seconds long, so I get no benefit from batching. However, I'm curious about bumping up to the latest commit because of this FFT speedup, and I'm especially interested in GPU acceleration.

@MahmoudAshraf97 (Collaborator, Author) commented Aug 17, 2024

These are the latest performance figures for the new feature extractor, @joiemoie:

# torch on CPU
7.17 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# torch on GPU
797 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# New numpy on CPU
14.6 ms ± 516 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# New CuPY on GPU
1.73 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# Old numpy on CPU
63.9 ms ± 2.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
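
The CuPy row comes almost for free once a numpy implementation exists, since CuPy mirrors most of the numpy API; the same function body can serve both the CPU and GPU rows. A hypothetical illustration (names are mine, not from the PR):

```python
import numpy as np
import cupy as cp

def power_spectrum(frames, xp):
    # xp is either numpy or cupy; the code is identical for both
    window = xp.hanning(frames.shape[-1] + 1)[:-1]
    return xp.abs(xp.fft.rfft(frames * window, axis=-1)) ** 2

frames = np.random.randn(3000, 400).astype(np.float32)  # ~30 s of framed audio
spec_cpu = power_spectrum(frames, np)                    # "New numpy on CPU"
spec_gpu = power_spectrum(cp.asarray(frames), cp)        # "New CuPY on GPU"
```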

@ozancaglayan (Contributor) commented Aug 18, 2024

Do you take into account the overhead of moving long audio tensors to the GPU when doing the above measurements on GPU?

Update: I tried it, and it's around 1 ms at most. My numpy variant is around 30 ms per 30-second segment as well.
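
One way to measure that copy in isolation, as a sketch using CuPy CUDA events (sizes are illustrative):

```python
import numpy as np
import cupy as cp

audio = np.random.randn(16000 * 30).astype(np.float32)  # 30 s at 16 kHz

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()
audio_gpu = cp.asarray(audio)  # host -> device copy
end.record()
end.synchronize()
print(f"host-to-device copy: {cp.cuda.get_elapsed_time(start, end):.3f} ms")
```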

@ozancaglayan (Contributor)

Another option could be to move the feature extraction layer to CTranslate2 as well; whisper.cpp has some implementations:
https://github.com/ggerganov/whisper.cpp/blob/master/src/whisper-mel-cuda.cu

@MahmoudAshraf97 (Collaborator, Author)

> Another option could be to move the feature extraction layer to CTranslate2 as well; whisper.cpp has some implementations: https://github.com/ggerganov/whisper.cpp/blob/master/src/whisper-mel-cuda.cu

OpenNMT/CTranslate2#1419 (comment)

@carolinaxxxxx

@MahmoudAshraf97 good job, sir. However, I have a problem: it does not respect the --hotwords option at all in standard inference or batching mode. I tested on different materials. In the "old" version, before batching mode was introduced, --hotwords worked very well.
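
For reference, a minimal reproduction of the reported behavior through the Python API; `hotwords` is a plain string on `transcribe()`, and the model size and file name here are placeholders:

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda")
# hotwords should bias decoding toward these terms; the report above says
# they are ignored in both standard and batched inference
segments, info = model.transcribe("audio.wav", hotwords="PyAV CTranslate2")
for segment in segments:
    print(segment.text)
```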

@Asmar097 left a comment

Good job

@MahmoudAshraf97 merged commit 42b8681 into SYSTRAN:master on Oct 23, 2024
3 checks passed