revert back to using PyAV instead of torch audio #961

Merged 2 commits into SYSTRAN:master on Oct 23, 2024

Conversation

@MahmoudAshraf97 (Collaborator)

This PR reverts the torchaudio code that was added in #856 and removes the torchaudio dependency, but keeps torch.

The reason torch wasn't removed in this PR is that feature extraction still depends on it, and I didn't want to include the numpy feature extraction here, both to keep the PR simple and to reduce the number of conflicts to resolve with #936.

This should be merged before #936. After both are merged, a new PR will be created to completely remove the torch dependency.
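
For context, a minimal sketch of what PyAV-based decoding looks like, not the exact code this PR restores (assumes PyAV ≥ 10, where `resample()` returns a list of frames):

```python
import av
import numpy as np

def decode_audio(path, sampling_rate=16000):
    # resample everything to mono 16-bit PCM at the target rate (Whisper expects 16 kHz)
    resampler = av.AudioResampler(format="s16", layout="mono", rate=sampling_rate)
    raw = bytearray()
    with av.open(path, metadata_errors="ignore") as container:
        for frame in container.decode(audio=0):
            frame.pts = None  # let the resampler regenerate timestamps
            for resampled in resampler.resample(frame):
                raw += resampled.to_ndarray().tobytes()
    # PCM16 bytes -> float32 waveform in [-1.0, 1.0]
    return np.frombuffer(bytes(raw), dtype=np.int16).astype(np.float32) / 32768.0
```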

@joiemoie

Will removing torch remove the supposed FFT speedup?

@MahmoudAshraf97 (Collaborator, Author) commented Aug 16, 2024

> Will removing torch remove the supposed FFT speedup?

Yes, but I have a new numpy implementation in progress that is faster than the old one.
These are the performance figures on my machine for a 30s segment:
[benchmark image: current Torch on CPU vs. new numpy vs. old numpy]

I'm still investigating the possibility of GPU acceleration.
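
A hedged sketch of the kind of numpy log-mel extraction being described; the in-progress implementation may differ, and `mel_filters` is assumed to be Whisper's `(n_mels, n_fft // 2 + 1)` filter bank. Framing the padded signal with stride tricks and running one batched `rfft` over all frames is the plausible source of the speedup over a per-frame loop:

```python
import numpy as np

def log_mel_spectrogram(audio, mel_filters, n_fft=400, hop_length=160):
    # periodic Hann window, matching torch.hann_window
    window = np.hanning(n_fft + 1)[:-1].astype(np.float32)
    # reflect-pad so frames are centered, as Whisper's STFT does
    audio = np.pad(audio, n_fft // 2, mode="reflect")
    # overlapping frames as a zero-copy strided view
    n_frames = 1 + (len(audio) - n_fft) // hop_length
    strides = (audio.strides[0] * hop_length, audio.strides[0])
    frames = np.lib.stride_tricks.as_strided(audio, (n_frames, n_fft), strides)
    stft = np.fft.rfft(frames * window, axis=-1)
    magnitudes = np.abs(stft[:-1]) ** 2  # drop the last frame, as Whisper does
    log_spec = np.log10(np.maximum(mel_filters @ magnitudes.T, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0
```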

@joiemoie commented Aug 16, 2024

Great to hear! I work with segments about 10 seconds long, so I get no benefit from batching. However, I'm curious about bumping up to the latest commit because of this FFT speedup, and I'm especially interested in GPU acceleration.

@MahmoudAshraf97 (Collaborator, Author) commented Aug 17, 2024

These are the latest performance figures for the new feature extractor, @joiemoie:

# torch on CPU
7.17 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# torch on GPU
797 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# New numpy on CPU
14.6 ms ± 516 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# New CuPY on GPU
1.73 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# Old numpy on CPU
63.9 ms ± 2.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
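
The CuPy row comes almost for free once a numpy implementation exists, since CuPy mirrors most of the numpy API; the same function body can serve both the CPU and GPU rows. A hypothetical illustration (names are mine, not from the PR):

```python
import numpy as np
import cupy as cp

def power_spectrum(frames, xp):
    # xp is either numpy or cupy; the code is identical for both
    window = xp.hanning(frames.shape[-1] + 1)[:-1]
    return xp.abs(xp.fft.rfft(frames * window, axis=-1)) ** 2

frames = np.random.randn(3000, 400).astype(np.float32)  # ~30 s of framed audio
spec_cpu = power_spectrum(frames, np)                    # "New numpy on CPU"
spec_gpu = power_spectrum(cp.asarray(frames), cp)        # "New CuPY on GPU"
```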

@ozancaglayan (Contributor) commented Aug 18, 2024

Do you take into account the overhead of moving long audio tensors to the GPU when doing the above measurements on GPU?

Update: I tried it, and it's around 1 ms at most. My numpy variant is around 30 ms per 30-second segment as well.
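
One way to measure that copy in isolation, as a sketch using CuPy CUDA events (sizes are illustrative):

```python
import numpy as np
import cupy as cp

audio = np.random.randn(16000 * 30).astype(np.float32)  # 30 s at 16 kHz

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()
audio_gpu = cp.asarray(audio)  # host -> device copy
end.record()
end.synchronize()
print(f"host-to-device copy: {cp.cuda.get_elapsed_time(start, end):.3f} ms")
```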

@ozancaglayan (Contributor)

Another option could be to move the feature extraction layer to CTranslate2 as well; whisper.cpp has some implementations:
https://github.com/ggerganov/whisper.cpp/blob/master/src/whisper-mel-cuda.cu

@MahmoudAshraf97 (Collaborator, Author)

> Another option could be to move the feature extraction layer to CTranslate2 as well; whisper.cpp has some implementations: https://github.com/ggerganov/whisper.cpp/blob/master/src/whisper-mel-cuda.cu

OpenNMT/CTranslate2#1419 (comment)

@carolinaxxxxx

@MahmoudAshraf97 good job, sir. However, I have a problem: it does not respect the --hotwords option at all in standard inference or batching mode. I tested on different materials. In the "old" version, before batching mode was introduced, --hotwords worked very well.
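
For reference, a minimal reproduction of the reported behavior through the Python API; `hotwords` is a plain string on `transcribe()`, and the model size and file name here are placeholders:

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda")
# hotwords should bias decoding toward these terms; the report above says
# they are ignored in both standard and batched inference
segments, info = model.transcribe("audio.wav", hotwords="PyAV CTranslate2")
for segment in segments:
    print(segment.text)
```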

@Asmar097 left a comment

Good job

@MahmoudAshraf97 merged commit 42b8681 into SYSTRAN:master on Oct 23, 2024
3 checks passed