Diarization pipeline v3.1 is much slower than 3.0 when running on CPU #1621
I'm also having this issue. Using the above code I get:
System information: MacBook Pro M1, running on the CPU. torch 2.1.2
Would you mind sharing a Google Colab that I can just click and run?
For a 2-minute audio file it took me 115.84s on v3.0.1 and 559.31s on v3.1.0.
Thanks for taking the time to prepare a notebook. That helps.
from pyannote.audio.pipelines.utils.hook import Hooks, ProgressHook, TimingHook

file = {"audio": ...}

# progress hook alone (will show a progress bar)
with ProgressHook() as hook:
    diarization = pipeline(file, hook=hook)

# timing hook alone (will add a "timing" key to file)
with TimingHook() as hook:
    diarization = pipeline(file, hook=hook)

# both at once
with Hooks(ProgressHook(), TimingHook()) as hook:
    diarization = pipeline(file, hook=hook)
Thanks for the hint. I updated the notebook with the sample audio from the tutorials and the hooks. According to them, the embedding step takes much longer in the new version.
Thanks to the completed MRE, I can now reproduce the issue. The main difference between 3.0 and 3.1 is the switch from ONNX to PyTorch inference. On GPU, PyTorch is faster than ONNX. Could anyone using pyannote in production on CPU chime in?
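For anyone who wants to sanity-check the backend difference on their own CPU, a rough micro-benchmark might look like the following. This is only a sketch with a stand-in model, not pyannote's actual embedding export, and it assumes onnxruntime is installed:

import time
import torch
import onnxruntime as ort

# Stand-in model; the real pipeline uses a ResNet speaker-embedding model.
model = torch.nn.Sequential(torch.nn.Conv1d(1, 64, 251), torch.nn.ReLU()).eval()
x = torch.randn(1, 1, 160000)  # dummy 10-second mono chunk at 16 kHz

# Export the same weights to ONNX so both backends run identical math.
torch.onnx.export(model, x, "model.onnx")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

with torch.inference_mode():
    t0 = time.time()
    for _ in range(20):
        model(x)
    print(f"pytorch cpu: {time.time() - t0:.2f}s")

inputs = {session.get_inputs()[0].name: x.numpy()}
t0 = time.time()
for _ in range(20):
    session.run(None, inputs)
print(f"onnx cpu:    {time.time() - t0:.2f}s")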
Any update on this @hbredin 🙏 ? |
No update... hence the |
It has been noticed that the 3.1 pipeline's efficiency suffers from speaker embedding inference: with the default config, every 10s chunk has to go through the embedding model 3 times. Separating the embedding pipeline into the ResNet backbone and the mask pooling proves effective; with this modification, every chunk only needs a single pass through the backbone, bringing an almost 3x speedup in my experiment. Furthermore, a cached-inference strategy helps a lot as well, given the default overlap ratio of 90%. A sketch of the split follows.
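For illustration only, the split described above could look like this (hypothetical backbone/pooling callables and signatures; the real model is a ResNet with masked statistics pooling):

import torch

def embed_per_mask(model, chunk, masks):
    # naive strategy: one full forward pass per speaker mask
    # (3 passes per 10-second chunk with the default config)
    return torch.stack([model(chunk, mask) for mask in masks])

def embed_shared_backbone(backbone, pooling, chunk, masks):
    # the heavy ResNet backbone runs once per chunk...
    features = backbone(chunk)  # e.g. a (frames, dim) feature map
    # ...then only the lightweight masked pooling runs once per speaker,
    # which is where the reported ~3x speedup comes from
    return torch.stack([pooling(features, mask) for mask in masks])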
I think that the main problem lies in pyannote-audio/pyannote/audio/pipelines/speaker_diarization.py, lines 302 to 306 (commit 6e22f41).
It seems like for longer files, the repeated reads from disk when cropping each chunk become a bottleneck of their own.
@marrrcin these are two different problems. |
Thanks @hbredin, loading into memory really helped. With that, the performance is tolerable and a 1-hour file finishes within a few minutes (<5 min on GPU).
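For anyone else hitting the same slowdown: pre-loading just means handing the pipeline an in-memory waveform instead of a file path, which is a documented pyannote input format. A minimal sketch, assuming pipeline is already instantiated and audio.wav is a placeholder path:

import torchaudio

# Read the whole file into memory once, so the pipeline does not have to
# re-open and crop the file on disk for every sliding-window chunk.
waveform, sample_rate = torchaudio.load("audio.wav")
diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})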
Happy that your problem is solved and that you "tolerate" the performance of pyannote (that you use for free, by the way). |
I have tested diarization pipeline v3.0 on CPU as well, and also found its latency to be lower than v3.1's (50s -> 30s).
Just to chime in with a CPU comparison between 3.0 and 3.1, without loading the file into memory. The difference is massive for longer files. A 22-minute file on a Ryzen 6850U.
We observed similar long embedding times on M1 and Intel. |
@hbredin Just tried out pyannote 1.2 and embeddings are much faster again on CPU. Did you change something in this regard? Again a 22-minute file on a Ryzen 6850U.
I did not. But happy that problem is solved. |
maybe it was the torch update... |
1.2 of pyannote? From here? https://pypi.org/project/pyannote.audio/#history |
Should of course have been 3.2 |
Tested versions
Tested on 3.1 vs 3.0
System information
Debian GNU/Linux, torch 2.1.2
Issue description
When running diarization pipeline on CPU, v3.1 is more than 2x slower than v3.0. Is it possible to make it faster?
Minimal reproduction example (MRE)
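A minimal benchmark in this spirit, with placeholder file path and token (pin pyannote.audio to 3.0.x or 3.1.x between runs to compare):

import time
import torch
from pyannote.audio import Pipeline

# Placeholder token; use "pyannote/speaker-diarization-3.0"
# instead when benchmarking the 3.0 pipeline.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",
)
pipeline.to(torch.device("cpu"))  # force CPU inference

start = time.time()
diarization = pipeline("audio.wav")  # placeholder path
print(f"elapsed: {time.time() - start:.2f}s")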