
Does Whisper CT2 (base model) achieve the same speed as Vosk (English large) on CPU? #1

Open
skillhacker-code opened this issue Mar 3, 2023 · 4 comments


@skillhacker-code

Does Whisper CT2 (base model) achieve the same speed as Vosk (English large) on CPU only?

@fquirin
Owner

fquirin commented Mar 3, 2023

It's a bit tricky to answer, because Vosk has a real streaming mode with partial results: you don't have to wait until the user has finished speaking, and only the last chunk of audio remains to be transcribed at the end, while Whisper basically starts transcribing AFTER the user has finished.
So the short answer is: the longer you speak, the faster Vosk will be.

I haven't compared Whisper to Vosk in non-streaming mode yet. Maybe I'll add some tests for that.
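The streaming-vs-batch difference described above can be sketched with a toy latency model in pure Python. The real-time factors and chunk size below are made-up illustrative numbers, not measurements of Vosk or Whisper; the point is only the shape of the curves: batch wait time grows with utterance length, streaming wait time does not.

```python
# Toy latency model (not actual Vosk/Whisper code) showing why streaming
# recognition feels faster the longer the utterance is.
# RTF = real-time factor = processing time / audio duration (assumed value).

def batch_wait(utterance_s: float, rtf: float) -> float:
    """Batch (Whisper-style): transcription only starts after the user
    stops speaking, so the wait scales with the whole utterance."""
    return utterance_s * rtf

def streaming_wait(last_chunk_s: float, rtf: float) -> float:
    """Streaming (Vosk-style): audio is processed while the user speaks;
    after they stop, only the final chunk still needs transcribing."""
    return last_chunk_s * rtf

if __name__ == "__main__":
    for utterance in (2, 10, 30):  # seconds of speech
        b = batch_wait(utterance, rtf=0.5)
        s = streaming_wait(last_chunk_s=0.25, rtf=0.5)
        print(f"{utterance:>2}s utterance -> batch wait {b:.2f}s, "
              f"streaming wait {s:.3f}s")
```

With these assumed numbers, a 30 s utterance makes the user wait ~15 s in batch mode but only a fraction of a second in streaming mode, which is the "the longer you speak the faster Vosk will be" effect in a nutshell.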

@skillhacker-code
Author

Thank you for creating this comparison. Because of it I tried out faster-whisper, and it is faster than whisper.cpp.

@fquirin
Owner

fquirin commented Mar 3, 2023

It is indeed, at least on ARM CPUs. You can follow the discussion about it here: ggerganov/whisper.cpp#7 (comment)

It seems to be an optimization issue on ARM. On x86 (Intel/AMD) CPUs the results might look different, with whisper.cpp catching up to the CT2 version.

@fquirin
Owner

fquirin commented Mar 13, 2023

Hi @nyadla-sys, I wrote to you on Twitter via the SEPIA account 🙂
