Dealing with constant hallucinations #121
Hi, you can check whether the same happens with the offline Whisper model with VAD enabled. Alternatively, just remove that phrase from all transcripts before searching for the longest common prefix. But beware: it will then never be output, even when it is actually spoken. And it will not help whenever Whisper hallucinates anything else.
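A simplified sketch of the second suggestion, removing the phrase from each hypothesis before the longest-common-prefix step, might look like this. It assumes the hypotheses are plain lists of word strings. The project's real agreement buffer also carries timestamps, so this only shows the idea. BLOCKED_PHRASE, strip_phrase and longest_common_prefix are hypothetical names, not part of whisper_streaming.

```python
# Hypothetical helpers, not part of whisper_streaming: drop a known hallucinated
# phrase from each hypothesis before the longest-common-prefix agreement step.
BLOCKED_PHRASE = ("Υπότιτλοι", "AUTHORWAVE")


def strip_phrase(words, phrase=BLOCKED_PHRASE):
    """Return `words` with every occurrence of the token sequence `phrase` removed."""
    out, i, n = [], 0, len(phrase)
    while i < len(words):
        # Compare ignoring surrounding whitespace (words often carry a leading space).
        window = tuple(w.strip() for w in words[i:i + n])
        if window == phrase:
            i += n  # skip the whole phrase
        else:
            out.append(words[i])
            i += 1
    return out


def longest_common_prefix(a, b):
    """Words on which two consecutive hypotheses agree, counted from the start."""
    prefix = []
    for x, y in zip(a, b):
        if x.strip() != y.strip():
            break
        prefix.append(x)
    return prefix


previous = ["Καλησπέρα", "Υπότιτλοι", "AUTHORWAVE", "σε"]
current = ["Καλησπέρα", "σε", "όλους"]
print(longest_common_prefix(previous, current))  # ['Καλησπέρα']
print(longest_common_prefix(strip_phrase(previous), strip_phrase(current)))  # ['Καλησπέρα', 'σε']
```

Without the filter, the hallucinated phrase in one hypothesis breaks agreement after the first word. With it, the two hypotheses agree on one more word.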
This is not anything actually spoken in Greek: "Υπότιτλοι" means "Subtitles", and it is followed by what looks like a name. My best guess is that the model was partly trained on Greek community-generated subtitles for TV shows and the like, which carried the creator's name as an advertisement during moments of silence where no actual captioning was needed. This is using the large-v3 model, and I cannot find any model that handles Greek better. Note that this also shows up when transcribing videos with base Whisper. For now I am attempting to remove it by checking the words retrieved from self.asr.ts_words(res) during process_iter and returning early if the phrase is found.
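The early-return check described above could look roughly like the following sketch. It assumes that ts_words returns a list of (start, end, text) tuples, with the phrase possibly split across several word entries. BLOCKED, _normalize and contains_blocked are hypothetical names. Normalizing the Unicode form, case and whitespace before comparing keeps the check from missing the phrase because of a leading space or a change in capitalization.

```python
import re
import unicodedata

# Hypothetical blocklist; add other hallucinated credits as they turn up.
BLOCKED = ("Υπότιτλοι AUTHORWAVE",)


def _normalize(text):
    """NFC-normalize, casefold and collapse whitespace before comparing."""
    text = unicodedata.normalize("NFC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def contains_blocked(words, blocked=BLOCKED):
    """True if the joined word texts contain any blocked phrase.

    `words` is assumed to be a list of (start, end, text) tuples.
    """
    joined = _normalize(" ".join(text for _, _, text in words))
    return any(_normalize(phrase) in joined for phrase in blocked)


tsw = [(0.0, 0.6, " Υπότιτλοι"), (0.6, 1.4, " AUTHORWAVE")]
print(contains_blocked(tsw))  # True
print(contains_blocked([(0.0, 0.5, " Καλησπέρα")]))  # False
```

Inside process_iter, one would call contains_blocked on the ts_words result and skip the rest of that iteration when it returns True. This drops the whole iteration, not just the phrase.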
Any help on this matter would be greatly appreciated.
Hi, I'd like to help, but I'm busy right now. A small piece of advice, btw: latency should be measured as well, but that can be neglected for a start.
When using the large-v3 model to transcribe Greek audio from a live stream, I often get a continuous run of results reading "Υπότιτλοι AUTHORWAVE".
It seems the model outputs that phrase whenever it cannot make sense of the input.
Setting vac and vad to True does not seem to make it any less frequent.
Is there some way I can discard this specific phrase or similar ones so they do not get confirmed and sent to the client?
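One possible answer for "similar ones" is a pattern-based filter applied to confirmed text just before it is sent to the client, removing only the matched span and not the whole update. This is a minimal sketch. The patterns, BLOCKED_PATTERNS and clean_committed are hypothetical, and the hook point (the committed text right before sending) is an assumption about where such a filter would sit in the server.

```python
import re

# Hypothetical patterns: "Υπότιτλοι" ("Subtitles") followed by one credit word,
# and the bare credit name on its own. Extend the list as new variants appear.
BLOCKED_PATTERNS = [
    re.compile(r"υπότιτλοι\s+\S+", re.IGNORECASE),
    re.compile(r"\bauthorwave\b", re.IGNORECASE),
]


def clean_committed(text):
    """Remove blocked patterns from confirmed text before sending it to the client."""
    for pattern in BLOCKED_PATTERNS:
        text = pattern.sub("", text)
    # Collapse the gaps left behind by removed spans.
    return re.sub(r"\s+", " ", text).strip()


print(clean_committed(" Καλησπέρα Υπότιτλοι AUTHORWAVE σε όλους"))  # Καλησπέρα σε όλους
print(repr(clean_committed(" Υπότιτλοι AUTHORWAVE")))  # ''
```

An empty result can simply be skipped instead of sent. The trade-off is the same as with any blocklist: a genuinely spoken "Υπότιτλοι" followed by a word is removed too.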