Detect common transcript errors #7
Comments
Well, it ain't fancy, but this would identify the presence of often-repeating lines:
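The snippet originally posted here did not survive in this copy of the thread. As a stand-in, here is a minimal Python sketch of the same idea: count how often each line occurs in a transcript and report the ones that repeat a lot. The function name `frequent_lines` and the `min_count` threshold are assumptions for illustration, not the author's actual code.

```python
from collections import Counter


def frequent_lines(text, min_count=3):
    """Return {line: count} for non-empty lines seen at least min_count times.

    Hypothetical helper: the threshold of 3 is an arbitrary starting point.
    """
    # Strip whitespace so trailing spaces don't hide otherwise identical lines.
    counts = Counter(line.strip() for line in text.splitlines() if line.strip())
    return {line: n for line, n in counts.items() if n >= min_count}
```

Short, legitimately common lines (greetings, "Thank you.") will show up too, so the output is probably best treated as a list of candidates for review rather than a verdict.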
Identifies duplicate lines. Toward #7.
Now figure out what to do with these, once detected. My best guess is this:
OK, they're now stored in a file, and used to prevent adding new references to the JSON. I think I should just manually prune existing JSON references. It won't be possible to properly close this issue until I understand the source of this problem (why is Whisper doing this?) and how to fix it.
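For anyone following along, the filtering step might look roughly like the sketch below. It assumes the flagged lines live in a plain-text file, one per line, and that each reference is a dict with a `text` field; both the file layout and the field name are guesses, since the thread doesn't show the real JSON structure. `load_blocklist` and `filter_references` are hypothetical names.

```python
def load_blocklist(path):
    """Read known-bad lines from a text file, one per line (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def filter_references(references, blocklist, key="text"):
    """Drop references whose text exactly matches a known-bad line.

    `key` is an assumed field name; adjust it to the real JSON schema.
    """
    return [
        ref for ref in references
        if ref.get(key, "").strip() not in blocklist
    ]
```

This only stops new bad references from being added. Entries already in the JSON would still need the manual pruning mentioned above.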
I'm seeing Whisper do goofy, incorrect stuff sometimes.
One is the insertion of earlier phrases later in a transcript, sprinkled throughout.
Another is repeating a phrase dozens of times:
Find a lightweight way to detect these artifacts in transcripts and flag them for review or re-processing.
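One lightweight option for the second symptom (the same phrase repeated back to back) is to collapse consecutive identical segments and flag any run longer than a threshold. The sketch below assumes the transcript is available as a list of segment strings; `repeated_runs` and `min_run` are hypothetical names, and the default of 3 is arbitrary.

```python
from itertools import groupby


def repeated_runs(segments, min_run=3):
    """Return (start_index, phrase, run_length) for back-to-back repeats.

    Segments are compared after stripping whitespace and lowercasing, so
    "Thank you." and "thank you. " count as the same phrase.
    """
    flagged = []
    index = 0
    normalized = (s.strip().lower() for s in segments)
    for phrase, group in groupby(normalized):
        run = sum(1 for _ in group)
        if phrase and run >= min_run:
            flagged.append((index, phrase, run))
        index += run  # advance to the start of the next run
    return flagged
```

A run flagged this way could be marked for review or queued for re-processing. It does nothing for the first symptom, where earlier phrases reappear scattered through the transcript rather than consecutively.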