Period token mostly missing from text enhancer model output #107
Hi, I fed this short audio clip to an STT engine and obtained the following transcription:
Feeding this as is to text enhancer model in
You can see it misses almost all the periods. Thanks!
Replies: 3 comments
Hi, we described these limitations in the accompanying article (https://habr.com/ru/post/581960/):
Support for paragraphs consisting of several sentences will be added in the next version.
Also, for the best results for now, you can use our VAD (https://github.com/snakers4/silero-vad) to split the audio into chunks. In that case you can treat each chunk as a sentence.
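A rough sketch of the "one chunk = one sentence" idea, in case it helps. This is an illustration only, not code from the repo: `slice_chunks` and `chunks_to_text` are hypothetical helper names, and the `{"start": ..., "end": ...}` sample-offset format is an assumption about what the VAD step returns. The STT call itself is left out; the sketch assumes you already have one transcript string per chunk.

```python
from typing import Dict, List, Sequence


def slice_chunks(samples: Sequence[float],
                 timestamps: List[Dict[str, int]]) -> List[Sequence[float]]:
    """Cut the waveform into one segment per detected speech region.

    `timestamps` is assumed to be a list of {"start": int, "end": int}
    sample offsets produced by a VAD step (hypothetical format here).
    """
    return [samples[ts["start"]:ts["end"]] for ts in timestamps]


def chunks_to_text(transcripts: List[str]) -> str:
    """Treat each chunk transcript as one sentence.

    Capitalizes the first character and appends a period unless the
    chunk already ends with sentence-final punctuation.
    """
    sentences = []
    for raw in transcripts:
        text = raw.strip()
        if not text:
            continue  # skip chunks where the STT engine returned nothing
        text = text[0].upper() + text[1:]
        if text[-1] not in ".!?":
            text += "."
        sentences.append(text)
    return " ".join(sentences)
```

Each sliced segment would go through the STT engine separately, and the resulting strings would be passed to `chunks_to_text`. This only restores sentence boundaries; commas and other punctuation inside a chunk would still need the enhancer model.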
@abhinavkulkarni Input:
Output:
It looks like there is some domain mismatch (the speech is clearly spoken rather than written), but it kind of works.
@abhinavkulkarni
We have just published a model that can process long inputs.
Input: