Any way to extract punctuation marks and transcription #42

gormat · 2021-01-25T12:04:08Z

gormat
Jan 25, 2021

Hey.
Really cool project!
I am using the silero-stt model to extract text.
I just have a few questions:

This model does not extract punctuation marks and does not split into sentences, am I missing something?
Is there a way to get start and end timestamps for each word as well?

Thank you.

Answered by snakers4

snakers4 · 2021-01-25T12:14:07Z

snakers4
Jan 25, 2021
Maintainer

Hi @gormat,

> This model does not extract punctuation marks and does not split into sentences, am I missing something?

In most cases, written punctuation is not related to how people actually speak (except maybe for "?").
In the CE version of our models we provide only the STT part.
Typically, though, a pipeline may work as follows:

You can use a VAD (voice activity detector) to split speech into utterances. People typically separate ideas with pauses.
You can fine-tune a pre-trained LM, such as a transformer, to add capital letters, commas, full stops, etc.
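The first step above (splitting on pauses) can be sketched in a few lines. This is only a toy energy-threshold splitter, not a real VAD model and not anything shipped with the Silero models; the function name `split_on_pauses` and all of its parameters and default values are assumptions made up for illustration.

```python
from typing import List, Tuple


def split_on_pauses(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,              # hypothetical frame size
    energy_threshold: float = 1e-3,  # hypothetical speech/silence cutoff
    min_pause_ms: int = 300,         # hypothetical minimum pause length
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans of speech separated by long pauses."""
    frame_len = sample_rate * frame_ms // 1000
    min_pause_frames = max(1, min_pause_ms // frame_ms)

    # Mark each full frame as speech if its mean energy exceeds the threshold.
    is_speech = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        is_speech.append(energy > energy_threshold)

    utterances = []
    start = None        # first frame of the current utterance
    last_speech = None  # most recent speech frame
    for idx, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = idx
            last_speech = idx
        elif start is not None and idx - last_speech >= min_pause_frames:
            # The pause is long enough: close the current utterance.
            utterances.append((start, last_speech + 1))
            start = None
    if start is not None:
        utterances.append((start, last_speech + 1))

    # Convert frame indices to seconds.
    return [
        (s * frame_len / sample_rate, e * frame_len / sample_rate)
        for s, e in utterances
    ]
```

Each returned span could then be cut out of the audio and passed to the STT model separately, so that every utterance yields its own piece of text. A trained VAD would usually be more robust to noise than a fixed energy threshold.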

> Is there a way to get start and end timestamps for each word as well?

Please use this colab:
Proceed to the PyTorch example.
Search for the comment "# align example".
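For readers who want the general idea without opening the notebook, here is a minimal sketch of how word timestamps can be derived from per-frame model output. It is not the code from the Colab's align example. It assumes a CTC-style model that emits one token per fixed-length frame; the blank symbol "_", the function name `word_timestamps`, and the 20 ms frame duration in the usage note are all hypothetical.

```python
from typing import Dict, List, Union

BLANK = "_"  # hypothetical CTC blank symbol
SPACE = " "  # word separator token


def word_timestamps(
    frame_tokens: List[str], frame_sec: float
) -> List[Dict[str, Union[str, float]]]:
    """Collapse per-frame tokens into words with start/end times in seconds."""
    words: List[Dict[str, Union[str, float]]] = []
    chars: List[str] = []  # characters of the word being built
    start = None           # frame where the current word starts
    end = None             # frame just past the word's last character
    prev = BLANK

    def flush() -> None:
        words.append({
            "word": "".join(chars),
            "start": round(start * frame_sec, 3),
            "end": round(end * frame_sec, 3),
        })

    for idx, tok in enumerate(frame_tokens):
        if tok not in (BLANK, SPACE):
            if tok != prev:  # a new character, not a repeated frame
                chars.append(tok)
            if start is None:
                start = idx
            end = idx + 1
        elif tok == SPACE and chars:
            flush()
            chars, start = [], None
        prev = tok

    if chars:  # last word has no trailing space
        flush()
    return words
```

With `frame_sec=0.02`, the frame sequence `_ h h _ i <space> <space> _ y o o _` would give "hi" from 0.02 s to 0.1 s and "yo" from 0.16 s to 0.22 s. The real frame duration depends on the model's stride, so check it against the notebook before relying on the numbers.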

snakers4 · 2021-12-09T16:55:19Z

snakers4
Dec 9, 2021
Maintainer

> you can tune some pre-trained LM like a transformer to add capital letters, commas, full-stops etc

We have actually published models for this.
