Short Title

Build a custom Speech to Text model with speaker diarization capabilities

Long Title

Build a custom Speech to Text model and transcribe audio to detect multiple speakers

Author

URLs

Github repo

Video Link

Summary

In this code pattern, given a corpus file and audio recordings of a meeting or classroom, we train custom language and acoustic Speech to Text models to transcribe the audio and produce speaker-diarized output.

Technologies

  • Python: An open-source, interpreted, high-level programming language for general-purpose programming.

  • Object Storage: Store large amounts of data in a highly scalable manner.
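
As a minimal sketch of how the application could read and write objects with the ibm-cos-sdk Python client; the endpoint, bucket name, object keys, and credentials are placeholders, not values taken from this repo:

```python
import ibm_boto3
from ibm_botocore.client import Config

# Hypothetical credentials and endpoint -- substitute the values from your
# Cloud Object Storage service credentials.
cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id="<COS_API_KEY>",
    ibm_service_instance_id="<COS_RESOURCE_INSTANCE_ID>",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.us-south.cloud-object-storage.appdomain.cloud",
)

# Download an extracted audio file and upload a transcript (placeholder names).
cos.download_file(Bucket="<bucket-name>", Key="meeting-audio.mp3", Filename="meeting-audio.mp3")
cos.upload_file(Filename="transcript.txt", Bucket="<bucket-name>", Key="transcript.txt")
```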

Description

One of the features of Watson Speech to Text is the ability to detect different speakers in an audio recording, also known as speaker diarization. In this code pattern, we showcase the speaker diarization capabilities of Watson Speech to Text by training two custom models in a Python Flask runtime: a custom language model, built from a corpus text file, which teaches the model ‘Out of Vocabulary’ words, and a custom acoustic model, built from the audio files extracted in the previous code pattern of the series, which adapts the model to the speakers’ accents.

Refer to this link for complete details of the series: https://developer.ibm.com/articles/text-mining-and-analysis-from-webex-recordings
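
As a rough sketch of how this training could look with the ibm-watson Python SDK (the service URL, file names, and base model are assumptions, not values from this repo):

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Hypothetical credentials -- use the API key and URL of your own service instance.
stt = SpeechToTextV1(authenticator=IAMAuthenticator("<STT_API_KEY>"))
stt.set_service_url("https://api.us-south.speech-to-text.watson.cloud.ibm.com")

# 1. Custom language model: teach 'Out of Vocabulary' words from the corpus file.
lang_model = stt.create_language_model(
    "custom-language-model", "en-US_BroadbandModel").get_result()
lang_id = lang_model["customization_id"]
with open("corpus.txt", "rb") as corpus:
    stt.add_corpus(lang_id, "corpus-1", corpus, allow_overwrite=True)
stt.train_language_model(lang_id)

# 2. Custom acoustic model: adapt to the speakers' accents using the extracted audio.
acoustic_model = stt.create_acoustic_model(
    "custom-acoustic-model", "en-US_BroadbandModel").get_result()
acoustic_id = acoustic_model["customization_id"]
with open("meeting-audio.mp3", "rb") as audio:
    stt.add_audio(acoustic_id, "audio-1", audio, content_type="audio/mp3")
stt.train_acoustic_model(acoustic_id, custom_language_model_id=lang_id)
```

Note that training is asynchronous: the application would poll get_language_model and get_acoustic_model until each model's status is 'available' before transcribing.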

Flow

Architecture diagram

  1. The user uploads the corpus file to the application

  2. The extracted audio from the previous code pattern of the series is retrieved from Cloud Object Storage

  3. The corpus file and the extracted audio are uploaded to Watson Speech to Text to train the custom models

  4. The downloaded audio file from the previous code pattern of the series is transcribed with the custom Speech to Text model, and the resulting text file is stored in Cloud Object Storage (a minimal sketch of this step follows the list)
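
A minimal sketch of the transcription step, assuming the custom model IDs produced by the training sketch above; credentials and file names are placeholders:

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Assumed: custom model IDs from the earlier training sketch (placeholders here).
stt = SpeechToTextV1(authenticator=IAMAuthenticator("<STT_API_KEY>"))
stt.set_service_url("https://api.us-south.speech-to-text.watson.cloud.ibm.com")
lang_id = "<LANGUAGE_CUSTOMIZATION_ID>"
acoustic_id = "<ACOUSTIC_CUSTOMIZATION_ID>"

# Transcribe with the custom models and request speaker labels (diarization).
with open("meeting-audio.mp3", "rb") as audio:
    result = stt.recognize(
        audio=audio,
        content_type="audio/mp3",
        language_customization_id=lang_id,
        acoustic_customization_id=acoustic_id,
        speaker_labels=True,
    ).get_result()

# 'speaker_labels' assigns a speaker number to each timestamped word; joining it
# with 'results' yields a speaker-diarized transcript.
transcript = "\n".join(r["alternatives"][0]["transcript"] for r in result["results"])
with open("transcript.txt", "w") as out:
    out.write(transcript)
# The transcript file can then be uploaded to Cloud Object Storage,
# for example with the client shown in the earlier Object Storage sketch.
```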

Instructions

Find the detailed steps in the README file.

  1. Clone the repo

  2. Create Watson Speech To Text Service

  3. Add the Credentials to the Application

  4. Deploy the Application

  5. Run the Application
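
As a rough sketch of steps 3 and 5 (adding credentials and running the application), assuming a hypothetical credentials file and Flask entry point; the actual file names and structure are defined in the README:

```python
import json
from flask import Flask
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

app = Flask(__name__)

# Hypothetical credentials file holding the Speech to Text API key and URL.
with open("credentials.json") as f:
    creds = json.load(f)

stt = SpeechToTextV1(authenticator=IAMAuthenticator(creds["apikey"]))
stt.set_service_url(creds["url"])

@app.route("/")
def health():
    # Confirm the credentials work by counting the available base models.
    return {"models": len(stt.list_models().get_result()["models"])}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```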

Components and services

  • Speech to Text: The Speech to Text service converts the human voice into the written word. It can be used anywhere there is a need to bridge the gap between the spoken word and its written form, including voice control of embedded systems, transcription of meetings and conference calls, and dictation of email and notes.

  • Object Storage: IBM Cloud Object Storage is a highly scalable cloud storage service, designed for high durability, resiliency, and security. Store, manage, and access your data via the self-service portal and RESTful APIs. Connect applications directly to Cloud Object Storage and use other IBM Cloud services with your data.