tuanio/deepspeech-ctc

Deep Speech with CTC Loss

Introduction

Deep Speech is an automatic speech recognition (ASR) model that has achieved state-of-the-art (SOTA) results in the speech recognition domain. In this repository, I train Deep Speech on the Vivos dataset and the Vin BigData VLSP 2020 dataset.

How to use this repository

  1. Clone this project into the current directory using these commands:
!git init
!git remote add origin https://github.com/tuanio/deepspeech-ctc
!git pull origin main
  2. Install the required packages:
!pip install -r requirements.txt

Then install ctcdecode from this repository: https://github.com/parlance/ctcdecode

  3. Edit the configs.yaml file to match your setup.
  4. Train the model using python main.py -cp conf -cn configs
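
The ctcdecode package installed above decodes CTC output with beam search. To illustrate what any CTC decoder has to do with the per-frame predictions, here is a minimal sketch of the simpler greedy rule: merge consecutive repeats, then drop the blank symbol. This is not the ctcdecode API; the blank index `0` and the tiny `VOCAB` list are assumptions made for the example, not values taken from this repository.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical)


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Merge consecutive repeated labels, then remove blanks."""
    # groupby yields one key per run of identical consecutive labels
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank]


# Hypothetical vocabulary: index 0 is the blank.
VOCAB = ["_", "a", "b", "c"]

# Per-frame argmax labels, as a model might emit them.
frames = [0, 1, 1, 0, 1, 2, 2, 0, 0, 3]
decoded = ctc_greedy_collapse(frames)
text = "".join(VOCAB[i] for i in decoded)
print(text)
```

Note that the blank between the two runs of `1` is what allows a genuinely repeated character to survive the merge step.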

Run the Web Demo version

  • streamlit run web.py

Train results

Training loss of Deep Speech over 978 epochs


Validation loss of Deep Speech


Validation word error rate (mean WER) of Deep Speech
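
For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration, not the code used in this repository; in particular, averaging per-utterance scores in `mean_wer` is an assumption about what "mean WER" refers to.

```python
from typing import Iterable, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average of per-utterance WER (assumed meaning of 'mean WER')."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.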

Note

  • sox is the audio backend on Linux; PySoundFile is the audio backend on Windows
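
The note above can be expressed as a small platform check. This is only a sketch: the helper name `pick_audio_backend` is hypothetical, the returned strings simply repeat the names used in the note, and the exact identifier your audio library expects may differ. Other operating systems are not covered by the note, so the helper raises for them rather than guessing.

```python
import platform

# Backend names copied from the note above; the identifier your audio
# library actually expects may be spelled differently (assumption).
BACKEND_BY_SYSTEM = {
    "Linux": "sox",
    "Windows": "PySoundFile",
}


def pick_audio_backend(system: str = "") -> str:
    """Return the backend named in the note for the given OS name."""
    system = system or platform.system()
    try:
        return BACKEND_BY_SYSTEM[system]
    except KeyError:
        raise ValueError(f"no audio backend listed for {system!r}") from None
```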

Environment variable

  • HYDRA_FULL_ERROR=1 (makes Hydra print the complete stack trace when an error occurs)
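
One way to set this variable when launching training from Python or a notebook is sketched below. The helper names `with_full_hydra_errors` and `run_training` are hypothetical; the command line is the one from the training step above.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def with_full_hydra_errors(base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with HYDRA_FULL_ERROR=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HYDRA_FULL_ERROR"] = "1"
    return env


def run_training() -> None:
    """Launch the training command with full Hydra error output enabled."""
    subprocess.run(
        [sys.executable, "main.py", "-cp", "conf", "-cn", "configs"],
        env=with_full_hydra_errors(),
        check=True,  # raise if training exits with a non-zero status
    )
```

Copying the environment instead of mutating `os.environ` keeps the setting local to the training process.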