Deep Speech with CTC Loss

Introduction

Deep Speech is an automatic speech recognition (ASR) model that achieved state-of-the-art (SOTA) results in speech recognition. In this repository, I use Deep Speech with the VIVOS dataset and the VinBigData VLSP 2020 dataset.

How to use this repository

Clone this project into the current directory using these commands:

!git init
!git remote add origin https://github.com/tuanio/deepspeech-ctc
!git pull origin main

Install the required packages:

!pip install -r requirements.txt

Then install ctcdecode from this repository: https://github.com/parlance/ctcdecode

Edit the configs.yaml file to match your setup.
Train the model with python main.py -cp conf -cn configs (-cp sets the Hydra config directory and -cn the config name).
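
The ctcdecode package installed above provides beam-search decoding of CTC outputs. To illustrate what any CTC decoder has to do, here is a minimal greedy decoding sketch in plain Python. It is not the decoder used by this repository; the function name, the toy vocabulary, and the choice of index 0 as the blank symbol are all assumptions made for the example.

```python
from itertools import groupby
from typing import Mapping, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    id_to_char: Mapping[int, str],
    blank: int = BLANK,
) -> str:
    """Collapse consecutive repeated labels, then drop blank labels."""
    # Step 1: merge runs of the same label (e.g. 1, 1, 1 -> 1).
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Step 2: remove blanks; a blank between two equal labels keeps both.
    return "".join(id_to_char[i] for i in collapsed if i != blank)


# Hypothetical toy vocabulary and per-frame argmax labels.
vocab = {1: "c", 2: "a", 3: "t"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, vocab))
```

Greedy decoding only keeps the most likely label per frame, whereas a beam-search decoder keeps several candidate transcripts and can combine them with a language model.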

Run the Web Demo version

streamlit run web.py

Train results

Training loss of Deep Speech over 978 epochs

Validation loss of Deep Speech

Validation word error rate (mean WER) of Deep Speech
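
For readers unfamiliar with the metric, the sketch below shows one common way to compute word error rate: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It assumes that "mean WER" means the average of per-utterance scores; the function names are hypothetical, and the repository may compute the metric differently.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed part of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-utterance WER over (reference, hypothesis) pairs."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


print(word_error_rate("a b c d", "a x c"))
```

Note that averaging per-utterance scores weights short and long utterances equally, which differs from a corpus-level WER that sums errors over all words.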

Note

sox is the audio backend on Linux; PySoundFile is the audio backend on Windows.

Environment variable

Set HYDRA_FULL_ERROR=1 to make Hydra print the complete stack trace when an error occurs.
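
Since the setup commands in this README use the notebook "!" prefix, one way to set the variable from a notebook cell is through os.environ before launching training. This is a minimal sketch; the helper name with_full_hydra_errors is hypothetical and not part of this repository.

```python
import os
from typing import Dict, Mapping, Optional


def with_full_hydra_errors(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with HYDRA_FULL_ERROR enabled."""
    new_env = dict(os.environ if env is None else env)
    new_env["HYDRA_FULL_ERROR"] = "1"
    return new_env


# Option 1: set it on the current process; commands started afterwards
# from the same notebook inherit it.
os.environ["HYDRA_FULL_ERROR"] = "1"

# Option 2: build an environment to pass explicitly to a child process,
# e.g. subprocess.run([...], env=with_full_hydra_errors()).
child_env = with_full_hydra_errors({"PATH": "/usr/bin"})
print(child_env["HYDRA_FULL_ERROR"])
```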

