
A PyTorch implementation of "Improving noise robust automatic speech recognition with single-channel time-domain enhancement network"


TeaPoly/conv-tasnet

 
 


Thanks to Keisuke Kinoshita for helping me solve problems.

ConvTasNet

A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" and "Improving noise robust automatic speech recognition with single-channel time-domain enhancement network".

Requirements

See requirements.txt.

Usage

./nnet/separate.py /path/to/checkpoint --input /path/to/mix.scp --gpu 0 > separate.log 2>&1 &
  • evaluate
./nnet/compute_si_snr.py /path/to/ref_spk1.scp,/path/to/ref_spk2.scp /path/to/inf_spk1.scp,/path/to/inf_spk2.scp
  • file format

A ".scp" file is a Kaldi script file. Each line contains a UUID and a file path, like this:

uuid1 /path/to/file1
uuid2 /path/to/file2
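
A minimal sketch of reading this format in Python, assuming one whitespace-separated `<uuid> <path>` pair per line as in the example above. The function name `parse_scp` is hypothetical and is not taken from this repository; the repository's own loader may behave differently.

```python
def parse_scp(path):
    """Read a Kaldi-style .scp file into a dict mapping UUID -> file path.

    Hypothetical helper for illustration; not part of this repository.
    """
    entries = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the first run of whitespace only, so the path part
            # is kept intact even if it contains spaces.
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(
                    f"{path}:{lineno}: expected '<uuid> <path>', got {line!r}"
                )
            key, value = parts
            if key in entries:
                raise ValueError(f"{path}:{lineno}: duplicate UUID {key!r}")
            entries[key] = value
    return entries
```

Rejecting duplicate UUIDs is a design choice of this sketch: reference and inferred files would presumably be paired by UUID, so a repeated key would make that pairing ambiguous.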

mix.scp: Mixtures of multi-speaker speech built from spk1.scp, spk2.scp ... and spk$N.scp. ...
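
Judging by its name, the evaluation script compute_si_snr.py reports the scale-invariant signal-to-noise ratio (SI-SNR) between reference and inferred signals. The sketch below shows the usual definition of that metric in plain Python, assuming two equal-length sequences of samples. The function `si_snr` and its `eps` parameter are illustrative assumptions, not the repository's code, and the script may differ in details such as speaker permutation handling.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative sketch only; `eps` guards against division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [s - ref_mean for s in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(x * s for x, s in zip(est, ref))
    ref_energy = sum(s * s for s in ref) + eps
    scale = dot / ref_energy
    target = [scale * s for s in ref]

    # Whatever is left after the projection is treated as noise.
    noise = [x - t for x, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(e * e for e in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is what makes the measure scale-invariant.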

Reference

Luo Y, Mesgarani N. TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation[J]. arXiv preprint arXiv:1809.07454, 2018.

Kinoshita K, Ochiai T, Delcroix M, Nakatani T. Improving noise robust automatic speech recognition with single-channel time-domain enhancement network[J]. arXiv preprint arXiv:2003.03998, 2020.
