This repository collects the objective metrics used in several text-to-speech (TTS) papers.
Metric | Used In |
---|---|
Voicing Decision Error (VDE) | E2E-Prosody, Mellotron |
Gross Pitch Error (GPE) | E2E-Prosody, Mellotron |
F0 Frame Error (FFE) | E2E-Prosody, Mellotron |
Dynamic Time Warping (DTW) | FastSpeech2 |
Mel Spectral Distortion (MSD) | Wave-Tacotron |
Mel Cepstral Distortion (MCD) | E2E-Prosody, Wave-Tacotron |
Statistical Moments (STD, SKEW, KURT) | FastSpeech2 |
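
As a rough illustration of how the three F0-based metrics relate to each other, here is a minimal sketch using their commonly cited definitions. It is not the code in `metrics/VDE.py`, `metrics/GPE.py`, or `metrics/FFE.py`. The function name `f0_errors`, the 20% tolerance, and the convention that an unvoiced frame has F0 equal to 0 are assumptions made for this example, and both contours are assumed to be already time-aligned and of equal length.

```python
import numpy as np

def f0_errors(f0_ref, f0_syn, tol=0.2):
    """Return VDE, GPE and FFE for two aligned F0 contours (Hz, 0 = unvoiced).

    Hypothetical helper for illustration; `tol` is the relative pitch
    deviation above which a frame counts as a gross pitch error.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_syn = np.asarray(f0_syn, dtype=float)
    if f0_ref.shape != f0_syn.shape or f0_ref.size == 0:
        raise ValueError("contours must be non-empty and of equal length")

    voiced_ref = f0_ref > 0
    voiced_syn = f0_syn > 0
    both_voiced = voiced_ref & voiced_syn

    # Frames where one signal is voiced and the other is not.
    voicing_mismatch = voiced_ref != voiced_syn

    # Frames voiced in both signals whose pitch deviates by more than tol.
    gross = np.zeros(f0_ref.shape, dtype=bool)
    gross[both_voiced] = (
        np.abs(f0_syn[both_voiced] - f0_ref[both_voiced])
        > tol * f0_ref[both_voiced]
    )

    n_frames = f0_ref.size
    n_both = int(both_voiced.sum())
    return {
        # VDE: share of all frames with a wrong voicing decision.
        "VDE": float(voicing_mismatch.sum() / n_frames),
        # GPE: share of jointly voiced frames with a gross pitch error.
        "GPE": float(gross.sum() / n_both) if n_both else 0.0,
        # FFE: share of all frames with either kind of error.
        "FFE": float((voicing_mismatch | gross).sum() / n_frames),
    }

if __name__ == "__main__":
    print(f0_errors([100, 100, 0, 200, 0], [100, 130, 50, 0, 0]))
```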
Algorithm | Proposed In |
---|---|
YIN | (de Cheveigné and Kawahara, 2002) |
DIO | (Morise, Kawahara, and Katayose, 2009) |
PYIN (Testing) | (Mauch and Dixon, 2014) |
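
To give a feel for what a pitch estimator such as YIN does, the sketch below implements a simplified single-frame version of its core steps: the difference function, the cumulative mean normalized difference, and the absolute threshold. It is an illustration only, not the implementation in `audio/pitch.py`, and it omits the parabolic interpolation and best-local-estimate steps of the original algorithm. The function name `yin_frame_f0` and all default parameter values are assumptions.

```python
import numpy as np

def yin_frame_f0(frame, sr, fmin=60.0, fmax=500.0, threshold=0.1):
    """Estimate F0 (Hz) of one frame with a simplified YIN; 0.0 = unvoiced.

    Hypothetical helper for illustration only.
    """
    frame = np.asarray(frame, dtype=float)
    tau_min = int(sr / fmax)          # shortest lag (period) considered
    tau_max = int(sr / fmin)          # longest lag (period) considered
    window = len(frame) - tau_max     # samples compared at every lag
    if window <= 0:
        raise ValueError("frame too short for the requested fmin")

    # Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    diff = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        delta = frame[:window] - frame[tau:tau + window]
        diff[tau] = np.dot(delta, delta)

    # A constant frame (e.g. silence) has no measurable period.
    if diff[1:].sum() == 0.0:
        return 0.0

    # Step 2: cumulative mean normalized difference,
    # d'(tau) = d(tau) / ((1 / tau) * sum_{j=1..tau} d(j)), with d'(0) = 1.
    cmnd = np.ones(tau_max + 1)
    taus = np.arange(1, tau_max + 1)
    running = np.cumsum(diff[1:])
    cmnd[1:] = diff[1:] * taus / np.maximum(running, 1e-12)

    # Step 3: first lag under the threshold, followed down to its local minimum.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau
    return 0.0  # no lag was periodic enough: treat the frame as unvoiced

if __name__ == "__main__":
    sr = 16000
    t = np.arange(1024) / sr
    print(yin_frame_f0(np.sin(2 * np.pi * 200.0 * t), sr))
```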
First, clone and enter the repo:
git clone https://github.com/AI-Unicamp/TTS-Objective-Metrics
cd TTS-Objective-Metrics
Install dependencies:
pip install -r requirements.txt
Run the evaluation:
python evaluate.py
`evaluate.py` is hardcoded to read `driver.txt`, which holds one test instance per row in the following format: source_transcript source_wav_path target_transcript target_wav_path
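
The four-field row format can be read with a few lines of Python. The sketch below is not taken from `evaluate.py`. The names `DriverRow` and `parse_driver` are hypothetical, and the `|` field delimiter is an assumption, since the format description above does not state which separator `driver.txt` uses. Adjust `delimiter` to match your file.

```python
from typing import Iterable, List, NamedTuple

class DriverRow(NamedTuple):
    """One test instance, with fields in the order documented above."""
    source_transcript: str
    source_wav_path: str
    target_transcript: str
    target_wav_path: str

def parse_driver(lines: Iterable[str], delimiter: str = "|") -> List[DriverRow]:
    """Parse driver rows; `delimiter` is an assumed separator, not a documented one."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(delimiter)]
        if len(fields) != 4:
            raise ValueError(
                f"line {lineno}: expected 4 fields, found {len(fields)}"
            )
        rows.append(DriverRow(*fields))
    return rows

if __name__ == "__main__":
    # Made-up example row; the paths do not refer to real files in the repo.
    demo = ["hello world | src/a.wav | hello world | tgt/a.wav"]
    for row in parse_driver(demo):
        print(row.source_wav_path, "->", row.target_wav_path)
```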
📦TTS Objective Metrics
┣ 📂audio
┃ ┣ 📜helpers.py
┃ ┣ 📜pitch.py
┃ ┣ 📜visuals.py
┣ 📂bin # this is legacy code
┃ ┣ 📜compute_metrics.py
┣ 📂config
┃ ┣ 📜global_config.py
┣ 📂examples
┣ 📂metrics
┃ ┣ 📂sources
┃ ┃ ┣ 📜sources.wav
┃ ┣ 📂targets
┃ ┃ ┣ 📜targets.wav.wav
┃ ┣ 📜dists.py
┃ ┣ 📜DTW.py
┃ ┣ 📜FFE.py
┃ ┣ 📜GPE.py
┃ ┣ 📜helpers.py
┃ ┣ 📜MCD.py
┃ ┣ 📜moments.py
┃ ┣ 📜MSD.py
┃ ┣ 📜VDE.py
┃ ┣ 📜WER.py
┃ ┣ 📜SECS.py
┣ 📜README.md
┣ 📜evaluate.py
┣ 📜driver.txt
As the repo is still in its infancy, feel free to open an issue or discussion, send a pull request, or contact us by e-mail.
- Leonardo B. de M. M. Marques ([email protected])
- Lucas Hideki Ueda ([email protected])
All references are listed at the top of the code that uses them.