MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module, on arXiv.
Accepted to Speech Synthesis Workshop 12, 2023, Grenoble
Presentation slides
MooseNet is a trainable metric for synthesized speech. We experimented with self-supervised learning (SSL) neural network (NN) models and a PLDA module. See the MooseNet-PLDA paper.
# Optional for reinstallation
conda deactivate; rm -rf env;
# Installing new conda environment and editable pip moosenet package
conda env create --prefix ./env -f environment.yml \
&& conda activate ./env \
&& pip install -e ".[dev]"
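After installation, a quick import check can confirm that the editable package is visible from the active environment. This is a minimal sketch, not part of the repository: the function name `check_moosenet` is hypothetical, and the import name `moosenet` is an assumption based on the package name mentioned above.

```shell
# Hypothetical helper: reports whether the package can be imported.
# Assumption: the editable install exposes a Python module named `moosenet`.
check_moosenet() {
  if python -c "import moosenet" 2>/dev/null; then
    echo "moosenet is importable"
  else
    echo "moosenet is not importable; check that ./env is active"
  fi
}

check_moosenet
```

If the check reports a failure, re-running `conda activate ./env` before retrying is a reasonable first step.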
- The commands for fine-tuning SSL models (XLS-R and Wav2Vec 2.0) into the MooseNet NN on the English data from the main track can be found in
./main.sh
- The commands for fine-tuning the MooseNet NN on the main track data and the Chinese set from the OOD track can be found in
./ood.sh
This work was co-funded by Charles University projects GAUK 40222, SVV 260575 and the European Union (ERC, NG-NLG, 101039303).