dc_tts-transfer-learning

This repo contains attempts to apply transfer learning to the dc_tts text-to-speech model described in the paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention. The code is a modified version of Kyubyong's dc_tts code. The pretrained model, also provided in Kyubyong's repo, was trained on the LJ Speech Dataset. Transfer learning was then used to adapt the model to Scarlett Johansson's voice.

Transfer learning is accomplished by selecting the model layers to train in hyperparams.py.

Task List:

Prelim Model Training

~6 hrs of training on Tesla V100 GPU
Layers trained:
- SSRN(C_13, C_14, C_15, C_16)
- Text2Mel/TextEnc(HC_11, HC_12, HC_13, HC_14, HC_15)
- Text2Mel/AudioEnc(HC_9, HC_10, HC_11, HC_12, HC_13)
- Text2Mel/AudioDec(HC_7, C_8, C_9, C_10, C_11)
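The layer selection listed above can be pictured as a filter over variable names: only variables that belong to a listed layer are handed to the optimizer, and everything else keeps its pretrained weights. The snippet below is a minimal sketch of that idea in plain Python, not the repo's actual code. The names `TRAINED_LAYERS`, `is_trainable`, and `select_trainable` are hypothetical, and so are the `scope/layer/parameter` naming scheme and the example variable names. The real variable names may differ.

```python
# Hypothetical sketch: the dict below mirrors the "Layers trained" list;
# the variable-name format "scope/layer/param" is an assumption.
TRAINED_LAYERS = {
    "SSRN": ["C_13", "C_14", "C_15", "C_16"],
    "Text2Mel/TextEnc": ["HC_11", "HC_12", "HC_13", "HC_14", "HC_15"],
    "Text2Mel/AudioEnc": ["HC_9", "HC_10", "HC_11", "HC_12", "HC_13"],
    "Text2Mel/AudioDec": ["HC_7", "C_8", "C_9", "C_10", "C_11"],
}


def is_trainable(var_name, trained_layers=TRAINED_LAYERS):
    """Return True if var_name sits inside one of the selected layers."""
    parts = var_name.split("/")
    for scope, layers in trained_layers.items():
        scope_parts = scope.split("/")
        n = len(scope_parts)
        # Compare whole path components so that "C_1" never matches "C_13".
        if parts[:n] == scope_parts and len(parts) > n and parts[n] in layers:
            return True
    return False


def select_trainable(var_names, trained_layers=TRAINED_LAYERS):
    """Keep only the variable names that should be updated during training."""
    return [name for name in var_names if is_trainable(name, trained_layers)]


# Made-up variable names, for illustration only.
example_names = [
    "SSRN/C_1/kernel",
    "SSRN/C_16/kernel",
    "Text2Mel/TextEnc/HC_3/bias",
    "Text2Mel/AudioDec/HC_7/kernel",
]
selected = select_trainable(example_names)
print(selected)
```

In a training script, a filtered list like this would typically be what gets passed to the optimizer, so the remaining layers stay frozen at their pretrained values.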

Scarlett Johansson's audio book

references:
