A voice conversion model for real-time synthesis that uses PPGs (phonetic posteriorgrams) as an intermediate feature, written in PyTorch.
Transcript: "二階から (n i k a i k a r a) ..."
The correspondence between index and phone is described here.
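As a rough illustration of how an index-to-phone table is used with a PPG, the sketch below takes the most probable phone index in each frame and looks it up in a table. The `PHONES` list, the helper names, and the toy posterior values are hypothetical and chosen only for this example; the real mapping is the one referenced above, and this code is not taken from the repository.

```python
# Hypothetical phone table for illustration only; the actual
# index-to-phone correspondence is defined by this repository.
PHONES = ["sil", "a", "i", "k", "n", "r"]


def ppg_to_phones(ppg):
    """Return the most probable phone for each frame of a PPG.

    ppg: list of frames, each a list of posterior probabilities
         with one entry per phone in PHONES.
    """
    phones = []
    for frame in ppg:
        # Index of the largest posterior in this frame (argmax).
        best_index = max(range(len(frame)), key=frame.__getitem__)
        phones.append(PHONES[best_index])
    return phones


def collapse_repeats(phones):
    """Merge consecutive duplicate phones into a single entry."""
    collapsed = []
    for phone in phones:
        if not collapsed or collapsed[-1] != phone:
            collapsed.append(phone)
    return collapsed


# Toy PPG with 4 frames (made-up values, each row sums to 1).
toy_ppg = [
    [0.05, 0.05, 0.05, 0.05, 0.75, 0.05],  # peaks at index 4
    [0.05, 0.05, 0.75, 0.05, 0.05, 0.05],  # peaks at index 2
    [0.05, 0.05, 0.70, 0.10, 0.05, 0.05],  # peaks at index 2
    [0.05, 0.05, 0.05, 0.75, 0.05, 0.05],  # peaks at index 3
]

frame_phones = ppg_to_phones(toy_ppg)
print(frame_phones)
print(collapse_repeats(frame_phones))
```

Taking the per-frame argmax discards the soft posterior information, so it is useful mainly for inspecting or debugging a PPG, not as a replacement for the full feature.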
Baseline samples (No GAN, No DAT)
https://drive.google.com/drive/folders/1Djq4dwZgJdGy4rFVArZY_kLySoxu9iSj?usp=sharing
gen_[ID].wav : generated speech
ref_[ID].wav : source speech
jsut_target.wav : speech from target speaker