FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

In this paper, we adopt the end-to-end framework of VITS for high-quality waveform reconstruction, and propose strategies for extracting clean content information without text annotation. We disentangle content information by imposing an information bottleneck on WavLM features, and propose spectrogram-resize (SR) based data augmentation to improve the purity of the extracted content information.
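The spectrogram-resize idea mentioned above can be illustrated with a minimal NumPy sketch. This is not the code in `preprocess_sr.py`: the function name `resize_mel_vertically`, the linear interpolation, the padding strategy, and the `noise_std` value are all assumptions made for illustration. The heights 68 and 92 mirror the `--min`/`--max` values in the preprocessing commands, on the assumption that those flags denote the resized height of an 80-bin mel-spectrogram.

```python
import numpy as np


def resize_mel_vertically(mel, new_height, noise_std=0.01, rng=None):
    """Stretch or squeeze a mel-spectrogram along the frequency axis.

    mel: array of shape (n_bins, n_frames).
    Returns an array with the same shape as `mel`.
    Hypothetical helper for illustration only.
    """
    n_bins, n_frames = mel.shape
    src = np.arange(n_bins)
    dst = np.linspace(0, n_bins - 1, new_height)
    # Linearly interpolate every frame onto the new frequency grid.
    resized = np.stack(
        [np.interp(dst, src, mel[:, t]) for t in range(n_frames)], axis=1
    )
    if new_height >= n_bins:
        # Stretched: cut the extra high-frequency bins.
        return resized[:n_bins]
    # Squeezed: pad the top with the highest bin plus small noise (assumed).
    rng = rng if rng is not None else np.random.default_rng()
    pad = np.repeat(resized[-1:], n_bins - new_height, axis=0)
    pad = pad + rng.normal(0.0, noise_std, size=pad.shape)
    return np.concatenate([resized, pad], axis=0)


rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 120))  # random stand-in for a real mel-spectrogram
squeezed = resize_mel_vertically(mel, 68, rng=rng)
stretched = resize_mel_vertically(mel, 92, rng=rng)
print(squeezed.shape, stretched.shape)
```

Resizing distorts speaker-dependent cues such as pitch and formant positions while leaving the timing of the content untouched, which is why a waveform rebuilt from the resized spectrogram can serve as augmented input for content extraction.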

🤗 Play online at HuggingFace Spaces.

Visit our demo page for audio samples.

We also provide the pretrained models.

Figure: model overview. (a) Training. (b) Inference.

Updates

Code release. (Nov 27, 2022)
Online demo at HuggingFace Spaces. (Dec 14, 2022)
Supports 24kHz outputs. See the tips-for-synthesizing-24KHz-wavs-from-16kHz-wavs directory for details. (Dec 15, 2022)
Fixed a data loading bug. (Jan 10, 2023)

Pre-requisites

Clone this repo: git clone https://github.com/OlaWod/FreeVC.git
Change into the repo directory: cd FreeVC
Install python requirements: pip install -r requirements.txt
Download WavLM-Large and place it under the 'wavlm/' directory
Download the VCTK dataset (for training only)
Download the HiFi-GAN model and place it under the 'hifigan/' directory (only needed for training with SR-based augmentation)

Inference Example

Download the pretrained checkpoints and run:

# inference with FreeVC
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc

# inference with FreeVC-s
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc-s.json --ptfile checkpoints/freevc-s.pth --txtpath convert.txt --outdir outputs/freevc-s
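The `--txtpath` argument points to a list of conversion pairs. The sketch below builds such a list, assuming each line has the pipe-separated form `title|source_wav|target_wav`; that layout, the helper name `format_convert_list`, and the example paths are assumptions, so check the `convert.txt` shipped with the repo before relying on it.

```python
def format_convert_list(pairs):
    """Render (title, source_wav, target_wav) tuples as pipe-separated lines.

    The 'title|source|target' layout is an assumption about convert.txt.
    """
    lines = []
    for title, src, tgt in pairs:
        for field in (title, src, tgt):
            if "|" in field:
                raise ValueError(f"field must not contain '|': {field!r}")
        lines.append(f"{title}|{src}|{tgt}")
    return "\n".join(lines) + "\n"


# Hypothetical paths: the source utterance supplies the content,
# the target utterance supplies the voice.
pairs = [
    ("sample1", "data/source_a.wav", "data/target_x.wav"),
    ("sample2", "data/source_b.wav", "data/target_y.wav"),
]
text = format_convert_list(pairs)
print(text, end="")  # save this text as convert.txt
```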

Training Example

Preprocess

python downsample.py --in_dir </path/to/VCTK/wavs>
ln -s dataset/vctk-16k DUMMY

# run this if you want a different train-val-test split
python preprocess_flist.py

# run this if you want to use a pretrained speaker encoder
CUDA_VISIBLE_DEVICES=0 python preprocess_spk.py

# run this if you want to train without SR-based augmentation
CUDA_VISIBLE_DEVICES=0 python preprocess_ssl.py

# run these if you want to train with SR-based augmentation
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 68 --max 72
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 73 --max 76
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 77 --max 80
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 81 --max 84
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 85 --max 88
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 89 --max 92
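The six commands above split the range 68 to 92 into contiguous shards spread over three GPUs. As a sketch, the same command lines can be generated rather than typed by hand. The helper names are hypothetical, and treating `--min`/`--max` as inclusive bounds is an assumption inferred from the commands themselves; the script only prints the commands and does not run them.

```python
def split_inclusive_range(lo, hi, n_chunks):
    """Split the inclusive range [lo, hi] into n_chunks contiguous pieces."""
    total = hi - lo + 1
    if n_chunks < 1 or n_chunks > total:
        raise ValueError("n_chunks must be between 1 and the range size")
    base, extra = divmod(total, n_chunks)
    chunks, start = [], lo
    for i in range(n_chunks):
        # The first `extra` chunks take one additional value each.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def build_sr_commands(lo=68, hi=92, gpus=(1, 1, 2, 2, 3, 3)):
    """Build one preprocess_sr.py command per entry in `gpus`."""
    chunks = split_inclusive_range(lo, hi, len(gpus))
    return [
        f"CUDA_VISIBLE_DEVICES={gpu} python preprocess_sr.py --min {a} --max {b}"
        for gpu, (a, b) in zip(gpus, chunks)
    ]


commands = build_sr_commands()
for cmd in commands:
    print(cmd)
```

Changing `gpus` to match the available devices, for example `(0, 0)` on a single-GPU machine, yields fewer and larger shards over the same range.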

Train

# train freevc
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/freevc.json -m freevc

# train freevc-s
CUDA_VISIBLE_DEVICES=2 python train.py -c configs/freevc-s.json -m freevc-s
