Chinese-FastSpeech2

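This project continues training on the Biaobei (标贝) Chinese standard female voice dataset. It also modifies the FastSpeech2 model from the original paper by adding a prosody representation and a prosody prediction module, which makes the synthesized Chinese speech more vivid and rhythmic.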

Update 20230402

See the generated audio in samples.

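The core architecture is FastSpeech2 + HifiGAN. In addition, a prosody vector for the Chinese text is introduced at the input stage, so there are three models in total: fastspeech_model, hifigan_model, and prosody_model (cloud drive link, extraction code: qgpi). After downloading, place the model files in the designated directories: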

Two inference modes are provided: 1) python synthesize_all.py; 2) an HTTP API call.

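The first mode is interactive: run python synthesize_all.py on the command line and enter the text to convert; the script then writes tmp.wav to the current working directory.
The second mode is an API call: running tts_server.py starts a text-to-speech service. See TestServer.py for how to call it. The generated audio file (tmp.wav) is likewise saved in the current working directory.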
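As a rough illustration of the second mode, the sketch below builds a request for the service and checks the resulting audio file. The URL, port, route, and the "text" payload field are all assumptions made for illustration, not the project's actual interface; TestServer.py remains the authoritative example. The duration helper also assumes that tmp.wav is a standard PCM WAV file.

```python
import json
import urllib.request
import wave

# Hypothetical endpoint: the real route and port are defined in tts_server.py.
TTS_URL = "http://127.0.0.1:8080/tts"


def build_tts_request(text: str, url: str = TTS_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the text to synthesize.

    The {"text": ...} payload shape is an assumption; adapt it to
    whatever TestServer.py actually sends.
    """
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def wav_duration_seconds(path: str = "tmp.wav") -> float:
    """Return the duration in seconds of a PCM WAV file such as tmp.wav."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

With the server running, the request could be sent with `urllib.request.urlopen(build_tts_request("今天天气真好。"), timeout=60)`, after which `wav_duration_seconds()` gives a quick sanity check that tmp.wav was written. That check only works when run from the directory where the file was saved.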

This project is a personal-interest experiment in speech synthesis. Feedback, corrections, and discussion are all welcome!
