
doc: sync latest README #509

Merged: 3 commits, Jul 2, 2024
169 changes: 123 additions & 46 deletions docs/cn/README.md
A generative speech model for everyday conversation.

[![Licence](https://img.shields.io/badge/LICENSE-CC%20BY--NC%204.0-green.svg?style=for-the-badge)](https://github.com/2noise/ChatTTS/blob/main/LICENSE)
[![PyPI](https://img.shields.io/pypi/v/ChatTTS.svg?style=for-the-badge)](https://pypi.org/project/ChatTTS)

[![Huggingface](https://img.shields.io/badge/🤗%20-Models-yellow.svg?style=for-the-badge)](https://huggingface.co/2Noise/ChatTTS)
[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/2noise/ChatTTS/blob/main/examples/ipynb/colab.ipynb)
[![Discord](https://img.shields.io/badge/Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/Ud5Jxgx5yD)

[**English**](../../README.md) | **简体中文** | [**日本語**](../jp/README.md) | [**Русский**](../ru/README.md) | [**Español**](../es/README.md)


## Introduction

> [!Note]
> This repository contains the algorithm architecture and some simple examples.

> [!Tip]
> For end-user products built on this repository, see the community-maintained index repo [Awesome-ChatTTS](https://github.com/libukai/Awesome-ChatTTS).

ChatTTS is a text-to-speech model designed for dialogue scenarios such as LLM assistants.

### Supported Languages

### Roadmap

- [x] Open-source the 40k-hour base model and spk_stats file.
- [x] Streaming audio generation.
- [ ] Open-source a 40k-hour version with multi-emotion control.
- [ ] ChatTTS.cpp (new repos in the 2noise organization are welcome).

### Disclaimer

ChatTTS is a powerful text-to-speech system; it is important to use this capability responsibly and ethically.

##### 1. QQ Groups

- **Group 1**, 808364215 (full)
- **Group 2**, 230696694 (full)
- **Group 3**, 933639842 (full)
- **Group 4**, 608667975

##### 2. Discord

Click to join our [Discord](https://discord.gg/Ud5Jxgx5yD).

## Getting Started

### Clone the Repo

```bash
git clone https://github.com/2noise/ChatTTS
cd ChatTTS
```

### Install Dependencies

#### 1. Install Directly

```bash
pip install --upgrade -r requirements.txt
```

#### 2. Install with conda

```bash
conda create -n chattts
conda activate chattts
pip install -r requirements.txt
```

#### Optional: Install TransformerEngine if using an NVIDIA GPU (Linux only)

> [!Note]
> Installation may take a very long time.

> [!Warning]
> The TransformerEngine adaptation is still under development and you may run into many problems. Install it for development purposes only.

```bash
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
```

#### Optional: Install FlashAttention-2 (mainly for NVIDIA GPUs)

> [!Note]
> See the [Hugging Face Doc](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2) for the list of supported devices.

```bash
pip install flash-attn --no-build-isolation
```

### Quick Start

> Make sure you run the following commands from the project root directory.

#### 1. WebUI

```bash
python examples/web/webui.py
```

#### 2. Command Line

> The generated audio will be saved to `./output_audio_n.mp3`.

```bash
python examples/cmd/run.py "Your text 1." "Your text 2."
```

## Development

### Install the Python Package

1. Install the stable version from PyPI

```bash
pip install ChatTTS
```

2. Install the latest version from GitHub

```bash
pip install git+https://github.com/2noise/ChatTTS
```

3. Install the development version from a local folder

```bash
pip install -e .
```

### Basic Usage

```python
import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance

texts = ["PUT YOUR 1st TEXT HERE", "PUT YOUR 2nd TEXT HERE"]

wavs = chat.infer(texts)

torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)
```
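`torchaudio.save` above writes the 24 kHz mono result. When torchaudio is unavailable, the standard-library `wave` module can write the same audio as 16-bit PCM. This is a minimal sketch under that assumption; `save_wav_stdlib` is a hypothetical helper, not part of the ChatTTS API:

```python
import struct
import wave

def save_wav_stdlib(path, samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] as 16-bit PCM using only the stdlib."""
    pcm = b"".join(
        struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
        for s in samples
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono, as in the example above
        wf.setsampwidth(2)            # 16-bit samples
        wf.setframerate(sample_rate)  # 24 kHz, matching torchaudio.save above
        wf.writeframes(pcm)

# e.g. save_wav_stdlib("output1.wav", wavs[0].tolist())
```

The waveform returned by `chat.infer` is a float array, so converting it to a plain list (as in the commented call) should be enough to feed it to this helper.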
### Advanced Usage

```python
###################################
# Sample a speaker from Gaussian.
rand_spk = chat.sample_random_speaker()
print(rand_spk) # save it for later timbre recovery

params_infer_code = ChatTTS.Chat.InferCodeParams(
spk_emb = rand_spk, # add sampled speaker
temperature = .3, # using custom temperature
top_P = 0.7, # top P decode
top_K = 20, # top K decode
)

###################################
# For sentence level manual control.

# use oral_(0-9), laugh_(0-2), break_(0-7)
# to generate special token in text to synthesize.
params_refine_text = ChatTTS.Chat.RefineTextParams(
prompt='[oral_2][laugh_0][break_6]',
)

wavs = chat.infer(
texts,
params_refine_text=params_refine_text,
params_infer_code=params_infer_code,
)

###################################
# For word level manual control.

text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wavs = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text, params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)
```

<details open>
  <summary><h4>Example: self introduction</h4></summary>

```python
inputs_en = """
chatTTS is a text to speech model designed for dialogue applications.
[uv_break]it supports mixed language input [uv_break]and offers multi speaker
capabilities with precise control over prosodic elements like
[uv_break]laughter[uv_break][laugh], [uv_break]pauses, [uv_break]and intonation.
[uv_break]it delivers natural and expressive speech,[uv_break]so please
[uv_break] use the project responsibly at your own risk.[uv_break]
""".replace('\n', '') # English is still experimental.

params_refine_text = ChatTTS.Chat.RefineTextParams(
prompt='[oral_2][laugh_0][break_4]',
)

audio_array_en = chat.infer(inputs_en, params_refine_text=params_refine_text)
torchaudio.save("output3.wav", torch.from_numpy(audio_array_en[0]), 24000)
```
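The control tokens used above come in fixed ranges: `oral_(0-9)`, `laugh_(0-2)`, `break_(0-7)`. A small helper can build a valid refine prompt and catch out-of-range values early; `build_refine_prompt` is a hypothetical convenience, not part of the ChatTTS API:

```python
def build_refine_prompt(oral=2, laugh=0, break_=4):
    """Build a RefineTextParams prompt string like '[oral_2][laugh_0][break_4]'."""
    # Ranges as documented above: oral_(0-9), laugh_(0-2), break_(0-7).
    ranges = {"oral": (oral, 9), "laugh": (laugh, 2), "break": (break_, 7)}
    for name, (value, upper) in ranges.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be in 0..{upper}, got {value}")
    return f"[oral_{oral}][laugh_{laugh}][break_{break_}]"

# e.g. build_refine_prompt(oral=2, laugh=0, break_=6) -> '[oral_2][laugh_0][break_6]'
```

The returned string can be passed as the `prompt` of `ChatTTS.Chat.RefineTextParams`, exactly as in the examples above.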

<table>
<tr>
<td align="center">

**Male Speaker**

</td>
<td align="center">

**Female Speaker**

</td>
</tr>
<tr>
<td align="center">

[Male speaker](https://github.com/2noise/ChatTTS/assets/130631963/e0f51251-db7f-4d39-a0e9-3e095bb65de1)

</td>
<td align="center">

[Female speaker](https://github.com/2noise/ChatTTS/assets/130631963/f5dcdd01-1091-47c5-8241-c4f6aaaa8bbd)

</td>
</tr>
</table>

</details>
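For intuition about the `temperature`, `top_K`, and `top_P` decoding parameters used in the advanced example, here is a pure-Python sketch of top-k/top-p (nucleus) filtering over a token distribution. It illustrates the general sampling technique, not ChatTTS's internal implementation:

```python
def top_k_top_p_filter(probs, k, p):
    """Keep the k most likely tokens, then the smallest prefix of those whose
    cumulative probability reaches p; renormalize the survivors to sum to 1."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)[:k]
    kept, cum = [], 0.0
    for idx, pr in ranked:
        kept.append((idx, pr))
        cum += pr
        if cum >= p:
            break
    total = sum(pr for _, pr in kept)
    return {idx: pr / total for idx, pr in kept}

# With a peaked distribution, only the head survives:
# top_k_top_p_filter([0.5, 0.3, 0.1, 0.1], k=3, p=0.7) -> {0: 0.625, 1: 0.375}
```

Lower `temperature` sharpens the distribution before this filtering step, so smaller `top_P`/`top_K` values make generation more deterministic at the cost of variety.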

## FAQ

- [wlu-audio lab](https://audio.westlake.edu.cn/) for early algorithm experiment support.


## Contributors

[![contributors](https://contrib.rocks/image?repo=2noise/ChatTTS)](https://github.com/2noise/ChatTTS/graphs/contributors)