light-tts is a lightweight, high-performance text-to-speech (TTS) inference and serving framework written in Python. It is built around the CosyVoice model and based on lightllm, with optimizations for fast, scalable, service-ready TTS deployment.
- 🚀 Optimized LLM Inference: The language model part of the TTS pipeline is accelerated using techniques from lightllm and supports high-throughput batch inference
- 🧩 Shared Memory Timbre Manager with LRU: Manages speaker/timbre embeddings in shared memory for fast access and minimal recomputation
- 🧱 Modular Architecture (Encode–LLM–Decode): Refactored from LightLLM into three decoupled modules—Encoder, LLM, and Decoder—each running as separate processes for efficient task parallelism and scalability.
- 🌐 Service Ready and Easy Integration: Comes with an HTTP API for fast deployment and simple APIs for integration into other Python or web projects
- 🔄 Bi-streaming Mode via WebSocket: Supports interactive bi-directional streaming using WebSocket for low-latency, real-time TTS communication
Installing with Docker
```bash
# The easiest way to install light-tts is by using the official image.
# You can directly pull and run the official image
docker pull lighttts/light-tts:v1.0

# Or you can manually build the image
docker build -t light-tts:v1.0 .

# Run the image
docker run -it --gpus all -p 8080:8080 --shm-size 4g -v your_local_path:/data/ light-tts:v1.0 /bin/bash
```
Installing from Source
```bash
# Clone the repo
git clone --recursive https://github.com/ModelTC/light-tts.git
cd light-tts
# If you failed to clone the submodule due to network failures, please run the following command until success
# cd light-tts
# git submodule update --init --recursive

# (Recommended) Create a new conda environment
conda create -n light-tts python=3.10 -y
conda activate light-tts

# pynini is required by WeTextProcessing, use conda to install it as it can be executed on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
```
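After installing the requirements, a quick import check helps confirm the environment is usable. This is a minimal sketch, assuming PyTorch is pulled in by requirements.txt (it is not named explicitly above):

```python
# Sanity check after installation (assumes PyTorch comes from requirements.txt).
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```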
Currently, only the CosyVoice2 model is supported. Download the model files as shown below.
```python
# Download models via the ModelScope SDK
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice2-0.5B', local_dir='pretrained_models/CosyVoice2-0.5B')
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
```

```bash
# Download models via git; make sure git lfs is installed
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice2-0.5B.git pretrained_models/CosyVoice2-0.5B
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
```
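A quick check that the expected model directories were created can save debugging time later. This is a small sketch based only on the paths used above; the exact files inside each directory may vary:

```python
import os

# Directories created by the download commands above
for model_dir in ("pretrained_models/CosyVoice2-0.5B", "pretrained_models/CosyVoice-ttsfrd"):
    if os.path.isdir(model_dir):
        print(f"{model_dir}: {len(os.listdir(model_dir))} entries")
    else:
        print(f"{model_dir}: missing, re-run the download step")
```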
For better text normalization performance, you can optionally install the ttsfrd package and unzip its resources. This step is not required; if skipped, the system falls back to WeTextProcessing by default.
```bash
cd pretrained_models/CosyVoice-ttsfrd/
unzip resource.zip -d .
pip install ttsfrd_dependency-0.1-py3-none-any.whl
pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl
```
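To confirm whether the optional backend was installed, you can simply try the import. A minimal sketch, assuming the wheel above exposes a top-level `ttsfrd` module:

```python
# Check whether the optional text-normalization backend is importable.
try:
    import ttsfrd  # installed from the wheel above (optional)
    print("ttsfrd is available and can be used for text normalization")
except ImportError:
    print("ttsfrd not installed; the default WeTextProcessing fallback will be used")
```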
📝 This setup instruction is based on the original guide from the CosyVoice repository.
```bash
# It is recommended to enable the load_trt parameter for acceleration.
# The default is fp16 mode.
python -m light_tts.server.api_server --model_dir ./pretrained_models/CosyVoice2-0.5B-latest --load_trt True --max_total_token_num 65536 --max_req_total_len 32768
```
- `max_total_token_num`: LLM argument; the total number of tokens the GPU and model can support, equal to `max_batch * (input_len + output_len)`.
- `max_req_total_len`: LLM argument; the maximum value of `req_input_len + req_output_len`. It is set to 32768 here because the `max_position_embeddings` of the LLM part is 32768.
- Many other parameters can be viewed in `light_tts/server/api_cli.py`.
After a short wait the service will be up; by default it listens on localhost:8080.
Once the service is running, you can call it through the HTTP API. Three modes are supported: non-streaming, streaming, and bi-streaming.
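If you script the launch, you can poll the port instead of sleeping for a fixed time. A minimal sketch using only the standard library, assuming the default localhost:8080 address mentioned above:

```python
import socket
import time

# Poll until the API server accepts TCP connections on the default address.
def wait_for_server(host="localhost", port=8080, timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                print(f"server is up at {host}:{port}")
                return
        except OSError:
            time.sleep(2)
    raise TimeoutError(f"server did not start within {timeout}s")

wait_for_server()
```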
- Non-streaming and streaming: you can also use `test/test_zero_shot.py`, which can print information such as RTF and TTFT.

```python
import requests
import soundfile as sf
import numpy as np
import os

url = "http://localhost:8080/inference_zero_shot"
path = "cosyvoice/asset/zero_shot_prompt.wav"  # wav file path
prompt_text = "希望你以后能够做的比我还好呦。"
tts_text = "收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。"
stream = True  # Whether to use streaming inference

files = {
    "prompt_wav": ("sample.wav", open(path, "rb"), "audio/wav")
}
data = {
    "tts_text": tts_text,
    "prompt_text": prompt_text,
    "stream": stream
}

response = requests.post(url, files=files, data=data, stream=True)
sample_rate = 24000
audio_data = bytearray()
try:
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:
            audio_data.extend(chunk)
except Exception as e:
    print(f"Exception: {e}")
    print(f"Error: {response.status_code}, {response.text}")
    raise SystemExit(1)

audio_np = np.frombuffer(audio_data, dtype=np.int16)
if response.status_code == 200:
    os.makedirs("./outs", exist_ok=True)
    output_wav = f"./outs/output{'_stream' if stream else ''}.wav"
    sf.write(output_wav, audio_np, samplerate=sample_rate, subtype="PCM_16")
    print(f"saved as {output_wav}")
else:
    print("Error:", response.status_code, response.text)
```
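`test/test_zero_shot.py` reports RTF and TTFT; the sketch below shows one way to measure them yourself against the same endpoint. It assumes the response body is raw 16-bit PCM at 24 kHz, as in the example above; the exact metric definitions in the test script may differ.

```python
import time
import numpy as np
import requests

url = "http://localhost:8080/inference_zero_shot"
files = {"prompt_wav": ("sample.wav", open("cosyvoice/asset/zero_shot_prompt.wav", "rb"), "audio/wav")}
data = {"tts_text": "收到好友从远方寄来的生日礼物。", "prompt_text": "希望你以后能够做的比我还好呦。", "stream": True}

start = time.time()
response = requests.post(url, files=files, data=data, stream=True)
first_chunk_time = None
audio_data = bytearray()
for chunk in response.iter_content(chunk_size=4096):
    if chunk:
        if first_chunk_time is None:
            first_chunk_time = time.time()  # time of the first audio chunk
        audio_data.extend(chunk)
end = time.time()

# Assumes 16-bit PCM at 24 kHz, matching the example above.
audio_seconds = len(np.frombuffer(audio_data, dtype=np.int16)) / 24000
print(f"TTFT: {first_chunk_time - start:.3f}s")
print(f"RTF:  {(end - start) / audio_seconds:.3f}")
```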
- Bi-streaming: implemented via a WebSocket interface; usage examples can be found in `test/test_bistream.py`.
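The actual WebSocket path and message format are defined by `test/test_bistream.py`; the sketch below only illustrates the general bi-directional client pattern with the third-party `websockets` package. The URL path and JSON fields used here are hypothetical placeholders, not the real light-tts protocol.

```python
import asyncio
import json
import websockets  # third-party "websockets" package

# Illustrative pattern only: the path "/inference_bistream" and the message fields
# below are hypothetical placeholders; the real protocol is in test/test_bistream.py.
async def send_text(ws):
    for piece in ["收到好友从远方寄来的生日礼物,", "那份意外的惊喜让我十分开心。"]:
        await ws.send(json.dumps({"text": piece}))
        await asyncio.sleep(0.1)  # simulate text arriving incrementally
    await ws.send(json.dumps({"finish": True}))

async def recv_audio(ws):
    audio = bytearray()
    try:
        async for message in ws:
            if isinstance(message, bytes):
                audio.extend(message)  # audio streams back while text is still being sent
    except websockets.ConnectionClosed:
        pass
    print(f"received {len(audio)} bytes of audio")

async def main():
    async with websockets.connect("ws://localhost:8080/inference_bistream") as ws:
        await asyncio.gather(send_text(ws), recv_audio(ws))

asyncio.run(main())
```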
This repository is released under the Apache-2.0 license.