CATALAN TEXT TO SPEECH API

About

This repository contains an API that transforms any given text into audio with native Catalan pronunciation. It uses a pre-trained model intended to run without a GPU. It is based on this project: Reference, which is an implementation of Tacotron2 developed by Nvidia. The API is protected by JWT and the results are cached in Redis.

Prerequisites

Usage

Without gpu

Step 1: Build the image

docker build -t voice-synth-cat:v1 -f Dockerfile.cpu .

Step 2: Run stack

docker-compose up

Step 3: Register user

You can skip to the next step and use the pre-registered user, or enter the "voice-synth-cat:v1" container and run the user registration script with a new user:

python user_registration.py --user_id 123 --username 'test-user' --password 'fd468069ebfc6cbb848bb673541c18ef979c6f2a2e5998481f2c524f0fb3257a'

Step 4: Ask for your valid token

Request a token for the default user and copy the returned "access_token":

curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"user": 1, "password":"fd468069ebfc6cbb848bb673541c18ef979c6f2a2e5998481f2c524f0fb3257a"}' \
  http://localhost:8080/token

Alternatively, replace the user and password with those of the user you registered.
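
The same token request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the repository: the function names are hypothetical, it assumes the stack from Step 2 is listening on localhost:8080, and it assumes the reply is a JSON object with a top-level "access_token" key.

```python
import json
import urllib.request

# Assumption: the stack from Step 2 is reachable at this address.
TOKEN_URL = "http://localhost:8080/token"


def build_token_request(user, password, url=TOKEN_URL):
    """Build a POST request equivalent to the curl command in Step 4."""
    body = json.dumps({"user": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_access_token(user, password, url=TOKEN_URL):
    """Send the request and return the token from the reply.

    Assumption: the reply body is JSON with an "access_token" key.
    """
    request = build_token_request(user, password, url)
    with urllib.request.urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["access_token"]
```

With the stack running, `fetch_access_token(1, "<password>")` would return the string to place after `Bearer` in Step 5.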

Step 5: Ask the api voice to read your text in perfect Catalan

curl --location --request POST 'localhost:8080/' \
--header 'Authorization: Bearer <your token here>' \
--header 'Content-Type: application/json' \
--data-raw '{"text": ["frase de prova"]}'

Bear in mind that each sentence in the request array must be at most 130 characters long. The audio in the response is encoded as a base64 string.
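
A client may want to enforce the 130-character limit before sending and decode the audio afterwards. The sketch below is illustrative only: the helper names are hypothetical, splitting on whitespace is a simplification (splitting at punctuation would likely sound better), and the name of the response field that carries the base64 string is not documented here, so `decode_audio` takes the string directly.

```python
import base64
import json
import urllib.request

MAX_SENTENCE_LENGTH = 130  # limit stated in this README


def split_text(text, limit=MAX_SENTENCE_LENGTH):
    """Greedily pack words into chunks of at most `limit` characters."""
    sentences, current = [], ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError(f"word longer than {limit} characters: {word!r}")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current chunk is full: store it and start a new one.
            sentences.append(current)
            current = word
    if current:
        sentences.append(current)
    return sentences


def build_synthesis_request(sentences, token, url="http://localhost:8080/"):
    """Build a POST request equivalent to the curl command in Step 5."""
    body = json.dumps({"text": sentences}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_audio(audio_b64):
    """Turn the base64 string from the response into raw audio bytes."""
    return base64.b64decode(audio_b64)
```

The bytes returned by `decode_audio` can be written to a file opened in binary mode; the audio container format is not stated in this README, so choose the file extension accordingly.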

With NVIDIA gpu

Step 1: Build the image

docker build -t voice-synth-cat-gpu:v1 -f Dockerfile.gpu .

Step 2: Run the containers

docker run -d --network voice_synth_internal --rm redis redis-cli -h redis
docker run --gpus all -p 8080:8080 -v "$(pwd)"/logs:/app/logs --network voice_synth_internal -it --rm --name voice voice-synth-cat-gpu:v1

At this point, the process is the same as for the CPU version, so you can jump back to step 3 above. You can also deploy it to Docker Swarm:

docker stack deploy --compose-file docker-stack.yml voice-synth-gpu-stack

Workers

By default the API runs one worker per CPU core on the server. This setup can be overridden with the WORKERS_PER_CORE or WEB_CONCURRENCY environment variables. The latter specifies the total number of workers, no matter how many cores the server has.
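
The precedence between the two variables can be illustrated as follows. This is a sketch of the behaviour described above, not the contents of gunicorn_conf.py: the function name is hypothetical, and parsing WORKERS_PER_CORE as a float and never going below one worker are assumptions.

```python
import multiprocessing
import os


def resolve_worker_count(env=None, cores=None):
    """Return the number of workers implied by the environment."""
    env = os.environ if env is None else env
    cores = multiprocessing.cpu_count() if cores is None else cores

    # WEB_CONCURRENCY fixes the total, regardless of the core count.
    web_concurrency = env.get("WEB_CONCURRENCY")
    if web_concurrency:
        return int(web_concurrency)

    # Otherwise scale with the cores; the default is one worker per core.
    workers_per_core = float(env.get("WORKERS_PER_CORE", "1"))
    # Assumption: always keep at least one worker.
    return max(int(workers_per_core * cores), 1)
```

On a four-core server this gives 4 workers by default, 8 with WORKERS_PER_CORE=2, and 3 with WEB_CONCURRENCY=3 whatever the value of WORKERS_PER_CORE.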

Logs

Logging can be configured in config/logging.yml, and log files are stored in the /logs folder.

nanotower/catalan-voice-synthesizer
