This project is a web service built with Python, Flask, Nginx, and Docker that converts voice input to text using pre-trained models.
It is based on [1] by Valerio Velardo.
voice2text/
│
├── flask/
│ ├── paraphraser.h5/
│ ├── processor.h5/
│ ├── voice2text.h5/
│ ├── app.ini
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── server.py
│ ├── voice2text_service.py
│
├── model/
│ ├── model.py
│
├── nginx/
│ ├── Dockerfile
│ ├── nginx.conf
│
├── resources/
│ ├── down.wav
│ ├── I had some free time, so I wandered around town.mp3
│ ├── People living in town don't know the pleasures of country life.mp3
│
├── client.py
└── docker-compose.yml
The service converts audio to text. It can also paraphrase the transcribed text using the ChatGPT Paraphraser model.
The service uses pre-trained models for transcription and paraphrasing. These models are based on the Hugging Face Transformers library and include:
- Whisper for voice-to-text transformation
- ChatGPT Paraphraser for paraphrasing
For details on the models' architecture and training, refer to the respective model documentation.
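The two models can be pictured as a small pipeline: transcribe first, then optionally paraphrase the result. The following is a minimal sketch under stated assumptions, not the repository's implementation. The class name `Voice2TextPipeline`, the injected `transcribe` and `paraphrase` callables, and the `text`/`paraphrase` result keys are all hypothetical stand-ins for the Whisper and ChatGPT Paraphraser models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Voice2TextPipeline:
    """Hypothetical two-stage pipeline: speech-to-text, then optional paraphrasing."""

    # Stand-in for the Whisper model: audio file path -> transcript.
    transcribe: Callable[[str], str]
    # Stand-in for the paraphraser model: text -> paraphrased text.
    paraphrase: Optional[Callable[[str], str]] = None

    def run(self, audio_path: str, with_paraphrase: bool = False) -> dict:
        # Stage 1: transcribe the audio and trim surrounding whitespace.
        text = self.transcribe(audio_path).strip()
        result = {"text": text}
        # Stage 2: paraphrase only when requested and a paraphraser is configured.
        if with_paraphrase and self.paraphrase is not None:
            result["paraphrase"] = self.paraphrase(text)
        return result
```

Injecting the models as callables keeps the sketch independent of any particular model-loading code, so the same wrapper can be exercised with simple stubs.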
- Clone the repository:
git clone https://github.com/AdamLauz/voice2text.git
cd voice2text
- Install Docker: Follow the official Docker installation guide for your operating system.
- Build Docker images and start containers:
docker-compose up --build
Once the Docker containers are running:
- Use the provided client.py script to send audio files for transcription. The service supports both WAV and MP3 formats.
python client.py
- Check the console output for the transcribed text.
The API endpoint for transcribing audio files is:
http://localhost:80/predict
The web service architecture consists of three main components:
- Flask Server: Handles incoming HTTP requests, processes audio files, and invokes the voice-to-text service.
- Voice-to-Text Service: Utilizes trained models to convert audio input into text.
- Nginx Reverse Proxy: Routes requests from clients to the Flask server.
When a client sends an audio file to the /predict endpoint, the Flask server:
- Saves the audio file locally.
- Invokes the voice-to-text service to generate a transcription.
- Sends back the transcribed text as a JSON response.
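The three steps above can be sketched as a framework-agnostic function. This is an illustrative sketch, not the code in server.py: the function name `handle_predict`, the injected `transcribe` callable, the `text` response key, and the cleanup of the temporary file are assumptions.

```python
import json
import os
import tempfile
from typing import Callable


def handle_predict(audio_bytes: bytes, filename: str,
                   transcribe: Callable[[str], str]) -> str:
    """Save the upload, transcribe it, and return the result as a JSON string."""
    # Keep the original extension so a decoder can tell WAV from MP3.
    suffix = os.path.splitext(filename)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        # Step 1: save the audio file locally.
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(audio_bytes)
        # Step 2: invoke the voice-to-text service (hypothetical callable).
        text = transcribe(tmp_path)
    finally:
        # Assumption: the temporary file is removed even if transcription fails.
        os.remove(tmp_path)
    # Step 3: send back the transcription as JSON.
    return json.dumps({"text": text})
```

In a Flask view, the bytes and file name would typically come from the uploaded `file` form field, for example `request.files["file"].read()` and `request.files["file"].filename`.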
Below is an example of how to call the service using cURL:
curl -X POST -F "file=@/path/to/your/audio/file.mp3" http://localhost:80/predict
Replace /path/to/your/audio/file.mp3 with the path to your actual audio file (WAV or MP3).
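The same request can be made from Python using only the standard library. This is a hedged sketch rather than the contents of client.py: the helper names `build_multipart` and `transcribe_file` are hypothetical, and the shape of the JSON reply is not assumed beyond it being valid JSON.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, url: str = "http://localhost:80/predict") -> dict:
    """POST an audio file to the service and return the decoded JSON reply."""
    with open(path, "rb") as f:
        # Use the same form field name ("file") as the cURL example.
        body, content_type = build_multipart("file", os.path.basename(path), f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers running, `print(transcribe_file("resources/down.wav"))` would send the bundled sample file and print whatever JSON the service returns.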
- The Flask server is exposed on port 900 within the Docker network.
- Nginx is used as a reverse proxy to route requests to the Flask server.
- Make sure to provide audio files in WAV or MP3 format for transcription.
- If you encounter any issues, check the Docker logs for the Flask and Nginx containers.
- Ensure that the required dependencies are installed as specified in requirements.txt.
[1] Valerio Velardo. https://github.com/musikalkemist/Deep-Learning-Audio-Application-From-Design-to-Deployment