A voice chatbot that combines speech recognition, speech synthesis, and the Google Gemini API to hold spoken conversations with users.
- Recording of user input audio.
- Speech recognition to convert user input into text.
- Interaction with an artificial intelligence model to generate responses.
- Speech synthesis to transform chatbot responses into audio.
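The four features above form one conversational turn. The following is a minimal sketch of that flow, not the project's actual code: `run_turn` and its four callable parameters are hypothetical names, and the real recording, recognition, and synthesis libraries are left out so the example stays self-contained.

```python
from typing import Callable


def run_turn(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one record -> transcribe -> generate -> speak cycle.

    Every stage is passed in as a callable (hypothetical design), so the
    concrete audio and AI backends can be swapped or stubbed out.
    """
    audio = record()            # 1. capture the user's audio
    text = transcribe(audio)    # 2. speech recognition: audio -> text
    reply = generate(text)      # 3. ask the AI model for a response
    speak(reply)                # 4. speech synthesis: text -> audio
    return reply


if __name__ == "__main__":
    # Stub stages stand in for the microphone, recognizer, model, and speaker.
    reply = run_turn(
        record=lambda: b"fake-audio",
        transcribe=lambda audio: "hello",
        generate=lambda text: f"You said: {text}",
        speak=print,
    )
```

Passing the stages as arguments is only one possible design; it makes each step easy to test in isolation without a microphone or network access.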
- Python 3.x
- Python libraries listed in requirements.txt
- Valid credentials for the Gemini API
- Clone the repository:
git clone https://github.com/anacletu/virtual_tandem
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the project root and add your API key, endpoint, and audio preferences:
API_KEY=your_api_key
API_URL=https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
FS=44100
DURATION=15
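As an illustration of how these values might be consumed, here is a minimal sketch that reads them from the environment and converts the numeric ones. It rests on several assumptions: `Settings` and `load_settings` are hypothetical names, `FS` is taken to be the sample rate in Hz and `DURATION` the recording length in seconds, and the fallback defaults simply mirror the example values above. The .env file must already have been loaded into the environment, for example with `load_dotenv()` from the python-dotenv package, if the project uses it.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    api_key: str
    api_url: str
    fs: int          # assumed: audio sample rate in Hz
    duration: float  # assumed: recording length in seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build a Settings object from environment-style key/value pairs."""
    # Fail early with a clear message if a required value is absent.
    missing = [name for name in ("API_KEY", "API_URL") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        api_key=env["API_KEY"],
        api_url=env["API_URL"],
        # Defaults below are assumptions that mirror the sample .env file.
        fs=int(env.get("FS", "44100")),
        duration=float(env.get("DURATION", "15")),
    )
```

Calling `load_settings()` with no arguments reads from `os.environ`; passing a plain dictionary instead makes the function easy to exercise in tests.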
Run the main script:
python main.py
Follow the instructions in the terminal to interact with the chatbot. The script sends your transcribed speech to the API, converts the response into speech, and plays the audio.
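The API request step can be sketched as follows, using only the standard library. This is an assumption-laden illustration rather than the project's implementation: `build_request`, `extract_reply`, and `ask_gemini` are hypothetical helper names, and the project itself may use a different HTTP client. The payload and response shapes follow the Gemini `generateContent` REST format.

```python
import json
import urllib.request


def build_request(api_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Prepare a generateContent POST request carrying the user's text."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # API key sent as a header
        },
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Join the text parts of the first candidate in the response."""
    parts = response_body["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def ask_gemini(api_url: str, api_key: str, prompt: str, timeout: float = 30.0) -> str:
    """Send the prompt to the API and return the model's reply text."""
    request = build_request(api_url, api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the values from the .env file, a call would look like `ask_gemini(API_URL, API_KEY, "Hola, ¿cómo estás?")`; the returned string is what would then be passed to speech synthesis.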
See a quick video demonstrating simple conversations in Portuguese, English, and Spanish.
Tandembot.mp4
- Addition of support for more languages.
- Implementation of a graphical interface to facilitate interaction.
- Improvement of speech recognition and speech synthesis robustness.
- More configuration possibilities, such as language level and response complexity.
Contributions are welcome! For suggestions, bug fixes, and other changes, feel free to open an issue or submit a pull request.
This project is licensed under the MIT License. See the license file for more details.