A chatbot powered by the Google Text-to-Speech API, seamlessly converting Gemini's responses into natural speech.

anacletu/tandembot

Welcome to Tandembot!

Tandembot is a voice chatbot that combines speech recognition, speech synthesis, and the Google Gemini API to hold spoken conversations with users.

Features

  • Recording of user input audio.
  • Speech recognition to convert user input into text.
  • Interaction with an artificial intelligence model to generate responses.
  • Speech synthesis to transform chatbot responses into audio.
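The four features above form a simple pipeline for each conversation turn. The sketch below shows only that flow and is not the project's actual code: `run_turn` and its four callable parameters are hypothetical names, and each stage is passed in as a function so the example runs without a microphone or an API key.

```python
from typing import Callable


def run_turn(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one conversation turn and return the chatbot's reply text.

    All four stages are hypothetical placeholders for the real
    recording, recognition, Gemini, and synthesis steps.
    """
    audio = record()               # 1. record user input audio
    user_text = transcribe(audio)  # 2. speech recognition -> text
    reply = generate(user_text)    # 3. ask the AI model for a response
    speak(reply)                   # 4. speech synthesis and playback
    return reply


if __name__ == "__main__":
    # Stub stages so the flow can be exercised without audio hardware.
    spoken = []
    reply = run_turn(
        record=lambda: b"fake-audio",
        transcribe=lambda audio: "hello",
        generate=lambda text: f"You said: {text}",
        speak=spoken.append,
    )
    print(reply)
```

Passing the stages as callables keeps each one replaceable, for example when swapping the recognition or synthesis backend.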

Prerequisites

  • Python 3.x
  • Python libraries listed in requirements.txt
  • Valid credentials for the Gemini API

Installation

  1. Clone the repository:
git clone https://github.com/anacletu/virtual_tandem
  2. Install dependencies:
pip install -r requirements.txt
  3. Create a .env file in the project root and add your API key, endpoint, and audio preferences:
API_KEY=your_api_key
API_URL=https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
FS=44100
DURATION=15
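As a rough illustration of how these settings might be read, the sketch below parses a .env file using only the standard library. It assumes that FS is the audio sample rate in Hz and DURATION is the recording length in seconds; the README does not say so explicitly. `load_env` and `audio_settings` are hypothetical helper names, and the project itself may load the file differently (for example with a dotenv package from requirements.txt).

```python
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; skip blanks, comments, malformed lines."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def audio_settings(values: dict) -> tuple:
    """Return (sample_rate, seconds, total_samples).

    Assumption: FS is in Hz and DURATION is in seconds, so one
    recording holds FS * DURATION samples per channel.
    """
    fs = int(values.get("FS", 44100))
    duration = int(values.get("DURATION", 15))
    return fs, duration, fs * duration


if __name__ == "__main__":
    config = load_env()
    print(audio_settings(config))
```

Under that assumption, the values shown above (FS=44100, DURATION=15) correspond to 661,500 samples per channel for each recording.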

Usage

Run the main script:

python main.py

Follow the instructions in the terminal to interact with the chatbot. The script will make a request to the API, convert the response into speech, and play the audio.
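The API request step can be sketched as follows, assuming the script calls the Gemini REST `generateContent` endpoint configured in API_URL and passes the key as a `key` query parameter. `build_request` and `extract_text` are hypothetical helper names, not functions from this repository, and the speech synthesis and playback steps are omitted.

```python
import json
import urllib.parse
import urllib.request


def build_request(api_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request carrying a single-turn text prompt."""
    url = f"{api_url}?{urllib.parse.urlencode({'key': api_key})}"
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a response body."""
    candidates = response.get("candidates") or []
    if not candidates:
        raise ValueError("response contains no candidates")
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


if __name__ == "__main__":
    import os

    # Requires API_KEY and API_URL in the environment and network access.
    request = build_request(os.environ["API_URL"], os.environ["API_KEY"], "Hello!")
    with urllib.request.urlopen(request, timeout=30) as resp:
        print(extract_text(json.load(resp)))
```

Keeping request construction separate from the network call makes both helpers easy to check offline. A production script would also want to handle HTTP errors and responses that come back without any candidates.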

Example Usage

See a quick video demonstrating simple conversations in Portuguese, English, and Spanish.

Tandembot.mp4

Future Improvements

  • Addition of support for more languages.
  • Implementation of a graphical interface to facilitate interaction.
  • Improvement of speech recognition and speech synthesis robustness.
  • More configuration possibilities, such as language level and response complexity.

Contributing

Contributions are welcome! For suggestions, bug fixes, and other changes, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the license file for more details.
