Seamless and real-time voice interaction with AI.
Hint: If you're interested in state-of-the-art voice solutions, please also have a look at Linguflex. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.
Uses faster_whisper for transcription and ElevenLabs input streaming for low-latency responses to spoken input.
Note: The demo was recorded on a 10 Mbit/s connection, so actual performance may be better on faster connections.
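The low latency comes from forwarding the model's streamed tokens to the TTS engine as soon as whole sentences form, rather than waiting for the full reply. A minimal sketch of that chunking idea (the helper name and boundary rule are illustrative, not the script's actual code):

```python
def stream_sentences(tokens):
    """Accumulate streamed LLM tokens and yield each complete sentence
    as soon as a sentence boundary appears, so TTS can start early."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush whenever sentence-ending punctuation shows up.
        while any(p in buffer for p in ".!?"):
            # Cut at the earliest boundary found in the buffer.
            idx = min(i for i in (buffer.find(p) for p in ".!?") if i != -1)
            yield buffer[: idx + 1].strip()
            buffer = buffer[idx + 1:]
    if buffer.strip():
        yield buffer.strip()

# Example with a fake token stream:
chunks = list(stream_sentences(["Hel", "lo the", "re. How", " are you", "?"]))
print(chunks)  # ['Hello there.', 'How are you?']
```

In the real pipeline each yielded sentence would be handed to the ElevenLabs streaming endpoint while the next one is still being generated.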
voice_talk_vad.py
- automatically detects speech
voice_talk.py
- toggle recording on/off with the spacebar
Replace `your_openai_key` and `your_elevenlabs_key` with your OpenAI and ElevenLabs API keys in the code.
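If you'd rather not hardcode keys in the source, you could read them from environment variables instead; a small sketch (the variable names here are suggestions, not something the scripts expect):

```python
import os

def load_key(name: str) -> str:
    """Read an API key from the environment, failing loudly if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value

# Then pass the results wherever the code expects the keys, e.g.:
# openai_key = load_key("OPENAI_API_KEY")
# elevenlabs_key = load_key("ELEVENLABS_API_KEY")
```

This keeps secrets out of version control and lets you rotate keys without editing the scripts.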
Install the required Python libraries:
pip install openai elevenlabs pyaudio keyboard faster_whisper numpy torch
(wave is part of the Python standard library and needs no separate install.)
Execute the main script based on your mode preference:
python voice_talk_vad.py
or
python voice_talk.py
Talk into your microphone.
Listen to the reply.
- Press the space bar to start talking.
- Speak your heart out.
- Press the space bar again once you're done.
- Listen to the reply.
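The spacebar flow above boils down to a tiny piece of toggle state. A sketch of that state machine (the class and method names are illustrative; the real script wires this to `keyboard` events and a PyAudio stream):

```python
class Recorder:
    """Minimal press-to-toggle recording state machine."""

    def __init__(self):
        self.recording = False
        self.frames = []

    def on_space(self):
        """Called on each space-bar press: start or stop a take."""
        if not self.recording:
            self.frames = []        # start a fresh take
            self.recording = True
        else:
            self.recording = False  # stop; self.frames now holds the audio

    def feed(self, chunk):
        """Called from the audio callback; buffers chunks while recording."""
        if self.recording:
            self.frames.append(chunk)

rec = Recorder()
rec.on_space()            # first press: start recording
rec.feed(b"\x00\x01")     # audio callback delivers a chunk
rec.on_space()            # second press: stop recording
print(rec.recording, len(rec.frames))  # False 1
```

Once recording stops, the buffered frames would be written to a WAV file and handed to faster_whisper for transcription.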
Feel free to fork, improve, and submit pull requests. If you're considering significant changes or additions, please start by opening an issue.
Huge shoutout to:
- The hardworking developers behind faster_whisper.
- ElevenLabs for their cutting-edge voice API.
- OpenAI for their pioneering GPT-4 model.