A TTS (text-to-speech) extension for the oobabooga text-generation-webui
- 100% offline
- No AI
- Low CPU
- Low network bandwidth usage
- No word limit
silero_tts is great, but it seems to have a word limit, so I made SpeakLocal.
- This extension uses pyttsx4 for speech generation and ffmpeg for audio conversion.
- pyttsx4 uses the native TTS abilities of the host machine (Linux, macOS, Windows), so you shouldn't need to install anything else for this to work.
- This extension re-encodes the locally generated .WAV file to an .MP3 and prepends a media player to the text output field.
- The .MP3 encoding is set to ~18 kbps, so the output file is roughly 2 kilobytes for each second of audio (18 kilobits / 8 bits per byte). It's set low to conserve bandwidth when using mobile data.
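The steps above can be sketched roughly like this. It is a minimal illustration, not the extension's actual code: the names `speak_to_mp3` and `mp3_path_for`, the file paths, and the `"18k"` bitrate string are assumptions made for the example. It relies on pyttsx4's `save_to_file` / `runAndWait` and on ffmpeg-python's `input(...).output(...).run()` chain, and it needs the ffmpeg binary to be available.

```python
from pathlib import Path


def mp3_path_for(wav_path):
    """Return the .mp3 path that sits next to the given .wav file."""
    return str(Path(wav_path).with_suffix(".mp3"))


def speak_to_mp3(text, wav_path="speech.wav", bitrate="18k"):
    """Render text with the host's TTS, then re-encode the result as MP3.

    Hypothetical helper for illustration; not SpeakLocal's real function.
    """
    # Imported lazily so this file loads even if the packages are missing.
    import pyttsx4
    import ffmpeg  # module provided by the ffmpeg-python package

    engine = pyttsx4.init()              # picks the native driver for this OS
    engine.save_to_file(text, wav_path)  # queue the utterance
    engine.runAndWait()                  # block until the file is written

    mp3_path = mp3_path_for(wav_path)
    (
        ffmpeg
        .input(wav_path)
        .output(mp3_path, audio_bitrate=bitrate)  # passed to ffmpeg as -b:a
        .run(overwrite_output=True, quiet=True)
    )
    return mp3_path
```

Calling `speak_to_mp3("Hello there")` should leave a `speech.mp3` next to `speech.wav`; the media player in the output field would then point at that file.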
Open a command prompt or shell:
cd PATH_TO_text-generation-webui/extensions
Now clone this repo:
git clone https://github.com/ill13/SpeakLocal
You may have to do:
pip install -r requirements.txt
...if pyttsx4 and ffmpeg-python are not installed.
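If you're not sure whether you need that step, a quick check like the one below may help. It's only a sketch: `missing_modules` is a made-up helper name, and it assumes the two packages import as `pyttsx4` and `ffmpeg` (ffmpeg-python installs a module named `ffmpeg`). Run it inside the same virtual environment the WebUI uses.

```python
import importlib.util


def missing_modules(names=("pyttsx4", "ffmpeg")):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("pyttsx4 and ffmpeg are both importable.")
```

Note that this only tells you a module called `ffmpeg` exists, not that it is the right one.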
Finally, enable the extension in the Session tab.
If you get this error:
AttributeError: module 'ffmpeg' has no attribute 'input'
Open the command line virtual environment and enter the following:
pip uninstall ffmpeg
pip uninstall ffmpeg-python
pip install ffmpeg-python
On Windows 10, make sure ffmpeg.exe is in your PATH.
Restart Ooba and you should be all set.
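To see which of the two problems you have, something like this diagnostic may be useful. It's a hedged sketch with made-up helper names (`looks_like_ffmpeg_python`, `ffmpeg_binary_path`). The assumption is that the `AttributeError` comes from a different package that also installs a module named `ffmpeg` but has no `input()` function, which is why the fix above uninstalls both packages first.

```python
import shutil


def looks_like_ffmpeg_python(module):
    """True if the module exposes the input()/output() calls the extension uses."""
    return (callable(getattr(module, "input", None))
            and callable(getattr(module, "output", None)))


def ffmpeg_binary_path():
    """Return the full path of the ffmpeg executable, or None if it's not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    try:
        import ffmpeg
    except ImportError:
        print("No 'ffmpeg' module installed; run: pip install ffmpeg-python")
    else:
        if looks_like_ffmpeg_python(ffmpeg):
            print("ffmpeg-python looks correctly installed.")
        else:
            print("Wrong 'ffmpeg' package found; reinstall ffmpeg-python.")
    print("ffmpeg binary:", ffmpeg_binary_path() or "not found on PATH")
```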
More audio options added.
- Voice selection: An enumerated list of TTS voices that are installed on the host.
- Speech rate: Speed up or slow down how fast the words are spoken.
- Bitrate: Ability to adjust sound quality. Beware, higher bitrate means more data used!
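For a feel of how these options map onto pyttsx4, here is a small sketch. The helper names (`list_voices`, `configure`, `estimated_kilobytes`) are invented for the example and are not the extension's own; the only library calls used are `pyttsx4.init()`, `getProperty("voices")`, and `setProperty` with the `"voice"` and `"rate"` properties. The size estimate assumes a constant bitrate, so real files may differ a little.

```python
def estimated_kilobytes(seconds, kbps):
    """Approximate MP3 size in kilobytes for constant-bitrate audio."""
    return seconds * kbps / 8  # 8 bits per byte


def list_voices():
    """Return (id, name) pairs for the TTS voices installed on this host."""
    import pyttsx4  # imported lazily; needs a working native TTS driver

    engine = pyttsx4.init()
    return [(voice.id, voice.name) for voice in engine.getProperty("voices")]


def configure(engine, voice_id=None, rate=None):
    """Apply an optional voice and speech rate to a pyttsx4 engine."""
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    if rate is not None:
        engine.setProperty("rate", rate)  # words per minute
    return engine
```

For example, `estimated_kilobytes(60, 32)` gives 240.0 and `estimated_kilobytes(60, 64)` gives 480.0, so doubling the bitrate doubles the data used for the same minute of speech.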