🗣️ feat: STT & TTS #1603

berry-13 · 2024-01-20T22:44:17Z

Summary

For STT, press the button or use Shift + Alt + L
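A minimal sketch of how the Shift + Alt + L shortcut could be matched. This is not the PR's code: `KeyCombo` and `isSttHotkey` are hypothetical names, and matching on `code` (the physical key) instead of `key` is an assumption made because Alt can change the produced character on some keyboard layouts.

```typescript
// Structural subset of the DOM KeyboardEvent, so the sketch also type-checks outside a browser.
interface KeyCombo {
  shiftKey: boolean;
  altKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
  code: string;
}

// Hypothetical helper: true only for Shift + Alt + L with no other modifier held.
function isSttHotkey(e: KeyCombo): boolean {
  return e.shiftKey && e.altKey && !e.ctrlKey && !e.metaKey && e.code === 'KeyL';
}
```

In a browser, a `keydown` listener could pass the event straight to `isSttHotkey` and toggle recording when it returns `true`.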

For TTS, press the button (press and hold it to download the audio file instead)
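The click versus hold behavior can be sketched as a comparison of press and release timestamps. The 500 ms threshold and the names `classifyPress` and `HOLD_THRESHOLD_MS` are assumptions for illustration; the PR description does not say how long the button must be held.

```typescript
type PressAction = 'play' | 'download';

// Assumed threshold; the PR does not state the actual hold duration.
const HOLD_THRESHOLD_MS = 500;

// Hypothetical helper: a press held for at least the threshold downloads the audio,
// a shorter press plays it.
function classifyPress(
  pressedAtMs: number,
  releasedAtMs: number,
  thresholdMs: number = HOLD_THRESHOLD_MS,
): PressAction {
  return releasedAtMs - pressedAtMs >= thresholdMs ? 'download' : 'play';
}
```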

checklist

STT

Browser
OpenAI Whisper
Local Whisper (tested on LocalAI and HomeAssistant Whisper)
Azure Whisper (not tested yet but it should work)
All the OpenAI compatible STT
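To illustrate what "OpenAI compatible" means for STT, here is a hedged sketch that only builds the request description for a transcription call and does not send it. It assumes the server exposes the OpenAI-style `audio/transcriptions` route under its base URL and accepts a multipart form with `file` and `model` fields. `SttEndpoint` and `buildTranscriptionRequest` are hypothetical names, not part of this PR.

```typescript
interface SttEndpoint {
  baseUrl: string; // assumed to already include the version prefix, e.g. ".../v1"
  apiKey?: string; // local servers may not require one
  model: string;
}

interface TranscriptionRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  fields: { model: string }; // the audio itself goes in a multipart "file" field
}

// Hypothetical helper: describes the HTTP request without performing it.
function buildTranscriptionRequest(endpoint: SttEndpoint): TranscriptionRequest {
  const base = endpoint.baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const headers: Record<string, string> = {};
  if (endpoint.apiKey) {
    headers['Authorization'] = `Bearer ${endpoint.apiKey}`;
  }
  return {
    url: `${base}/audio/transcriptions`,
    method: 'POST',
    headers,
    fields: { model: endpoint.model },
  };
}
```

Under these assumptions, switching between a hosted and a local Whisper server only changes `baseUrl`, `apiKey`, and `model`.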

TTS

TODO:

~~fix hark 🤔~~
improve STTBrowser error handling
handle audio files in the file upload and automatically transcribe them

UI

Speech TAB Explanation

NOTE: This explains how the automatic conversation works. To use it, you need to enable all of the settings in the Speech tab. This feature is still in beta and may not always work as expected. Right now, the TTS call is still not triggered after the AI input.

graph TD;

    UserRequest((User Requests STT)) --> CheckLocalStorage{Check Local Storage for Engine};
    CheckLocalStorage -->|Engine Browser| AutomaticBrowser((Automatic Browser STT));
    CheckLocalStorage -->|Engine External| ExternalCheck{Check Transcription Status};
    ExternalCheck -->|Transcription Active| StopTranscription;
    ExternalCheck -->|Transcription Inactive| ListenAudio((Listen to User Audio));
    ListenAudio --> CheckAudio{Check Audio Level};
    CheckAudio -->|Below Threshold| SaveAudio;
    CheckAudio -->|Above Threshold| ContinueRecording;
    SaveAudio --> DataProviderRequest((Data Provider Request));
    DataProviderRequest --> APICall("/api/files/stt");
    APICall -->|Success| SetText((Set Text in Text Area));
    SetText -->|Auto Send Text Enabled| AutoSendRequest((Auto Send Text Request));
    AutoSendRequest --> APICall2("/chat/completions");
    APICall2 -->|Success| TriggerTTS((Trigger TTS));
    TriggerTTS --> TTSRequest((TTS Request));
    TTSRequest --> APICall3("/api/files/tts");
    APICall3 -->|Success| PlayAudio((Play Audio));
    PlayAudio -->|Playback Finished| WaitTwoSeconds;
    WaitTwoSeconds --> RepeatSTT((Repeat STT Trigger));

    subgraph Loop
    RepeatSTT --> ListenAudio;
    end

    StopTranscription((Stop Transcription));
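The "Check Audio Level" branch in the graph can be sketched as a root-mean-square comparison over one buffer of samples. This is an illustration only and does not reproduce how the PR measures audio. The threshold of `0.01` and the names `rootMeanSquare` and `nextRecorderStep` are assumptions.

```typescript
type RecorderStep = 'save-audio' | 'continue-recording';

// Root mean square of a buffer of samples in the range [-1, 1].
function rootMeanSquare(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i] ?? 0;
    sumOfSquares += s * s;
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// Mirrors the graph: below the threshold the audio is saved and sent for
// transcription, otherwise recording continues. The default threshold is assumed.
function nextRecorderStep(samples: ArrayLike<number>, threshold: number = 0.01): RecorderStep {
  return rootMeanSquare(samples) < threshold ? 'save-audio' : 'continue-recording';
}
```

A real implementation would likely require the level to stay below the threshold for some time before saving, so that short pauses between words do not end the recording.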

Thank you @bsu3338 for the integrated browser STT & TTS.
Thank you @szkiu for the Azure STT #2025.

Change Type

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Testing

Checklist

My code adheres to this project's style guidelines
I have performed a self-review of my own code
I have commented in any complex areas of my code
I have made pertinent documentation changes
My changes do not introduce new warnings
I have written tests demonstrating that my changes are effective or that my feature works
Local unit tests pass with my changes
Any changes dependent on mine have been merged and published in downstream modules.

berry-13 · 2024-04-27T21:34:35Z

> @berry-13 when do you think you will be completely finished with this pr so that it is ready to merge?

When I commit, it means the changes are ready for merging. But since @danny-avila mentioned he's going to refactor and fix some things, I'll continue until he begins reviewing it. Besides, I'll be working with him to ensure that Conversation Mode works properly, since it's only partially functional at the moment.

kneelesh48 · 2024-05-07T17:44:20Z

@berry-13 have you added support for Azure and GCP TTS in this PR?
Those are the OG TTS models. Also, ElevenLabs is expensive and I don't like their subscription pricing model.

berry-13 · 2024-05-08T14:53:29Z

> @berry-13 have you added support for Azure and GCP TTS in this PR? Those are the OG TTS models. Also, eleven labs is expensive and I don't like their subscription pricing model.

I personally use ElevenLabs. It has WebSocket support and one of the best TTS models out there. I can't add Azure TTS because I don't have a key. Google TTS is planned, and I'm working on adding support for multiple providers. I'll also be adding some other providers in the future.

kneelesh48 · 2024-05-11T16:12:03Z

@berry-13 I can provide you an azure key

bnord01 · 2024-05-16T08:52:19Z

FYI: The current implementation crashes the whole application on login in Firefox.

Unexpected Application Error!
SpeechRecognition is not a constructor

initializeSpeechRecognition@http://localhost:3090/src/hooks/Input/useSpeechToTextBrowser.ts:2127:25
useSpeechToTextBrowser/<@http://localhost:3090/src/hooks/Input/useSpeechToTextBrowser.ts:2150:25

berry-13 · 2024-05-16T09:25:02Z

> FYI: The current implementation crashes the whole application on login in Firefox.
>
> Unexpected Application Error!
> SpeechRecognition is not a constructor
>
> initializeSpeechRecognition@http://localhost:3090/src/hooks/Input/useSpeechToTextBrowser.ts:2127:25
> useSpeechToTextBrowser/<@http://localhost:3090/src/hooks/Input/useSpeechToTextBrowser.ts:2150:25

oh, thank you for reporting this!
I'm going to fix this now
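For context on the error above, a browser that does not expose a `SpeechRecognition` constructor will throw as soon as `new SpeechRecognition()` is attempted. One common guard is to look the constructor up first, including the `webkit`-prefixed name, and fall back when neither exists. The sketch below is not the fix applied in this PR, and `resolveSpeechRecognition` is a hypothetical helper.

```typescript
type SpeechRecognitionCtor = new () => unknown;

// Hypothetical helper: returns the constructor if the given global scope exposes one,
// or null so callers can disable browser STT instead of crashing.
function resolveSpeechRecognition(scope: Record<string, unknown>): SpeechRecognitionCtor | null {
  const candidate = scope['SpeechRecognition'] ?? scope['webkitSpeechRecognition'];
  return typeof candidate === 'function' ? (candidate as SpeechRecognitionCtor) : null;
}
```

In a browser this could be called as `resolveSpeechRecognition(window as unknown as Record<string, unknown>)`, with the STT button hidden or disabled when the result is `null`.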

bsu3338 added 30 commits August 4, 2023 15:57

Update TextChat.jsx

1af6751

Update SubmitButton.jsx

b3636ab

Update TextChat.jsx

4401d0d

Merge branch 'danny-avila:main' into Speech-to-Text

07b2af1

Update SubmitButton.jsx

5a67874

Create ListeningIcon.tsx

14f4d66

Update index.ts

65a7b2b

Update SubmitButton.jsx

31441ed

Update TextChat.jsx

74fa8d1

Update ListeningIcon.tsx

37c0f5b

Update ListeningIcon.tsx

46c53d1

Create SpeechRecognition.tsx

2ffb5be

Update TextChat.jsx

49a9dae

Update TextChat.jsx

eb842c6

Update SpeechRecognition.tsx

8982ec1

Update TextChat.jsx

ca3f064

Update SpeechRecognition.tsx

2522d76

Update SpeechRecognition.tsx

d9a4d2f

Update SpeechRecognition.tsx

42aadd2

Merge branch 'danny-avila:main' into Speech-to-Text

5ad9927

Update SpeechRecognition.tsx

5d76082

Merge branch 'danny-avila:main' into Speech-to-Text

93ceae6

Update SubmitButton.jsx

b49024f

Update TextChat.jsx

28a00a5

Update SpeechRecognition.tsx

69ff48d

Merge branch 'main' into Speech-to-Text

cfe6325

Merge branch 'main' into Speech-to-Text

fd23679

Merge branch 'main' into Speech-to-Text

148a71b

Merge branch 'main' into Speech-to-Text

252325d

Create SpeechSynthesis.tsx

f9ed2ad

berry-13 added 6 commits April 28, 2024 01:50

feat: custom elevenlabs compatibility

3e40ad0

fix(useTextToSpeechExternal): cache switch not working

7f48031

Merge branch 'main' into Speech-to-Text

0875fe5

feat: animation for STT

e39d0eb

Merge branch 'main' into Speech-to-Text

db4fc17

Merge branch 'main' into Speech-to-Text

9f07c80

Merge branch 'main' into Speech-to-Text

415a869

berry-13 added 4 commits May 10, 2024 15:34

Merge branch 'main' into Speech-to-Text

e06a13b

fix: settings var not working

ca12731

chore: remove unused var

f3b78cf

feat: voice dropdown; refactor: yaml changes

486740a

berry-13 added 2 commits May 11, 2024 23:47

fix(textToSpeech): remove undefined properties

d3f5878

refactor: Remove console logs and unused variable

8647cc3

berry-13 mentioned this pull request May 13, 2024

📝 docs: STT/TTS LibreChat-AI/librechat.ai#19

Draft

berry-13 added 3 commits May 13, 2024 20:33

Merge branch 'main' into Speech-to-Text

cc35f77

fix: TTS; feat: support coqui and piper

b619b80

fix: some STT issues

6c1f7df

kneelesh48 mentioned this pull request May 14, 2024

🚀 feat: gpt-4o #2692

Merged

berry-13 added 2 commits May 14, 2024 23:09

fix: stt test

ece8f89

fix: STT backend sending wrong data

24ad1d9

berry-13 added 4 commits May 16, 2024 17:51

BREAKING: switch to react-speech-recognition, add regenerator-runtime/runtime in main.jsx

74a8ef5

feat: websocket backend

80b6689

Merge branch 'main' into Speech-to-Text

e27f59e

foundations for websocket

edc5c8e
