
Improve AI Voice Agent Response Time by Utilizing WebSocket for Streaming Audio #48

Open
shakir-snakescript opened this issue Jul 19, 2024 · 2 comments

shakir-snakescript commented Jul 19, 2024

I have observed that the current implementation of the AI voice agent, which uses OpenAI, Deepgram, and Twilio, experiences a 4–5 second delay before responding when a call begins, despite `stream = true` being set. It appears that the response is held back until the stream has completed.

In the current implementation, there is a loop that buffers the audio:

```javascript
  // Flush any buffered chunks that are now in sequence:
  while (Object.prototype.hasOwnProperty.call(this.audioBuffer, this.expectedAudioIndex)) {
    const bufferedAudio = this.audioBuffer[this.expectedAudioIndex];
    this.sendAudio(bufferedAudio);
    this.expectedAudioIndex++;
  }
} else {
  // Chunk arrived out of order; hold it until its index comes up.
  this.audioBuffer[index] = audio;
}
```

I believe this delay can be reduced by streaming the audio in chunks over the WebSocket inside the while loop, rather than waiting for the entire stream to complete.

By implementing WebSocket for chunk-by-chunk streaming, the AI voice agent can respond more promptly, significantly enhancing the user experience.
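A minimal sketch of what chunk-by-chunk delivery could look like, assuming an already-open Twilio Media Streams WebSocket. The `ws` and `streamSid` variables, the helper names, and the 20 ms mulaw chunk size are all illustrative assumptions, not the project's actual code:

```javascript
// Assumption: Twilio Media Streams expects 8 kHz 8-bit mulaw audio, so
// 160 bytes is roughly 20 ms of audio per outbound "media" message.
const CHUNK_BYTES = 160;

// Pure helper (hypothetical): split a synthesized audio buffer into
// base64-encoded chunks ready to forward as they are produced.
function chunkAudio(audioBuffer, chunkBytes = CHUNK_BYTES) {
  const chunks = [];
  for (let offset = 0; offset < audioBuffer.length; offset += chunkBytes) {
    chunks.push(audioBuffer.subarray(offset, offset + chunkBytes).toString('base64'));
  }
  return chunks;
}

// Hypothetical sender: emit each chunk as a Twilio "media" message as soon
// as it exists, instead of buffering the whole response first.
function streamToTwilio(ws, streamSid, audioBuffer) {
  for (const payload of chunkAudio(audioBuffer)) {
    ws.send(JSON.stringify({ event: 'media', streamSid, media: { payload } }));
  }
}
```

With this shape, the first ~20 ms of speech can be on the wire while the rest of the TTS output is still being generated.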

Please let me know whether this makes sense, or whether there is a reason the stream is handled this way.

@akashkaushik33

Don't you think it will reduce the delay only by a small margin? The only difference is that we would send the data in chunks rather than in one go, and the actual reduction will vary with connection speed. But it would surely help.

What I was thinking is that we should instead optimize the delay between when a user stops speaking and when the audio is sent.
If there is background noise above a certain level, that delay also increases significantly, since the listener registers the noise as a foreground event and waits for it to subside before sending the audio.
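One way to tackle that end-of-speech gap is to tune Deepgram's streaming endpointing. The parameter names below (`endpointing`, `utterance_end_ms`, `interim_results`) come from Deepgram's live-transcription API; the chosen values and the helper itself are assumptions to experiment with, not a recommendation from this project:

```javascript
// Hypothetical helper: build a Deepgram live-transcription WebSocket URL
// with endpointing tuned for faster turn-taking.
function deepgramLiveUrl(base = 'wss://api.deepgram.com/v1/listen') {
  const params = new URLSearchParams({
    interim_results: 'true',  // emit partial transcripts while the user speaks
    endpointing: '300',       // declare end-of-speech after ~300 ms of silence
    utterance_end_ms: '1000', // fallback utterance boundary for noisy audio
  });
  return `${base}?${params.toString()}`;
}
```

A lower `endpointing` value cuts the wait after the user stops speaking, at the cost of occasionally cutting in on slow talkers; `utterance_end_ms` gives a word-gap-based fallback that is less sensitive to steady background noise.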

@badereddineqodia

I think using OpenAI's Realtime API is now the better option, as it eliminates the additional middle services that add latency.
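For context, the speech-to-speech path suggested here replaces the separate STT and TTS hops with a single WebSocket to OpenAI. A sketch of the connection details, based on the public Realtime API beta docs; the model name, header values, and helper are assumptions that should be checked against current documentation:

```javascript
// Hypothetical helper: assemble the Realtime API WebSocket endpoint and
// headers. Connecting (e.g. with the `ws` package) is left to the caller.
function realtimeConnection(apiKey, model = 'gpt-4o-realtime-preview') {
  return {
    url: `wss://api.openai.com/v1/realtime?model=${model}`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'OpenAI-Beta': 'realtime=v1', // beta opt-in header
    },
  };
}

// Sketch of usage with the `ws` package (assumed):
//   const { url, headers } = realtimeConnection(process.env.OPENAI_API_KEY);
//   const socket = new WebSocket(url, { headers });
//   socket.on('message', (msg) => { /* audio deltas arrive as server events */ });
```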
