
Improve AI Voice Agent Response Time by Utilizing WebSocket for Streaming Audio #48

Open
shakir-snakescript opened this issue Jul 19, 2024 · 2 comments

shakir-snakescript commented Jul 19, 2024

I have observed that the current implementation of the AI voice agent, which uses OpenAI, Deepgram, and Twilio, experiences a 4–5 second delay before responding when a call begins, despite `stream = true` being set. It appears that the response is held back until the stream has completed.

In the current implementation, there is a loop that buffers the audio:

```javascript
  // Flush any buffered chunks that are now in sequence:
  while (Object.prototype.hasOwnProperty.call(this.audioBuffer, this.expectedAudioIndex)) {
    const bufferedAudio = this.audioBuffer[this.expectedAudioIndex];
    this.sendAudio(bufferedAudio);
    this.expectedAudioIndex++;
  }
} else {
  // Chunk arrived out of order; hold it until its index comes up.
  this.audioBuffer[index] = audio;
}
```

I believe this delay can be reduced by streaming the audio in chunks over the WebSocket inside the while loop, rather than waiting for the entire stream to complete.

By implementing WebSocket for chunk-by-chunk streaming, the AI voice agent can respond more promptly, significantly enhancing the user experience.
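A minimal sketch of what chunk-by-chunk delivery could look like, assuming an already-open Twilio Media Streams WebSocket. The `ws` and `streamSid` variables, the helper names, and the 20 ms mulaw chunk size are all illustrative assumptions, not the project's actual code:

```javascript
// Assumption: Twilio Media Streams expects 8 kHz 8-bit mulaw audio, so
// 160 bytes is roughly 20 ms of audio per outbound "media" message.
const CHUNK_BYTES = 160;

// Pure helper (hypothetical): split a synthesized audio buffer into
// base64-encoded chunks ready to forward as they are produced.
function chunkAudio(audioBuffer, chunkBytes = CHUNK_BYTES) {
  const chunks = [];
  for (let offset = 0; offset < audioBuffer.length; offset += chunkBytes) {
    chunks.push(audioBuffer.subarray(offset, offset + chunkBytes).toString('base64'));
  }
  return chunks;
}

// Hypothetical sender: emit each chunk as a Twilio "media" message as soon
// as it exists, instead of buffering the whole response first.
function streamToTwilio(ws, streamSid, audioBuffer) {
  for (const payload of chunkAudio(audioBuffer)) {
    ws.send(JSON.stringify({ event: 'media', streamSid, media: { payload } }));
  }
}
```

With this shape, the first ~20 ms of speech can be on the wire while the rest of the TTS output is still being generated.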

Please let me know whether this makes sense, or whether there is a reason the stream is handled this way.

@akashkaushik33

Don't you think it will reduce the delay only by a small margin? The only difference is that we would send the data in chunks rather than in one go, and the actual reduction will vary with connection speed. But it would surely help.

What I was thinking is that we should instead optimize the delay between when a user stops speaking and when the audio is sent.
If there is background noise above a certain level, that delay also increases significantly, since the listener registers the noise as a foreground event and waits for it to subside before sending the audio.
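One way to tackle that end-of-speech gap is to tune Deepgram's streaming endpointing. The parameter names below (`endpointing`, `utterance_end_ms`, `interim_results`) come from Deepgram's live-transcription API; the chosen values and the helper itself are assumptions to experiment with, not a recommendation from this project:

```javascript
// Hypothetical helper: build a Deepgram live-transcription WebSocket URL
// with endpointing tuned for faster turn-taking.
function deepgramLiveUrl(base = 'wss://api.deepgram.com/v1/listen') {
  const params = new URLSearchParams({
    interim_results: 'true',  // emit partial transcripts while the user speaks
    endpointing: '300',       // declare end-of-speech after ~300 ms of silence
    utterance_end_ms: '1000', // fallback utterance boundary for noisy audio
  });
  return `${base}?${params.toString()}`;
}
```

A lower `endpointing` value cuts the wait after the user stops speaking, at the cost of occasionally cutting in on slow talkers; `utterance_end_ms` gives a word-gap-based fallback that is less sensitive to steady background noise.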

@badereddineqodia

I think using OpenAI's Realtime API is now the better option, as it eliminates the additional middle services that add latency.
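For context, the speech-to-speech path suggested here replaces the separate STT and TTS hops with a single WebSocket to OpenAI. A sketch of the connection details, based on the public Realtime API beta docs; the model name, header values, and helper are assumptions that should be checked against current documentation:

```javascript
// Hypothetical helper: assemble the Realtime API WebSocket endpoint and
// headers. Connecting (e.g. with the `ws` package) is left to the caller.
function realtimeConnection(apiKey, model = 'gpt-4o-realtime-preview') {
  return {
    url: `wss://api.openai.com/v1/realtime?model=${model}`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'OpenAI-Beta': 'realtime=v1', // beta opt-in header
    },
  };
}

// Sketch of usage with the `ws` package (assumed):
//   const { url, headers } = realtimeConnection(process.env.OPENAI_API_KEY);
//   const socket = new WebSocket(url, { headers });
//   socket.on('message', (msg) => { /* audio deltas arrive as server events */ });
```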
