I have observed that the current implementation of the AI voice agent (OpenAI, Deepgram, and Twilio) takes 4-5 seconds to respond when a call begins, even though `stream: true` is set. It appears the response is held back until the stream has completed.
In the current implementation, there is a loop that buffers the audio (fragment from the buffering logic, with the surrounding branches restored for readability):

```javascript
if (index === this.expectedAudioIndex) {
  // In-order chunk: send it, then flush any consecutive buffered chunks.
  this.sendAudio(audio);
  this.expectedAudioIndex++;
  while (Object.prototype.hasOwnProperty.call(this.audioBuffer, this.expectedAudioIndex)) {
    const bufferedAudio = this.audioBuffer[this.expectedAudioIndex];
    this.sendAudio(bufferedAudio);
    this.expectedAudioIndex++;
  }
} else {
  // Out-of-order chunk: hold it until its turn comes.
  this.audioBuffer[index] = audio;
}
```
I believe this delay can be reduced by streaming the audio over the WebSocket in chunks from inside this loop, rather than waiting for the entire stream to finish.
By streaming chunk by chunk over the WebSocket, the AI voice agent can start responding sooner, significantly improving the user experience.
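As a rough sketch of what chunk-by-chunk forwarding could look like (the names `relayTtsChunks`, `ttsResponse`, and `streamSid` are illustrative, not from this repo), each audio chunk is pushed to the Twilio media WebSocket as soon as it arrives:

```javascript
// Hypothetical sketch: forward each TTS audio chunk to the Twilio media
// WebSocket as soon as it arrives, instead of buffering the whole response.
// `ttsResponse.body` is assumed to be an async-iterable byte stream and
// `ws` a connected Twilio media-stream WebSocket.
async function relayTtsChunks(ttsResponse, ws, streamSid) {
  for await (const chunk of ttsResponse.body) {
    ws.send(JSON.stringify({
      event: 'media',
      streamSid,
      media: { payload: Buffer.from(chunk).toString('base64') },
    }));
  }
}
```

Playback on the phone can then begin as soon as the first chunk lands, rather than after the last one.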
Please let me know if this makes sense, or whether there is a reason for handling the stream this way.
Don't you think this would reduce the delay only by a small margin? The only difference is that we would send the data in chunks rather than in one go, so the actual improvement would vary with connection speed. Still, it should help.
What I was thinking is that we should optimize the delay between the moment a user stops speaking and the moment the audio is sent.
If there is background noise above a certain level, the delay also grows significantly, because the listener registers the noise as a foreground event and waits for it to subside before sending the audio.
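One way to bound that wait is a simple silence timer on our side: flush the buffered utterance once the VAD has reported no speech for a fixed window, even if low-level noise is still present. This is only a sketch under assumed names (`SilenceEndpointer`, and an `isSpeech` flag supplied by some VAD), not code from this repo:

```javascript
// Hypothetical sketch: flush buffered caller audio after `silenceMs` of
// no detected speech, instead of waiting for background noise to subside.
// `isSpeech` is assumed to come from a VAD; timestamps are passed in
// explicitly so the logic stays deterministic and easy to test.
class SilenceEndpointer {
  constructor(silenceMs, onUtterance) {
    this.silenceMs = silenceMs;
    this.onUtterance = onUtterance; // called with the frames of one utterance
    this.frames = [];
    this.lastSpeechAt = null;
  }

  // Call once per incoming audio frame.
  push(frame, isSpeech, nowMs) {
    this.frames.push(frame);
    if (isSpeech) {
      this.lastSpeechAt = nowMs;
    } else if (this.lastSpeechAt !== null &&
               nowMs - this.lastSpeechAt >= this.silenceMs) {
      // Enough trailing silence: emit the utterance now.
      this.onUtterance(this.frames.splice(0));
      this.lastSpeechAt = null;
    }
  }
}
```

Tuning `silenceMs` trades responsiveness against the risk of cutting a caller off mid-sentence, so it would need testing against real calls.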