Open
Labels
feature - work on feature (add new or improve existing)
Description
The current RAG pipeline has a high response latency (3-5 minutes) due to the slow inference speed of the local LLM. This leads to client timeouts and a poor user experience. This story aims to solve the UX problem by implementing streaming, allowing the user to see the response as it's being generated.
Acceptance Criteria
- The API has a new `/chat/stream` endpoint that returns a `StreamingResponse`.
- The `RAGEngine` can generate and yield tokens in real time.
- The `tg-gateway` client can consume the stream and progressively edit the message in Telegram.
- The user sees the first token of the response within 2-3 seconds of sending a message.
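The core of the second criterion is turning the engine's single blocking call into an async generator that yields tokens as they are produced. A minimal sketch of that shape, assuming the `RAGEngine` class and a `generate_stream` method name (both are illustrative, not confirmed project code):

```python
import asyncio

# Hypothetical sketch: RAGEngine and generate_stream are assumed names
# based on the issue text, not actual project code.
class RAGEngine:
    async def generate_stream(self, question: str):
        """Yield answer tokens as the LLM produces them, instead of
        returning one complete string after minutes of generation."""
        for token in ["The ", "answer ", "is ", "42."]:
            await asyncio.sleep(0)  # stand-in for per-token LLM latency
            yield token

async def collect(question: str) -> str:
    # A consumer (e.g. the /chat/stream endpoint) can iterate the
    # generator and forward each token to the client as it arrives.
    engine = RAGEngine()
    return "".join([tok async for tok in engine.generate_stream(question)])

print(asyncio.run(collect("test")))  # -> The answer is 42.
```

In the API layer this generator would be passed to a streaming response type (e.g. FastAPI's `StreamingResponse`), which forwards each yielded chunk immediately, so the first token reaches the client in seconds.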
Tasks
- [Task] ID: RAG-10 - feat(api): Refactor the RAG API endpoint to support streaming responses.
- [Task] ID: RAG-11 - feat(agent): Modify `RAGEngine` to return an async generator.
- [Task] ID: RAG-12 - refactor(tg-gateway): Update `RagApiClient` to handle streaming HTTP responses.
- [Task] ID: RAG-13 - feat(tg-gateway): Implement a message handler that edits a Telegram message progressively.
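For RAG-13, editing the Telegram message on every single token would quickly hit API rate limits, so the handler typically buffers tokens and flushes edits in batches. A sketch of that pattern with a stubbed Telegram client (the stub and all names here are illustrative; the real handler would call the bot API's message-edit method):

```python
import asyncio

# Hypothetical sketch: FakeTelegram stands in for the real bot client
# used by tg-gateway; names are assumptions, not project code.
class FakeTelegram:
    def __init__(self):
        self.edits = []

    async def edit_message(self, text: str):
        self.edits.append(text)  # real client: edit the sent message in place

async def stream_to_message(tokens, tg, flush_every: int = 2):
    """Accumulate streamed tokens and edit the message in batches,
    rather than once per token, to stay under edit rate limits."""
    buf = ""
    pending = 0
    for tok in tokens:  # in production: `async for tok in response_stream`
        buf += tok
        pending += 1
        if pending >= flush_every:
            await tg.edit_message(buf)
            pending = 0
    if pending:
        await tg.edit_message(buf)  # final flush with the complete text
    return buf

tg = FakeTelegram()
final = asyncio.run(stream_to_message(["Hi", " ", "there", "!"], tg))
```

A time-based throttle (e.g. at most one edit per second) is a common alternative to the count-based `flush_every` used here.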
Metadata
Projects
Status
Backlog