
Stream bot responses so the user starts seeing them almost instantly #30

@maxzaikin

Description

The current RAG pipeline has a high response latency (3-5 minutes) due to the slow inference speed of the local LLM. This leads to client timeouts and a poor user experience. This story aims to solve the UX problem by implementing streaming, allowing the user to see the response as it's being generated.

Acceptance Criteria

  • The API has a new /chat/stream endpoint that returns a StreamingResponse.
  • The RAGEngine can generate and yield tokens in real-time.
  • The tg-gateway client can consume the stream and progressively edit the message in Telegram.
  • The user sees the first token of the response within 2-3 seconds of sending a message.
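The criteria above hinge on the engine producing tokens incrementally instead of returning one finished string. A minimal sketch of that idea, assuming a hypothetical `RAGEngine.astream()` method and a stand-in `llm_tokens()` generator in place of the real local-LLM inference call (neither name is taken from the repo):

```python
import asyncio
from typing import AsyncIterator

async def llm_tokens() -> AsyncIterator[str]:
    """Stand-in for the local LLM: yields tokens as they are decoded."""
    for token in ["Hello", ", ", "world", "!"]:
        await asyncio.sleep(0)  # the real model would await inference here
        yield token

class RAGEngine:
    """Hypothetical sketch of the RAG-11 change: generation is an async generator."""

    async def astream(self, question: str) -> AsyncIterator[str]:
        # Retrieval over the vector store would run here first;
        # generation then streams tokens out as they arrive.
        async for token in llm_tokens():
            yield token

async def main() -> str:
    engine = RAGEngine()
    return "".join([t async for t in engine.astream("ping")])

if __name__ == "__main__":
    print(asyncio.run(main()))  # prints: Hello, world!
```

On the API side (RAG-10), FastAPI can wrap such a generator directly, e.g. `StreamingResponse(engine.astream(question), media_type="text/plain")` on the new `/chat/stream` endpoint; the exact router wiring depends on the project's existing API layout.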

Tasks

  • [Task] ID: RAG-10 - feat(api): Refactor the RAG API endpoint to support streaming responses.
  • [Task] ID: RAG-11 - feat(agent): Modify RAGEngine to return an async generator.
  • [Task] ID: RAG-12 - refactor(tg-gateway): Update RagApiClient to handle streaming HTTP responses.
  • [Task] ID: RAG-13 - feat(tg-gateway): Implement a message handler that edits a Telegram message progressively.
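For RAG-12/RAG-13, the gateway must turn a high-frequency token stream into a small number of message edits, because Telegram rate-limits `editMessageText`. A throttled consumer could look like the sketch below; `send_edit` is a hypothetical stand-in for the real bot API call (simplified to a synchronous callback), and the 1-second default interval is an assumption, not a measured Telegram limit:

```python
import asyncio
from typing import AsyncIterator, Callable, List, Optional

async def stream_to_message(
    tokens: AsyncIterator[str],
    send_edit: Callable[[str], None],
    interval: float = 1.0,
) -> str:
    """Accumulate streamed tokens, editing the message at most once per interval."""
    loop = asyncio.get_running_loop()
    text = ""
    last_edit: Optional[float] = None
    async for token in tokens:
        text += token
        now = loop.time()
        if last_edit is None or now - last_edit >= interval:
            send_edit(text)  # the first token goes out immediately
            last_edit = now
    if text:
        send_edit(text)  # always flush the final, complete answer
    return text

async def fake_tokens() -> AsyncIterator[str]:
    """Stand-in for the streaming HTTP response body from /chat/stream."""
    for t in ["The ", "answer ", "is ", "42."]:
        yield t

if __name__ == "__main__":
    edits: List[str] = []
    asyncio.run(stream_to_message(fake_tokens(), edits.append, interval=0.05))
    print(edits)  # an early partial edit, then the flushed full text
```

In the real `tg-gateway`, `fake_tokens()` would be replaced by iterating the chunked HTTP response from `RagApiClient`, and `send_edit` by the bot framework's message-edit call; the throttling logic itself is framework-agnostic.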

Metadata

Assignees

Labels

feature: work on a feature (add new or improve existing)

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
