perf: Parallelize LLM calls in extract_keywords_and_rewrite to reduce latency #78

@QuantumByte-01

Problem

extract_keywords_and_rewrite makes 4 sequential Gemini calls per request:

  1. detect_intents (raw query)
  2. rewrite_with_history
  3. call_gemini_for_keywords
  4. detect_intents (rewritten query)

Calls 1 and 2 are independent: both need only the raw query and history, and neither uses the other's result. Running them concurrently with asyncio.gather saves roughly 1-2s per request.

Fix

intents0, effective = await asyncio.gather(
    call_gemini_detect_intents(state["query"], history),
    call_gemini_rewrite_with_history(state["query"], history),
)

Then run call_gemini_for_keywords and the second detect_intents call (on the rewritten query) concurrently in a second asyncio.gather.
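
A minimal sketch of the two-stage version, to make the dependency order concrete. The three `call_gemini_*` functions here are hypothetical stand-ins that only sleep to simulate latency; the signature of `call_gemini_for_keywords`, the returned dict keys, and the `run_demo` helper are assumptions for illustration, not the project's actual code.

```python
import asyncio
import time

# Hypothetical stubs: each sleeps 0.2s in place of a real Gemini request.
async def call_gemini_detect_intents(query, history):
    await asyncio.sleep(0.2)
    return ["intent:" + query]

async def call_gemini_rewrite_with_history(query, history):
    await asyncio.sleep(0.2)
    return query + " (rewritten)"

async def call_gemini_for_keywords(query):  # assumed signature
    await asyncio.sleep(0.2)
    return query.split()

async def extract_keywords_and_rewrite(state, history):
    # Stage 1: both calls need only the raw query, so they run concurrently.
    intents0, effective = await asyncio.gather(
        call_gemini_detect_intents(state["query"], history),
        call_gemini_rewrite_with_history(state["query"], history),
    )
    # Stage 2: both calls need the rewritten query, so they wait for stage 1
    # and then run concurrently with each other.
    keywords, intents1 = await asyncio.gather(
        call_gemini_for_keywords(effective),
        call_gemini_detect_intents(effective, history),
    )
    return {
        "raw_intents": intents0,
        "query": effective,
        "keywords": keywords,
        "intents": intents1,
    }

def run_demo():
    # Time one request end to end with the stubbed calls.
    start = time.perf_counter()
    result = asyncio.run(extract_keywords_and_rewrite({"query": "hello"}, []))
    return result, time.perf_counter() - start
```

With these stubs the request takes about two call latencies instead of four. asyncio.gather returns results in the order the awaitables were passed, so the tuple unpacking is safe regardless of which call finishes first. By default, the first exception raised by any call propagates to the awaiting code while the other call keeps running.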
