
feat: Migrate to Qdrant and implement Hybrid Search (BM25 + Vector) #28

@maxzaikin

Description

Evolving the RAG Core Beyond the MVP Stage. The current RAG pipeline, built on ChromaDB, has served well as an MVP. However, as I plan for future growth, performance optimization, and more advanced search capabilities, I'm hitting the architectural limits of a simplicity-focused vector database.

To transition the system from a prototype to a production-grade service, I decided to switch to a more powerful core for the retrieval system. This involves not only changing the database technology but also evolving the vector search methodology.

High-Level Goals & Benefits

This feature will replace the existing ChromaDB with Qdrant as the primary vector store and enhance the retrieval pipeline by implementing Hybrid Search.

  • Performance & Scalability (Qdrant): Migrate to a high-performance, Rust-based vector database designed for massive-scale, low-latency production workloads.
  • Memory & Cost Efficiency (Qdrant): Leverage Qdrant's built-in quantization to significantly reduce the memory footprint of vectors, allowing us to scale the expert KB cost-effectively.
  • Advanced Filtering (Qdrant): Utilize Qdrant's fast, metadata-indexed filtering, which is critical for future features like multi-tenant RAG and complex self-querying.
  • Enhanced Search Accuracy (Hybrid Search): Implement a Hybrid Search mechanism combining:

    • Vector Search: For finding semantically similar context.
    • BM25 (Keyword Search): For pinpointing documents with exact keyword matches (e.g., acronyms, codes, specific names), overcoming a common weakness of pure vector search.
  • Risk Mitigation (Rollback Strategy): While Qdrant will become the default, the existing ChromaDB implementation will be retained as an inactive, switchable option, providing a rapid rollback path if needed.

Plan of Action

Phase 1: Infrastructure & Dependency Update

  • chore(infra): Update docker-compose.infra.yml

    • [-] Add a new qdrant service using the official qdrant/qdrant image.
    • [-] Introduce Docker Compose profiles: qdrant (as default) and chroma (as a fallback). The chroma service will only be active when its profile is specified.
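A rough sketch of what the profile split in docker-compose.infra.yml could look like (service names, ports, and volume paths here are illustrative, not final). Note that in Docker Compose, a service with a `profiles` key only starts when its profile is activated, so one way to make `qdrant` the default is to set `COMPOSE_PROFILES=qdrant` in `.env`:

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    profiles: ["qdrant"]          # default via COMPOSE_PROFILES=qdrant in .env
    ports:
      - "6333:6333"               # REST API
      - "6334:6334"               # gRPC
    volumes:
      - qdrant_data:/qdrant/storage

  chroma:
    image: chromadb/chroma
    profiles: ["chroma"]          # fallback: docker compose --profile chroma up
    ports:
      - "8000:8000"

volumes:
  qdrant_data:
```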
  • chore(deps): Update Dependencies in services/a-rag

    • [-] Add qdrant-client and fast-bm25 (or a similar library for BM25) to pyproject.toml.
    • [-] The chromadb dependency will be moved to an optional dependency group (e.g., [project.optional-dependencies]) to be installed only when needed.
    • [-] Run uv pip sync to update the environment.
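The dependency split described above might look roughly like this in pyproject.toml (version pins omitted; the BM25 library name follows the issue text and may change):

```toml
[project]
dependencies = [
    # ...existing dependencies...
    "qdrant-client",
    "fast-bm25",  # or a similar BM25 library
]

[project.optional-dependencies]
chroma = ["chromadb"]  # installed only for the rollback path
```

The optional group would then be installed on demand, e.g. with `uv pip install -e ".[chroma]"`.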

Phase 2: Code Refactoring & Implementation

  • refactor(storage): Implement the Repository Pattern

    • [-] Define a base abstract class VectorStoreRepository in src/storage/ with methods like add, search, delete, etc.
    • [-] Create a QdrantRepository class that implements the base repository using qdrant-client.
    • [-] Move the existing ChromaDB logic into a ChromaRepository class that also implements the base repository.
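A minimal sketch of the repository interface described above. The method names (add, search, delete) come from this issue; the signatures and document shape are assumptions for illustration:

```python
from abc import ABC, abstractmethod
from typing import Any


class VectorStoreRepository(ABC):
    """Abstract interface that hides the concrete vector DB from callers."""

    @abstractmethod
    def add(self, documents: list[dict[str, Any]]) -> None:
        """Index the given documents (text + metadata + embedding)."""

    @abstractmethod
    def search(self, query: str, top_k: int = 5) -> list[dict[str, Any]]:
        """Return the top_k most relevant documents for the query."""

    @abstractmethod
    def delete(self, ids: list[str]) -> None:
        """Remove documents by id."""


class QdrantRepository(VectorStoreRepository):
    """Qdrant-backed implementation (qdrant-client calls elided)."""

    def add(self, documents: list[dict[str, Any]]) -> None: ...
    def search(self, query: str, top_k: int = 5) -> list[dict[str, Any]]:
        return []
    def delete(self, ids: list[str]) -> None: ...


class ChromaRepository(VectorStoreRepository):
    """Existing ChromaDB logic, moved behind the same interface."""

    def add(self, documents: list[dict[str, Any]]) -> None: ...
    def search(self, query: str, top_k: int = 5) -> list[dict[str, Any]]:
        return []
    def delete(self, ids: list[str]) -> None: ...
```

Because both classes satisfy the same contract, calling code never needs to know which database is behind it.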
  • refactor(core): Create a Repository Factory

    • [-] Implement a factory function (get_vector_store_repository) that reads a VECTOR_DATABASE_TYPE variable from the environment (.env) and returns an instance of either QdrantRepository or ChromaRepository.
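The factory could be as simple as the following sketch (the stub classes stand in for the real repositories; only the function name and the VECTOR_DATABASE_TYPE variable come from this issue):

```python
import os


class QdrantRepository:
    """Stub standing in for the real Qdrant-backed repository."""
    name = "qdrant"


class ChromaRepository:
    """Stub standing in for the real ChromaDB-backed repository."""
    name = "chroma"


def get_vector_store_repository():
    """Return the repository selected by VECTOR_DATABASE_TYPE (default: qdrant)."""
    db_type = os.getenv("VECTOR_DATABASE_TYPE", "qdrant").lower()
    if db_type == "qdrant":
        return QdrantRepository()
    if db_type == "chroma":
        return ChromaRepository()
    raise ValueError(f"Unsupported VECTOR_DATABASE_TYPE: {db_type!r}")
```

Switching back to ChromaDB then becomes a one-line `.env` change plus a restart, which is exactly the rollback path described above.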
  • feat(agent): Implement Hybrid Search Logic

    • [-] The search method within QdrantRepository will be updated to perform two queries in parallel: a vector similarity search and a keyword-based search using BM25.
    • [-] The results from both searches will be combined and re-ranked using a fusion algorithm (e.g., Reciprocal Rank Fusion) to produce a single, more relevant list of results.
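The fusion step above can be sketched independently of any database. This is a plain implementation of Reciprocal Rank Fusion over ranked id lists (the doc ids and k=60 constant are illustrative; k=60 is the value proposed in the original RRF paper):

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked id lists into one list sorted by RRF score.

    Each document scores sum(1 / (k + rank)) over every ranking it
    appears in, so documents ranked highly by both searches win.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Example: fuse a vector-search ranking with a BM25 ranking.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# doc_a appears near the top of both lists, so it ranks first.
```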
  • refactor(pipelines): Adapt the ZenML Ingestion Pipeline

    • [-] The index_documents step in the rag_ingestion_pipeline will be modified to use the new VectorStoreRepository (which will point to Qdrant by default).
    • [-] The pipeline will now be responsible for creating both the vector index and the BM25 index.
  • refactor(api): Adapt the Online RAG Service

    • [-] All parts of the a-rag API that perform retrieval will be updated to use the VectorStoreRepository via the factory. This ensures the change is transparent to the rest of the application.

Architectural Decisions & Rationale

As analyzed, Qdrant offers a superior trade-off between performance (RPS, latency), memory efficiency (quantization), and production-readiness (advanced filtering, clustering) compared to ChromaDB for our long-term goals.

Pure vector search excels at semantic understanding but can fail on queries requiring exact term matching. BM25 covers this gap perfectly. Combining them gives us the best of both worlds: semantic relevance and keyword precision.
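To make that gap concrete, here is a minimal, illustrative Okapi BM25 scorer (pure Python, not the library this issue will actually adopt; the example documents and the "ACME-42" code are invented). An exact-token query scores only the document that contains the token, which is exactly the case where embedding similarity can blur the match:

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            score += (idf * freq * (k1 + 1)
                      / (freq + k1 * (1 - b + b * len(doc) / avgdl)))
        scores.append(score)
    return scores


docs = [
    "qdrant supports payload filtering".split(),
    "vector search finds similar meaning".split(),
    "bm25 rewards exact keyword matches like acme-42".split(),
]
scores = bm25_scores(["acme-42"], docs)
# Only the third document contains the exact code, so only it scores > 0.
```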

This architectural pattern decouples application logic from the specific database implementation. It makes this complex migration manageable, localizes all DB-specific code, and simplifies future maintenance or even another migration.

Qdrant as a Logical Feature Store: I will continue to leverage the vector DB as a "logical feature store". Since Qdrant's metadata index functions like a NoSQL DB, we will store both cleaned, full-text documents (for future training pipelines) and chunked, embedded documents (for RAG) within it, making it our single source of truth for processed data.

Acceptance Criteria

  • [-] The application runs successfully by default using the qdrant profile in Docker Compose.
  • [-] The rag_ingestion_pipeline correctly populates the Qdrant instance with both vector and BM25 index data.
  • [-] The RAG API endpoint successfully retrieves relevant documents using the new hybrid search mechanism in Qdrant.
  • [-] The application can be switched to use ChromaDB by changing the .env variable and restarting Docker Compose with the chroma profile, demonstrating the rollback capability.
  • [-] Performance benchmarks (to be defined) show an improvement in retrieval latency and/or memory usage compared to the previous ChromaDB-only implementation.

Context & Inspiration

As I continue to evolve the TGB project, I'm actively applying best practices from industry-leading resources. This migration is directly inspired by the architectures and recommendations outlined in the "LLM Engineer's Handbook". The book emphasizes the importance of selecting production-grade tools for performance and scalability, citing Qdrant as a prime example for demanding RAG applications. By implementing this change, I am aligning the TGB project with proven, state-of-the-art practices.

Metadata

Assignees

Labels

architecture · enhancement (New feature or request) · feature (work on feature: add new or improve existing)

Projects

Status

In progress

Milestone

No milestone
