A powerful document search and question-answering tool that supports multiple retrieval methods for finding relevant information in markdown documents.
- BM25: Traditional keyword-based retrieval with AI-generated keywords and weights
- Embedding: Dense vector embeddings using SentenceTransformers + FAISS
- MLX: Apple Silicon optimized embeddings with MLX
- Gemini: Google Gemini embeddings via API
- EmbeddingGemma: Google EmbeddingGemma-300m model via SentenceTransformers
- OpenRouter: OpenRouter embeddings via API (supports various embedding models)
- ColBERT: Liquid AI late-interaction retrieval with ColBERT + PLAID index
- Multimodal: ChromaDB + Ollama embeddings
- Qdrant: Qdrant + Ollama embeddings
- Smart Keyword Generation: Uses local LLMs to generate optimized keywords for BM25 search
- Question Answering: Structured answers with citations and confidence levels
- Context-Aware Search: Optimized context windows for efficient processing
- Install the main minions package from the root directory:

  ```bash
  cd ../../
  pip install -e .
  ```

- Install additional dependencies for this app:

  ```bash
  cd apps/minions-doc-search
  pip install -r requirements.txt
  ```

Search for documents using different retrieval methods:
```bash
# Use BM25 with AI-generated keywords (default)
python local_rag_document_search.py --retriever bm25

# Use dense embeddings
python local_rag_document_search.py --retriever embedding

# Use MLX embeddings (Apple Silicon)
python local_rag_document_search.py --retriever mlx

# Use Gemini embeddings
python local_rag_document_search.py --retriever gemini

# Use OpenRouter embeddings (requires OPENROUTER_API_KEY)
python local_rag_document_search.py --retriever openrouter

# Use multimodal retrieval
python local_rag_document_search.py --retriever multimodal

# Custom query and settings
python local_rag_document_search.py \
  --retriever bm25 \
  --query "How many FTEs were approved?" \
  --top-k 5 \
  --model_name "gemma3:4b" \
  --documents-path "/path/to/documents"
```

- `--retriever`: Retrieval method (`bm25`, `embedding`, `mlx`, `gemini`, `embeddinggemma`, `openrouter`, `colbert`, `multimodal`, `qdrant`)
- `--query`: Search query (default: FTE approval question)
- `--top-k`: Number of results to return (default: 3)
- `--model_name`: Ollama model for keyword generation and QA (default: `gemma3:4b`)
For OpenRouter embeddings:

- `OPENROUTER_API_KEY`: Your OpenRouter API key (required)
- `OPENROUTER_EMBEDDING_MODEL`: Embedding model to use (default: `openai/text-embedding-3-small`)

Available OpenRouter embedding models include:

- `openai/text-embedding-3-small`
- `openai/text-embedding-3-large`
- `openai/text-embedding-ada-002`
- And more at https://openrouter.ai/models?output_modalities=embeddings
- `--documents-path`: Path to directory containing `.md` files
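The flags above map onto a standard `argparse` setup. The sketch below is a hypothetical reconstruction of the script's CLI: flag names and defaults mirror the list above, while the `--documents-path` default is an assumption.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the script's CLI; the real parser may differ.
    parser = argparse.ArgumentParser(
        description="Document search and QA over markdown files")
    parser.add_argument("--retriever", default="bm25",
                        choices=["bm25", "embedding", "mlx", "gemini",
                                 "embeddinggemma", "openrouter", "colbert",
                                 "multimodal", "qdrant"])
    parser.add_argument("--query",
                        default="How many additional FTEs were approved?")
    parser.add_argument("--top-k", dest="top_k", type=int, default=3)
    parser.add_argument("--model_name", default="gemma3:4b")
    parser.add_argument("--documents-path", dest="documents_path",
                        default="documents")  # assumed default
    return parser

args = build_parser().parse_args(["--retriever", "embedding", "--top-k", "5"])
```

Note that `argparse` maps the hyphenated `--top-k` and `--documents-path` flags onto the attribute names `top_k` and `documents_path`.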
The script loads all .md files from the specified directory and prepares them for search.
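The loading step amounts to globbing the directory for `.md` files; a minimal sketch (the helper name `load_markdown_docs` is illustrative, not the script's actual function):

```python
from pathlib import Path

def load_markdown_docs(documents_path: str) -> dict[str, str]:
    """Read every .md file in the directory into a {filename: text} map."""
    return {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(Path(documents_path).glob("*.md"))
    }
```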
- Uses a local LLM to generate optimized keywords from natural language queries
- Assigns importance weights to each keyword (1.0-5.0 scale)
- Performs BM25 search with weighted keywords
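The weighted-keyword scoring step can be illustrated with a small self-contained BM25 implementation: score each LLM-generated keyword against every document, then sum the per-keyword scores scaled by their weights. This is a sketch, not the app's code; the toy corpus and keyword weights are hypothetical stand-ins for real documents and real LLM output.

```python
import math

def bm25_keyword_scores(docs_tokens, keyword, k1=1.5, b=0.75):
    """Classic BM25 score of a single keyword against each document."""
    n = len(docs_tokens)
    df = sum(1 for d in docs_tokens if keyword in d)
    idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
    avgdl = sum(len(d) for d in docs_tokens) / n
    scores = []
    for d in docs_tokens:
        tf = d.count(keyword)
        norm = tf + k1 * (1 - b + b * len(d) / avgdl)
        scores.append(idf * tf * (k1 + 1) / norm)
    return scores

docs = [
    "the executive team approved five additional fte positions for q4 hiring".split(),
    "quarterly budget review and updated travel policy".split(),
    "engineering roadmap and platform migration plan".split(),
]

# Hypothetical LLM-generated keywords with importance weights (1.0-5.0 scale)
weighted_keywords = {"fte": 5.0, "approved": 3.0, "hiring": 1.5}

# Combine: total score = sum over keywords of weight * BM25(keyword, doc)
totals = [0.0] * len(docs)
for kw, weight in weighted_keywords.items():
    for i, s in enumerate(bm25_keyword_scores(docs, kw)):
        totals[i] += weight * s

ranking = sorted(range(len(docs)), key=lambda i: totals[i], reverse=True)
```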
- Embedding: Creates dense vector representations using SentenceTransformers
- MLX: Optimized for Apple Silicon using MLX framework
- Multimodal: Uses ChromaDB with Ollama embeddings
- Takes top-k retrieved documents as context
- Uses structured output with Pydantic models
- Returns answer, citation, and confidence level
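The structured answer can be modeled with a Pydantic schema along these lines (field names are inferred from the output description; the app's actual model and the sample JSON below are assumptions):

```python
from typing import Literal
from pydantic import BaseModel

class StructuredAnswer(BaseModel):
    """Schema the LLM's JSON output is validated against."""
    answer: str
    citation: str
    confidence: Literal["high", "medium", "low"]

# Hypothetical raw JSON as the local LLM might return it
raw = ('{"answer": "5 additional FTEs were approved.", '
       '"citation": "The executive team approved 5 additional FTE positions", '
       '"confidence": "high"}')

result = StructuredAnswer.model_validate_json(raw)
```

Validating against a `Literal` field guarantees the confidence level is one of the three allowed values before the result is printed.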
The tool provides:
- Ranked search results with scores and previews
- Structured answers with:
- Direct answer to the question
- Exact citation from source documents
- Confidence level (high/medium/low)
```
=== Document Search with BM25 Retriever ===

User Query: 'How many additional FTEs were approved in the executive team call?'

Top 3 results for user query:

1. executive_leadership_team_monthly_update.md
   BM25 Score: 3.000
   Preview: In the executive leadership team meeting, we discussed...

OLLAMA ANSWER:
Question: How many additional FTEs were approved in the executive team call?
Answer: 5 additional FTEs were approved for the engineering team.
Citation: "The executive team approved 5 additional FTE positions for Q4 hiring"
Confidence: high
```
The app requires:
- Core: pydantic (argparse and pathlib ship with the Python standard library)
- Optional: sentence-transformers, faiss-cpu, chromadb, mlx
- Main minions package (install from root directory)
- Ollama: Must be installed and running for local LLM functionality
- Model: Default model (`gemma3:4b`) should be available in Ollama
- Documents: Directory containing `.md` files to search
- Model not found: Ensure the specified model is installed in Ollama
- Import errors: Install optional dependencies based on chosen retriever
- MLX issues: MLX only works on Apple Silicon Macs
- No documents found: Check that the documents path contains `.md` files
- Use BM25 for keyword-focused searches
- Use embedding methods for semantic similarity
- Adjust `--top-k` based on document collection size
- MLX provides best performance on Apple Silicon
This app is part of the larger Minions project. See the main project README for contribution guidelines.