This repository takes a clear, hands-on approach to Retrieval-Augmented Generation (RAG), breaking down advanced techniques into straightforward, understandable implementations. Instead of relying on frameworks like `LangChain` or `FAISS`, everything here is built using familiar Python libraries: `openai`, `numpy`, `matplotlib`, and a few others.
The goal is simple: provide code that is readable, modifiable, and educational. By focusing on the fundamentals, this project helps demystify RAG and makes it easier to understand how it really works.
- (20-Mar-2025) Added a new notebook on RAG with Reinforcement Learning.
- (07-Mar-2025) Added 20 RAG techniques to the repository.
This repository contains a collection of Jupyter Notebooks, each focusing on a specific RAG technique. Each notebook provides:
- A concise explanation of the technique.
- A step-by-step implementation from scratch.
- Clear code examples with inline comments.
- Evaluations and comparisons to demonstrate the technique's effectiveness.
- Visualizations of the results.
Here's a glimpse of the techniques covered:
| Notebook | Description |
|---|---|
| 1. Simple RAG | A basic RAG implementation. A great starting point! |
| 2. Semantic Chunking | Splits text based on semantic similarity for more meaningful chunks. |
| 3. Chunk Size Selector | Explores the impact of different chunk sizes on retrieval performance. |
| 4. Context Enriched RAG | Retrieves neighboring chunks to provide more context. |
| 5. Contextual Chunk Headers | Prepends descriptive headers to each chunk before embedding. |
| 6. Document Augmentation RAG | Generates questions from text chunks to augment the retrieval process. |
| 7. Query Transform | Rewrites, expands, or decomposes queries to improve retrieval. Includes Step-back Prompting and Sub-query Decomposition. |
| 8. Reranker | Re-ranks initially retrieved results using an LLM for better relevance. |
| 9. RSE | Relevant Segment Extraction: identifies and reconstructs continuous segments of text, preserving context. |
| 10. Contextual Compression | Implements contextual compression to filter and compress retrieved chunks, maximizing relevant information. |
| 11. Feedback Loop RAG | Incorporates user feedback to learn and improve the RAG system over time. |
| 12. Adaptive RAG | Dynamically selects the best retrieval strategy based on query type. |
| 13. Self RAG | Implements Self-RAG, which dynamically decides when and how to retrieve, evaluates relevance, and assesses support and utility. |
| 14. Proposition Chunking | Breaks down documents into atomic, factual statements for precise retrieval. |
| 15. Multimodal RAG | Combines text and images for retrieval, generating captions for images using LLaVA. |
| 16. Fusion RAG | Combines vector search with keyword-based (BM25) retrieval for improved results. |
| 17. Graph RAG | Organizes knowledge as a graph, enabling traversal of related concepts. |
| 18. Hierarchy RAG | Builds hierarchical indices (summaries + detailed chunks) for efficient retrieval. |
| 19. HyDE RAG | Uses Hypothetical Document Embeddings to improve semantic matching. |
| 20. CRAG | Corrective RAG: dynamically evaluates retrieval quality and uses web search as a fallback. |
| 21. RAG with RL | Maximizes the reward of the RAG pipeline using Reinforcement Learning. |
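Several of the techniques above (Semantic Chunking, Chunk Size Selector, Proposition Chunking) build on a basic text splitter. As a minimal sketch of the idea — the function name and default parameters here are illustrative, not taken from the notebooks — a fixed-size chunker with overlap needs nothing beyond the standard library:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size characters.

    Overlap keeps context that straddles a chunk boundary retrievable
    from both neighboring chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The notebooks explore more sophisticated strategies (semantic boundaries, atomic propositions), but they all address the same trade-off this sketch exposes: larger chunks carry more context, smaller chunks give more precise retrieval.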
```
fareedkhan-dev-all-rag-techniques/
├── README.md                          <- You are here!
├── 1_simple_rag.ipynb
├── 2_semantic_chunking.ipynb
├── 3_chunk_size_selector.ipynb
├── 4_context_enriched_rag.ipynb
├── 5_contextual_chunk_headers_rag.ipynb
├── 6_doc_augmentation_rag.ipynb
├── 7_query_transform.ipynb
├── 8_reranker.ipynb
├── 9_rse.ipynb
├── 10_contextual_compression.ipynb
├── 11_feedback_loop_rag.ipynb
├── 12_adaptive_rag.ipynb
├── 13_self_rag.ipynb
├── 14_proposition_chunking.ipynb
├── 15_multimodel_rag.ipynb
├── 16_fusion_rag.ipynb
├── 17_graph_rag.ipynb
├── 18_hierarchy_rag.ipynb
├── 19_HyDE_rag.ipynb
├── 20_crag.ipynb
├── 21_rag_with_rl.ipynb
├── requirements.txt                   <- Python dependencies
└── data/
    ├── val.json                       <- Sample validation data (queries and answers)
    ├── AI_Information.pdf             <- A sample PDF document for testing
    └── attention_is_all_you_need.pdf  <- A sample PDF document for testing (Multi-Modal RAG)
```
1. Clone the repository:

   ```bash
   git clone https://github.com/FareedKhan-dev/all-rag-techniques.git
   cd all-rag-techniques
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up your OpenAI API key:

   - Obtain an API key from Nebius AI.
   - Set the API key as an environment variable:

     ```bash
     export OPENAI_API_KEY='YOUR_NEBIUS_AI_API_KEY'
     ```

     or, on Windows:

     ```bash
     setx OPENAI_API_KEY "YOUR_NEBIUS_AI_API_KEY"
     ```

     or, within your Python script/notebook:

     ```python
     import os
     os.environ["OPENAI_API_KEY"] = "YOUR_NEBIUS_AI_API_KEY"
     ```
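Because the notebooks call Nebius AI through an OpenAI-compatible endpoint, the OpenAI client is typically constructed with a custom `base_url`. The URL below is an assumption for illustration — check Nebius AI's documentation for the current endpoint:

```python
import os
from openai import OpenAI

# Illustrative base URL -- verify against Nebius AI's documentation.
client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ["OPENAI_API_KEY"],
)
```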
4. Run the notebooks:

   Open any of the Jupyter Notebooks (`.ipynb` files) using Jupyter Notebook or JupyterLab. Each notebook is self-contained and can be run independently; within each file, the cells are designed to be executed sequentially.

   Note: The `data/AI_Information.pdf` file provides a sample document for testing; you can replace it with your own PDF. The `data/val.json` file contains sample queries and ideal answers for evaluation. The `attention_is_all_you_need.pdf` file is for testing the Multi-Modal RAG notebook.
- Embeddings: Numerical representations of text that capture semantic meaning. We use Nebius AI's embedding API and, in many notebooks, the `BAAI/bge-en-icl` embedding model.
- Vector Store: A simple database to store and search embeddings. We create our own `SimpleVectorStore` class using NumPy for efficient similarity calculations.
- Cosine Similarity: A measure of similarity between two vectors; higher values indicate greater similarity.
- Chunking: Dividing text into smaller, manageable pieces. We explore various chunking strategies.
- Retrieval: The process of finding the most relevant text chunks for a given query.
- Generation: Using a Large Language Model (LLM) to create a response based on the retrieved context and the user's query. We use the `meta-llama/Llama-3.2-3B-Instruct` model via Nebius AI's API.
- Evaluation: Assessing the quality of the RAG system's responses, often by comparing them to a reference answer or using an LLM to score relevance.
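To tie the vector-store and cosine-similarity concepts together, here is a minimal sketch in the spirit of the `SimpleVectorStore` described above. The class and method names mirror the description but the exact interface in the notebooks may differ:

```python
import numpy as np

class SimpleVectorStore:
    """Minimal in-memory vector store with cosine-similarity search."""

    def __init__(self):
        self.vectors = []  # one embedding per stored chunk
        self.texts = []    # the chunk text each embedding represents

    def add(self, text: str, embedding) -> None:
        """Store a text chunk alongside its embedding vector."""
        self.texts.append(text)
        self.vectors.append(np.asarray(embedding, dtype=float))

    def search(self, query_embedding, k: int = 3):
        """Return the top-k (text, score) pairs by cosine similarity."""
        q = np.asarray(query_embedding, dtype=float)
        matrix = np.stack(self.vectors)  # shape (n_chunks, dim)
        # Cosine similarity: dot product over the product of vector norms.
        sims = matrix @ q / (
            np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-10
        )
        top = np.argsort(sims)[::-1][:k]  # indices of the k highest scores
        return [(self.texts[i], float(sims[i])) for i in top]
```

In a real notebook run, `add` would receive embeddings from Nebius AI's embedding API; here any numeric vectors work, which is what keeps the store easy to test and inspect.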
Contributions are welcome!