NZTA-GraphRAG

Graph Retrieval Augmented Generative Chatbot Prototype

A prototype chatbot developed for the New Zealand Transport Agency (NZTA), leveraging graph-based retrieval methods to provide intelligent and contextually relevant responses.

Introduction

NZTA-GraphRAG is a prototype of a Graph Retrieval Augmented Generative Chatbot developed for the New Zealand Transport Agency (NZTA). It utilizes graph-based retrieval methods to deliver intelligent and contextually relevant responses.

Setup

Python Version

This project requires Python 3.11. Ensure that you have Python 3.11 installed on your system before proceeding with the setup.

You can verify your Python version by running:

python --version
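If you would rather fail fast inside a script than check by hand, the same check can be done from Python by comparing `sys.version_info` against 3.11. This is a minimal sketch; `REQUIRED` and `check_python_version` are illustrative names, not part of the repository.

```python
import sys

# Illustrative constant: this README asks for Python 3.11 specifically.
REQUIRED = (3, 11)


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True if the major.minor version matches 3.11."""
    return tuple(version_info[:2]) == REQUIRED
```

Calling `check_python_version()` with no argument checks the running interpreter; passing a tuple makes the logic easy to test.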

1. Clone the Repository

Clone the repository using Git:

git clone https://github.com/fnavarro94/NZTA-GraphRAG.git

Alternatively, download the repository as a ZIP file and extract it locally.

Navigate to the NZTA-GraphRAG directory:

cd NZTA-GraphRAG

Copy the example environment file:

cp .env.example .env

You will use this .env file later.

2. Obtain a Groq API Key

To run this project, you need a Groq API key. Groq provides free API access to fast LLM inference models. You can obtain a key from the Groq Console.

Alternatively, you can run models locally with Ollama, but response generation will be much slower.

Once you have generated your Groq API key, copy it and paste it into the .env file at the GROQ_API_KEY entry.

For example:

# .env
GROQ_API_KEY=gsk_your_api_key_here
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=
NEO4J_URL=bolt://localhost:7687
NEO4J_DATABASE=neo4j
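To catch a missing or empty entry before running anything, you can parse the .env file and list the keys that still need values. This is a minimal standard-library sketch: it handles plain `KEY=VALUE` lines and `#` comments only, and the helper names are hypothetical. (python-dotenv is the usual tool for loading .env files; whether this project uses it is not stated here.)

```python
from pathlib import Path

# The keys shown in the example .env above.
REQUIRED_KEYS = (
    "GROQ_API_KEY",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_URL",
    "NEO4J_DATABASE",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: str = ".env") -> list[str]:
    """Read an .env file from disk and report which required keys are missing."""
    return missing_keys(parse_env(Path(path).read_text(encoding="utf-8")))
```

With the example above, `NEO4J_PASSWORD` is reported as missing, which is expected until you fill it in during the Neo4j setup step.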

3. Set Up Neo4j

We use Neo4j for this use case, but LlamaIndex supports many other graph databases. Refer to LlamaIndex Graph Stores for more information.

- Download Neo4j Desktop.
- Follow this guide to start your first database instance.
- Copy the username (NEO4J_USERNAME), password (NEO4J_PASSWORD), and URL (NEO4J_URL) of the database you created into the .env file.
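To confirm that the values in .env actually reach your database instance, a small check can validate the URL and then open a test connection. This sketch assumes the official `neo4j` Python driver is installed (LlamaIndex's Neo4j integration depends on it, but this README does not list it explicitly); the helper names are hypothetical.

```python
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j Python driver.
VALID_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def validate_neo4j_url(url: str) -> tuple[str, int]:
    """Return (host, port) for a Neo4j URL, raising ValueError if it looks wrong."""
    parsed = urlparse(url)
    if parsed.scheme not in VALID_SCHEMES:
        raise ValueError(f"Unsupported Neo4j URL scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError(f"No host in Neo4j URL: {url!r}")
    # 7687 is Neo4j's default Bolt port.
    return parsed.hostname, parsed.port or 7687


def check_connectivity(url: str, username: str, password: str) -> None:
    """Raise an exception if the database cannot be reached with these credentials."""
    from neo4j import GraphDatabase  # lazy import: URL validation works without the driver

    validate_neo4j_url(url)
    with GraphDatabase.driver(url, auth=(username, password)) as driver:
        driver.verify_connectivity()
```

A common mistake is pasting the HTTP browser address (`http://localhost:7474`) instead of the Bolt URL; `validate_neo4j_url` rejects that early with a clear message.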

4. Install Google Chrome

Download and install the latest version of Google Chrome from the official support page.

Download ChromeDriver, making sure its version matches the version of Google Chrome you have installed. ChromeDriver builds for each Chrome version are available on the Chrome for Testing page.
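As a rule of thumb, Chrome and ChromeDriver must share the same major version number. The sketch below compares the version strings reported by each program (for example from `chromedriver --version`, or the Chrome version shown at chrome://version); the function names are hypothetical.

```python
import re

# Chrome and ChromeDriver versions have four dot-separated parts, e.g. 124.0.6367.91.
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def major_version(version_output: str) -> int:
    """Extract the major version from output such as 'Google Chrome 124.0.6367.91'."""
    match = _VERSION.search(version_output)
    if match is None:
        raise ValueError(f"No version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```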

5. Set Up ChromeDriver

Copy the executable:
- Windows: chromedriver.exe
- Mac: chromedriver
Place the executable, together with all other files from the download, in the driver directory of the repository.
For Mac users only: remove the quarantine attribute that macOS adds to downloaded files, so the executable is allowed to run. In the terminal, run:
```
sudo xattr -d com.apple.quarantine "/path/to/driver/executable/chromedriver"
```
Replace "/path/to/driver/executable/chromedriver" with the actual path to the ChromeDriver executable.
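The expected location of the executable can also be computed in code, which avoids hard-coding the Windows or Mac file name. This is a sketch under assumptions: the `driver` directory layout comes from the step above, but the crawlers' use of Selenium 4 is assumed rather than stated here, and all function names are hypothetical.

```python
import os
import platform
import stat
from pathlib import Path
from typing import Optional, Union


def driver_filename(system: Optional[str] = None) -> str:
    """Executable name for the OS: chromedriver.exe on Windows, chromedriver elsewhere."""
    system = system or platform.system()
    return "chromedriver.exe" if system == "Windows" else "chromedriver"


def driver_path(repo_root: Union[str, Path] = ".", system: Optional[str] = None) -> Path:
    """Expected location of the executable: <repo_root>/driver/<filename>."""
    return Path(repo_root) / "driver" / driver_filename(system)


def ensure_executable(path: Path) -> None:
    """On macOS/Linux, add the owner execute bit (this does NOT clear quarantine)."""
    if os.name != "nt":
        path.chmod(path.stat().st_mode | stat.S_IXUSR)


def make_chrome_driver(path: Path):
    """Start Chrome via Selenium 4 with an explicit ChromeDriver path (Selenium is assumed)."""
    from selenium import webdriver  # lazy import so the path helpers work without Selenium
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=str(path)))
```

`ensure_executable` only covers file permissions; on macOS the `xattr` command above is still needed.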

Usage

After completing the setup steps, run the crawlers to download the data, then run the knowledge graph data loader script to populate the property graph index in the Neo4j database. Once both steps are done, you can interact with the chatbot.

Note: If you prefer to skip the data crawling and loading steps, you can load the Neo4j database directly from the provided dump file. See Alternative: Load Database from Dump File for instructions.
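The crawl-then-load order can be captured in a small runner. This is a convenience sketch, not a script in the repository: it assumes it is run from the repository root with the virtual environment active, and uses `sys.executable` so every step runs under the same interpreter. `skip_crawl` is meant for when the crawled data is already on disk; if you restore from the dump file you need neither step.

```python
import subprocess
import sys

# Scripts named in this README, in the order they are meant to run.
CRAWLERS = ["crawler.py", "transport_gov_crawler.py"]
LOADER = "kg_data_loader.py"


def pipeline_steps(skip_crawl: bool = False) -> list[list[str]]:
    """Build the commands for crawling (optional) followed by loading."""
    scripts = ([] if skip_crawl else CRAWLERS) + [LOADER]
    return [[sys.executable, script] for script in scripts]


def run_pipeline(skip_crawl: bool = False) -> None:
    """Run each step in turn; check=True stops at the first failing script."""
    for command in pipeline_steps(skip_crawl):
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```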

Create a Python 3.11 Virtual Environment

To ensure compatibility and maintain a clean project environment, create a virtual environment using Python 3.11:

Create the virtual environment (replace nzta_env with your preferred environment name):
```
python3.11 -m venv nzta_env
```
Activate the virtual environment:
- On macOS/Linux:
```
source nzta_env/bin/activate
```
- On Windows:
```
nzta_env\Scripts\activate
```
After activation, your terminal prompt will indicate the active environment (e.g., (nzta_env)).
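To confirm from inside Python that a script is running with the environment's interpreter rather than the system one, you can compare `sys.prefix` with `sys.base_prefix`: a venv interpreter points `sys.prefix` at the environment directory, while `sys.base_prefix` keeps pointing at the base installation. The function names here are illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when running under a venv interpreter (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


def describe_environment() -> str:
    """Short summary of the active interpreter version and environment."""
    where = sys.prefix if in_virtualenv() else "no virtual environment"
    return f"Python {sys.version_info.major}.{sys.version_info.minor} ({where})"
```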

Install Dependencies

First, install the required dependencies:

pip install -r requirements.txt
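If a later step fails with an import error, it can help to check which requirements are not installed in the active environment. This sketch uses `importlib.metadata` from the standard library and handles simple `name==version` style lines only (not URLs or editable installs); the helper names are hypothetical.

```python
import re
from importlib import metadata


def requirement_names(text: str) -> list[str]:
    """Extract distribution names from requirements.txt text, dropping pins, extras and markers."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names: list[str]) -> list[str]:
    """Return the names importlib.metadata cannot find in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_distributions(requirement_names(open("requirements.txt").read()))` lists anything `pip install` did not install.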

Patch Files

To apply the custom modifications to LlamaIndex's data ingestion and retrieval source code, run:

python patch.py

This overwrites the relevant LlamaIndex source files with the modified versions described in the dissertation (Graph Based RAG.pdf). To use the original LlamaIndex code instead, skip this step.

To revert to the original LlamaIndex code after applying the patch, run:

python revert_patch.py
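The patch/revert pair follows a common overwrite-with-backup pattern. The sketch below illustrates that general idea only; it is hypothetical and not necessarily how `patch.py` and `revert_patch.py` work. The first patch keeps a one-time backup of each original file, and reverting restores it with `os.replace`, which overwrites the target on both POSIX and Windows.

```python
import os
import shutil
from pathlib import Path

BACKUP_SUFFIX = ".orig"  # hypothetical naming convention for backups


def _backup_of(target: Path) -> Path:
    return target.with_name(target.name + BACKUP_SUFFIX)


def apply_patch(patched: Path, target: Path) -> None:
    """Back up the original once, then overwrite it with the patched copy."""
    backup = _backup_of(target)
    if not backup.exists():  # never overwrite the pristine backup on a re-run
        shutil.copy2(target, backup)
    shutil.copy2(patched, target)


def revert_patch(target: Path) -> None:
    """Restore the original file from its backup, removing the backup."""
    backup = _backup_of(target)
    if backup.exists():
        os.replace(backup, target)
```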

Crawl Data

Note: If you prefer to skip the data crawling and loading steps, you can load the Neo4j database directly from the provided dump file. See Alternative: Load Database from Dump File for instructions.

To crawl NZTA OIA responses:

python crawler.py

For Ministry of Transport data:

python transport_gov_crawler.py

Load Data into Neo4j

Note: If you loaded the database from the dump file, you can skip this step.

Once the crawlers have finished, load the data into Neo4j. Make sure the database instance is running in the Neo4j Desktop application before you start:

python kg_data_loader.py
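For orientation, this is roughly how a LlamaIndex `PropertyGraphIndex` can be built on top of Neo4j using the settings from .env. It is a sketch, not the project's loader: it assumes the `llama-index-graph-stores-neo4j`, `llama-index-llms-groq` and `llama-index-embeddings-huggingface` packages are installed, that the .env values are already in the process environment, and the embedding model and `groq_model` value are placeholders you would choose yourself.

```python
import os


def build_property_graph(documents, groq_model: str):
    """Extract a property graph from documents and store it in Neo4j.

    Imports are lazy so this module can be loaded without the LlamaIndex packages.
    """
    from llama_index.core import PropertyGraphIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
    from llama_index.llms.groq import Groq

    # Connection settings come from the .env entries described in the setup steps.
    graph_store = Neo4jPropertyGraphStore(
        username=os.environ["NEO4J_USERNAME"],
        password=os.environ["NEO4J_PASSWORD"],
        url=os.environ["NEO4J_URL"],
        database=os.environ.get("NEO4J_DATABASE", "neo4j"),
    )
    return PropertyGraphIndex.from_documents(
        documents,
        llm=Groq(model=groq_model, api_key=os.environ["GROQ_API_KEY"]),
        # Placeholder embedding model; the project may use a different one.
        embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
        property_graph_store=graph_store,
        show_progress=True,
    )
```

The documents could come from, for example, `SimpleDirectoryReader` pointed at the crawlers' output folder; the folder name depends on the crawler scripts.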

Alternative: Load Database from Dump File

If you prefer to skip the data crawling and loading steps, you can load the Neo4j database directly from the provided dump file located at oia_knowledge_graph/neo4j.dump.

Follow the instructions in this Google Doc to restore the database from the dump file.

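If you restore from the command line rather than through the Neo4j Desktop interface, Neo4j 5.x provides `neo4j-admin database load`, which reads a file named `<database>.dump` from the directory given by `--from-path`. The sketch below only builds that command; it assumes Neo4j 5.x (older versions use a different syntax), and the database must be stopped before loading. If you run the command from the DBMS directory, pass an absolute path to the dump.

```python
from pathlib import Path


def load_command(dump_file: str, database: str = "neo4j") -> list[str]:
    """Build a Neo4j 5.x 'neo4j-admin database load' command for a dump file."""
    dump = Path(dump_file)
    # neo4j-admin looks for <database>.dump inside --from-path, so the names must agree.
    if dump.suffix != ".dump" or dump.stem != database:
        raise ValueError(f"Expected a file named {database}.dump, got {dump.name}")
    return [
        "neo4j-admin", "database", "load",
        f"--from-path={dump.parent}",
        "--overwrite-destination=true",
        database,
    ]
```

The provided `oia_knowledge_graph/neo4j.dump` already matches the default database name `neo4j`.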

Run the Chatbot

Navigate to the chatbot directory:

cd chatbot

Run the Flask app that hosts the chatbot:

python kg_app.py

You will find the URL of the locally hosted app in the output. Open that URL in your browser to access the chatbot interface.
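To check from a script that the chatbot is serving requests, you can send a plain HTTP request to it. The default below is the Flask development server's usual address and is an assumption; use whatever URL `kg_app.py` prints.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Assumed default for Flask's development server; replace with the URL kg_app.py prints.
DEFAULT_URL = "http://127.0.0.1:5000/"


def chatbot_is_up(url: str = DEFAULT_URL, timeout: float = 2.0) -> bool:
    """Return True if the server answers with a non-error HTTP status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):  # refused connection, timeout or HTTP error status
        return False
```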
