Gringotts

A Python-based multi-agent Retrieval-Augmented Generation (RAG) system that processes and queries large volumes of financial documents for M&A due diligence.

Features

Multi-agent architecture for parallel processing of large financial documents
Specialized chunking strategies optimized for financial documents (PDFs, Excel, Word, etc.)
Intelligent financial entity extraction and indexing
Advanced semantic search with financial term expansion
Topic modeling for document categorization
Integration with LLMs for comprehensive analysis and summarization
Vector database integration for efficient retrieval
Distributed processing capabilities for handling large document collections
REST API for interacting with the system
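
To illustrate what financial term expansion can look like at query time, here is a minimal sketch. It assumes a small hand-written synonym table; `FINANCIAL_SYNONYMS` and `expand_query` are hypothetical names chosen for this example and are not taken from the repository, which may expand terms in a different way.

```python
import re

# Hypothetical synonym table for illustration only; the real system may
# source its expansions differently (for example from its financial indices).
FINANCIAL_SYNONYMS = {
    "ebitda": ["earnings before interest, taxes, depreciation and amortization"],
    "m&a": ["mergers and acquisitions"],
    "capex": ["capital expenditure"],
}


def expand_query(query: str) -> list[str]:
    """Return the original query plus one lowercased variant per matched expansion."""
    variants = [query]
    lowered = query.lower()
    for term, expansions in FINANCIAL_SYNONYMS.items():
        # Match the term as a whole token, not as part of a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, lowered):
            for expansion in expansions:
                variants.append(re.sub(pattern, expansion, lowered))
    return variants


if __name__ == "__main__":
    for variant in expand_query("What drove capex growth?"):
        print(variant)
```

Each variant could then be embedded and searched separately, with the results merged, so that a query using an abbreviation also retrieves chunks that spell the term out.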

Use Case: M&A Due Diligence

This system is specifically designed to assist financial analysts and investment bankers in the due diligence process for mergers and acquisitions. It can process and analyze:

Financial statements and annual reports
Legal contracts and agreements
Regulatory filings
Market analysis reports
Valuation documents
Tax documents
Due diligence memos and reports

The system extracts key financial information, identifies risks and opportunities, and provides a comprehensive analysis to support M&A decision-making.

Project Structure

financial-due-diligence-rag/
├── config/                 # Configuration files
├── data/                   # Data storage location
│   └── financial_indices/  # Intelligent indices for financial documents
├── docs/                   # Documentation
└── src/                    # Source code
    ├── agents/             # Multi-agent system components
    ├── api/                # API endpoints
    ├── document_processing/ # Document processors for financial documents
    ├── utils/              # Utility functions
    └── vector_store/       # Vector database integration

Document Processing Pipeline

Document Loading: Supports various financial document formats (PDF, DOCX, XLSX, etc.)
OCR Processing: Extracts text from scanned documents
Financial Entity Extraction: Identifies companies, monetary values, dates, percentages, etc.
Intelligent Chunking: Splits documents based on semantic boundaries
Metadata Extraction: Extracts key financial metrics and document categories
Embedding Generation: Creates vector representations of document chunks
Intelligent Indexing: Builds specialized indices for financial terms and entities
Topic Modeling: Categorizes documents for better organization and retrieval
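
The entity extraction and chunking stages above can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's implementation: the regular expressions cover only a few common formats, paragraph breaks stand in for "semantic boundaries", and the names `Chunk`, `extract_entities`, `chunk_by_paragraph`, and `process_document` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns (assumptions); real financial text needs more robust handling.
MONEY_RE = re.compile(
    r"[$€£]\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:thousand|million|billion))?",
    re.IGNORECASE,
)
PERCENT_RE = re.compile(r"\d+(?:\.\d+)?\s?%")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO dates only


@dataclass
class Chunk:
    text: str
    entities: dict = field(default_factory=dict)


def extract_entities(text: str) -> dict:
    """Collect monetary values, percentages, and ISO dates found in the text."""
    return {
        "monetary_values": MONEY_RE.findall(text),
        "percentages": PERCENT_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }


def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Group whole paragraphs into chunks of roughly max_chars.

    Paragraphs are never split, so a single paragraph longer than
    max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for paragraph in (p.strip() for p in re.split(r"\n\s*\n", text)):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def process_document(text: str, max_chars: int = 1000) -> list[Chunk]:
    """Chunk the text, then attach the entities found in each chunk as metadata."""
    return [
        Chunk(text=c, entities=extract_entities(c))
        for c in chunk_by_paragraph(text, max_chars)
    ]
```

Attaching entities per chunk rather than per document keeps the metadata close to the text that will be embedded, which makes it usable as a filter at retrieval time.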

Setup

Clone the repository
Create a virtual environment: python -m venv venv
Activate the virtual environment: source venv/bin/activate (Unix) or venv\Scripts\activate (Windows)
Install dependencies: pip install -r requirements.txt
Copy .env.example to .env and add your API keys (in particular an OpenAI key, used for the LLM integration)
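
Before starting the system it can help to check that the .env file contains the keys you expect. The sketch below uses only the standard library. The variable name `OPENAI_API_KEY` is an assumption (it is the name the OpenAI Python client reads by default), so check .env.example for the names this project actually uses; `parse_env` and `missing_keys` are hypothetical helpers.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required: list, values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env_values = parse_env(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    # Assumed key name; adjust to match .env.example.
    print("Missing keys:", missing_keys(["OPENAI_API_KEY"], env_values))
```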

Usage

Start the system: python src/main.py
Use the API to upload financial documents and query the system

API Endpoints

POST /api/upload: Upload and process a financial document
POST /api/query: Query the system with financial questions
POST /api/task/status: Check the status of a document processing task
GET /api/collections: List all document collections
GET /api/collections/{collection_name}/stats: Get statistics for a collection
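
A client for the query and statistics endpoints might look like the sketch below. Several details are assumptions because this README does not specify them: the base URL (`http://localhost:8000`), the JSON field names `query` and `collection_name` in the request body, and that responses are JSON. Check the API code under src/api/ for the actual request schema.

```python
import json
import urllib.parse
import urllib.request

# Assumption: host and port are not stated in the README.
BASE_URL = "http://localhost:8000"


def build_query_request(question: str, collection: str) -> urllib.request.Request:
    """Build a POST /api/query request. The JSON field names are hypothetical."""
    body = json.dumps({"query": question, "collection_name": collection}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_stats_request(collection: str) -> urllib.request.Request:
    """Build a GET /api/collections/{collection_name}/stats request."""
    name = urllib.parse.quote(collection, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/collections/{name}/stats", method="GET"
    )


def send(request: urllib.request.Request):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running server; the collection name is a placeholder.
    print(send(build_stats_request("deal_docs")))
    print(send(build_query_request("What is the target's net debt?", "deal_docs")))
```

Uploading a document through POST /api/upload would typically use a multipart form body, which is omitted here because the expected form fields are not documented in this README.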

Supported Financial Document Types

The system supports various financial document formats including:

PDF (text and scanned via OCR)
Microsoft Word (DOCX)
Microsoft Excel (XLSX, XLS)
Microsoft PowerPoint (PPTX, PPT)
CSV and TSV files
Plain text files
HTML and XML documents
Markdown files
JSON files
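
One way to route an uploaded file to a suitable processor is to look up its extension, as in the sketch below. The mapping and the category labels are hypothetical; extensions the list above does not name explicitly (`.txt`, `.html`, `.htm`, `.xml`, `.md`, `.json`) are assumptions, and the repository's document processors may detect types differently.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension-to-category table based on the formats listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf": "pdf",
    ".docx": "word",
    ".xlsx": "excel",
    ".xls": "excel",
    ".pptx": "powerpoint",
    ".ppt": "powerpoint",
    ".csv": "delimited",
    ".tsv": "delimited",
    ".txt": "text",
    ".html": "markup",
    ".htm": "markup",
    ".xml": "markup",
    ".md": "markdown",
    ".json": "json",
}


def detect_document_type(filename: str) -> Optional[str]:
    """Return the processor category for a file name, or None if unsupported."""
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())


if __name__ == "__main__":
    for name in ["Q4_Report.PDF", "model.xlsx", "archive.zip"]:
        print(name, "->", detect_document_type(name))
```

An extension lookup cannot tell a text PDF from a scanned one, so the decision to run OCR would still need to be made after the file is opened.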

License

MIT
