
Intelligent document Q&A powered by RAG technology - transform your documents into interactive knowledge bases


soheil-mp/DocChat

Latest commit e13853d · Nov 17, 2024 · 17 commits

CC0 License · FastAPI · React · TypeScript · Python · MongoDB · Pinecone · OpenAI · LangChain · Docker · Tailwind CSS · Node.js · Jest · WebSocket


DocChat

🤖 An intelligent document Q&A chat interface powered by RAG (Retrieval-Augmented Generation) - transform your documents into interactive knowledge bases.



✨ Features

  • 📄 Smart Document Management

    • Multi-format support (PDF, DOCX, TXT)
    • Batch uploads with progress tracking
    • Version control & metadata management
  • 💬 AI-Powered Chat

    • Context-aware responses using RAG
    • Real-time interactions
    • Source citations
    • Conversation history
  • ⚙️ Customization

    • Multiple LLM providers (OpenAI, Anthropic, Cohere)
    • Adjustable generation parameters
    • Custom prompting
    • Flexible output formatting
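The customization options above suggest a provider-agnostic layer between the chat logic and the LLM. The sketch below is hypothetical: `GenerationParams`, `LLMProvider`, `render_prompt` and the offline `EchoProvider` are illustrative names, not DocChat's actual API, the parameter ranges are assumptions, and no real OpenAI, Anthropic or Cohere SDK is called.

```python
# Hypothetical sketch of a provider-agnostic LLM layer; not DocChat's actual code.
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical user-adjustable generation settings."""

    temperature: float = 0.2
    max_tokens: int = 512
    top_p: float = 1.0

    def __post_init__(self) -> None:
        # Fail fast on values that typical chat-completion APIs would reject (assumed ranges).
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")


class LLMProvider(Protocol):
    """Interface each provider adapter (OpenAI, Anthropic, Cohere, ...) would implement."""

    def complete(self, prompt: str, params: GenerationParams) -> str:
        ...


# Default template; a custom prompt only needs the same {context} and {question} fields.
DEFAULT_TEMPLATE = (
    "Answer using only the context below and cite sources as [n].\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def render_prompt(question: str, context: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill the prompt template with retrieved context and the user's question."""
    return template.format(context=context, question=question)


class EchoProvider:
    """Offline stand-in that echoes the question back; a real adapter would call an SDK."""

    def complete(self, prompt: str, params: GenerationParams) -> str:
        prefix = "Question: "
        question = next(line for line in prompt.splitlines() if line.startswith(prefix))
        return question[len(prefix):][: params.max_tokens]
```

A real adapter would translate `GenerationParams` into its SDK's request fields; keeping the template as a plain string is what would make custom prompting a configuration change rather than a code change.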

🚀 Quick Start

Prerequisites

  • Node.js 16+
  • Python 3.8+
  • MongoDB
  • Pinecone account
  • OpenAI API key

Installation

  1. Clone the repository
git clone https://github.com/soheil-mp/DocChat.git
cd DocChat
  2. Set up the backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: .\venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env     # Configure your environment variables
  3. Set up the frontend
cd ../frontend           # step 2 left you inside backend/
npm install
cp .env.example .env     # Configure your environment variables

Running Locally

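  1. Start the backend server (from the repository root, with the virtual environment activated)
cd backend
uvicorn app.main:app --reload
  2. Launch the frontend in a second terminal (from the repository root)
cd frontend
npm start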

Visit http://localhost:3000 to see the application.

πŸ—οΈ Architecture

graph TD
    A[Client] -->|HTTP/WebSocket| B[FastAPI Backend]
    B -->|Document Storage| C[MongoDB]
    B -->|Vector Storage| D[Pinecone]
    B -->|RAG Pipeline| E[LangChain]
    E -->|LLM Requests| F[OpenAI]
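To make the request path in the diagram concrete, here is an offline sketch of the retrieval step with in-memory stand-ins: a toy bag-of-words `embed` instead of an embedding model, `InMemoryVectorStore` in Pinecone's place, and a plain dict where MongoDB would hold chunk text. All names are hypothetical; the real backend presumably goes through LangChain and the Pinecone client.

```python
# Hypothetical, offline sketch of the RAG retrieval step; not DocChat's actual pipeline.
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; stands in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Plays Pinecone's role: keeps one vector per chunk id and ranks them by similarity."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Counter] = {}

    def upsert(self, chunk_id: str, text: str) -> None:
        self._vectors[chunk_id] = embed(text)

    def query(self, text: str, top_k: int = 3) -> List[Tuple[str, float]]:
        q = embed(text)
        scored = [(cid, cosine(q, vec)) for cid, vec in self._vectors.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


def build_context(question: str, chunks: Dict[str, str], store: InMemoryVectorStore) -> str:
    """Retrieve matching chunk ids, fetch their text (MongoDB's role) and number them."""
    hits = [(cid, score) for cid, score in store.query(question) if score > 0]
    # The numbered context plus the question would then be sent to the LLM.
    return "\n".join(f"[{i}] ({cid}) {chunks[cid]}" for i, (cid, _) in enumerate(hits, start=1))
```

Numbering chunks as [1], [2], ... is one simple way to back the source citations listed under Features.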

πŸ“ Project Structure

DocChat/
├── backend/             # FastAPI server
│   ├── app/
│   │   ├── api/         # REST endpoints
│   │   ├── core/        # Core utilities
│   │   ├── services/    # Business logic
│   │   └── models/      # Data models
│   └── tests/           # Backend tests
├── frontend/            # React application
│   ├── src/
│   │   ├── components/  # UI components
│   │   ├── features/    # Feature modules
│   │   └── lib/         # Utilities
│   └── tests/           # Frontend tests
└── docs/                # Documentation

πŸ› οΈ Tech Stack


Frontend

  • React 18 with TypeScript
  • Tailwind CSS & Headless UI
  • React Query & Zustand
  • Jest & Testing Library

Backend

  • FastAPI
  • LangChain & LangGraph
  • MongoDB & Pinecone
  • OpenAI GPT-4

📦 Deployment

Docker Deployment

The application can be deployed using Docker in both development and production environments.

Development Environment

# Start all services with hot-reload
docker-compose -f deploy/docker/docker-compose.dev.yml up --build

# Start specific services
docker-compose -f deploy/docker/docker-compose.dev.yml up backend mongodb
docker-compose -f deploy/docker/docker-compose.dev.yml up frontend

# View logs
docker-compose -f deploy/docker/docker-compose.dev.yml logs -f

Production Environment

# Build and start all services in detached mode
docker-compose -f deploy/docker/docker-compose.yml up --build -d

# Check service status
docker-compose -f deploy/docker/docker-compose.yml ps

# Monitor logs
docker-compose -f deploy/docker/docker-compose.yml logs -f

Container Architecture

  • Backend Container: Python FastAPI application with uvicorn server
  • Frontend Container: Nginx serving built React application
  • MongoDB Container: Database service with persistent storage
  • Volumes:
    • mongodb_data: Persistent database storage
    • uploads: Document storage for processed files

Environment Configuration

  1. Backend Environment (.env)
MONGODB_URL=mongodb://mongodb:27017
MONGODB_DB_NAME=DocChat
OPENAI_API_KEY=your_openai_key
PINECONE_API_KEY=your_pinecone_key
PINECONE_ENV=your_pinecone_environment
JWT_SECRET_KEY=your_jwt_secret
  2. Frontend Environment (.env)
REACT_APP_API_URL=http://localhost:8000
REACT_APP_WS_URL=ws://localhost:8000/ws
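Both files are plain KEY=VALUE lists. Below is a stdlib-only sketch of loading the backend file and failing fast when a variable is missing; `parse_env`, `load_config` and `REQUIRED_BACKEND_KEYS` are hypothetical names, and the actual backend may use a library such as python-dotenv or pydantic-settings instead. This parser does not handle inline comments after values or multi-line values.

```python
# Hypothetical stdlib-only .env loader; DocChat may use python-dotenv or pydantic-settings instead.
import os
from pathlib import Path
from typing import Dict, Iterable

# Keys taken from the backend .env example above.
REQUIRED_BACKEND_KEYS = (
    "MONGODB_URL",
    "MONGODB_DB_NAME",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENV",
    "JWT_SECRET_KEY",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes: KEY="value" -> value
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_config(path: str = ".env", required: Iterable[str] = REQUIRED_BACKEND_KEYS) -> Dict[str, str]:
    """Merge the .env file with the process environment (real env vars win); check required keys."""
    required = tuple(required)
    env_file = Path(path)
    config = parse_env(env_file.read_text()) if env_file.exists() else {}
    for key in set(config) | set(required):
        if key in os.environ:
            config[key] = os.environ[key]
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return config
```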

Health Monitoring

The deployment includes health checks for all services:

  • Backend: HTTP health endpoint at /health
  • Frontend: Nginx status page
  • MongoDB: Connection check
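To run the backend check by hand, a small stdlib-only probe can request `/health` and treat HTTP 200 as healthy. `check_health` is a hypothetical helper, and its default base URL `http://localhost:8000` is an assumption matching the `REACT_APP_API_URL` example above.

```python
# Hypothetical health probe for the backend's /health endpoint (stdlib only).
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections and timeouts all subclass OSError.
        return False
```

For example, `check_health()` returns False while the backend is down, so it can serve as a simple readiness gate in scripts.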

Scaling Considerations

  • Backend can be scaled horizontally using Docker Swarm or Kubernetes
  • MongoDB should be configured with replication for production
  • Consider using managed services for databases in production

Cloud Platform Deployment

AWS Deployment

  • EC2 instances for application containers
  • ECS/EKS for container orchestration
  • MongoDB Atlas for database
  • S3 for document storage
  • CloudFront for CDN
  • Route53 for DNS management

Detailed AWS Setup Guide

Google Cloud Platform

  • Google Compute Engine for containers
  • Google Kubernetes Engine for orchestration
  • Cloud Storage for documents
  • Cloud CDN for content delivery
  • Cloud DNS for domain management

Detailed GCP Setup Guide

Microsoft Azure

  • Azure Container Instances
  • AKS for Kubernetes deployment
  • Azure Cosmos DB with MongoDB API
  • Azure Blob Storage for documents
  • Azure CDN for content delivery

Detailed Azure Setup Guide

Security Considerations

  • All containers run as non-root users
  • Environment variables for sensitive data
  • Regular security updates for base images
  • Network isolation between services
  • Rate limiting on API endpoints
  • CORS configuration
  • SSL/TLS encryption
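Rate limiting on API endpoints is often implemented as a token bucket per client. The sketch below is a hypothetical, single-process, in-memory version (`TokenBucket` and `RateLimiter` are illustrative names, not DocChat's code); the injectable clock exists only to make it testable.

```python
# Hypothetical in-memory token-bucket rate limiter; not DocChat's actual implementation.
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills `rate` tokens per second and allows bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so the first burst is allowed
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one TokenBucket per client key, e.g. an IP address or user id."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        if key not in self._buckets:
            self._buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return self._buckets[key].allow()
```

Because buckets live in process memory, each backend replica would enforce its own limit if the backend is scaled horizontally as described under Scaling Considerations; a global limit would need shared state such as Redis. The classes are also not thread-safe as written.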

Backup Strategy

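  1. Database Backups
# Manual MongoDB backup (the dump is written to /backup inside the mongodb container;
# copy it to the host with docker cp unless /backup is a mounted volume)
docker-compose -f deploy/docker/docker-compose.yml exec mongodb mongodump --out /backup

# Restore from backup
docker-compose -f deploy/docker/docker-compose.yml exec mongodb mongorestore /backup
  2. Document Storage Backups
# Backup uploads volume (looks up the backend container ID instead of hard-coding its name)
docker run --rm --volumes-from $(docker-compose -f deploy/docker/docker-compose.yml ps -q backend) -v $(pwd):/backup \
  alpine tar czvf /backup/uploads.tar.gz /app/uploads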

🔒 Security

  • JWT-based authentication
  • Rate limiting
  • Input validation
  • CORS protection
  • Regular security audits
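JWT-based authentication generally means signing a token with a server secret (here `JWT_SECRET_KEY` from the backend `.env`) and verifying it on each request. The stdlib-only HS256 sketch below only illustrates the mechanics; the helper names and the `sub`/`exp` claims are assumptions, and production code should rely on a maintained library such as PyJWT rather than hand-rolled crypto.

```python
# Hypothetical HS256 JWT sign/verify sketch for illustration; prefer a maintained library like PyJWT.
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: Dict[str, Any], secret: str) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    sig = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the claims if the signature and `exp` are valid, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

For example, a login endpoint might issue `sign_hs256({"sub": user_id, "exp": time.time() + 3600}, os.environ["JWT_SECRET_KEY"])`, where the one-hour lifetime is arbitrary.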

🤝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit changes (git commit -m 'Add AmazingFeature')
  4. Push to branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“ License

Distributed under the MIT License. See LICENSE for more information.

🔧 Development

Code Style

# Backend (run inside backend/)
pip install black isort flake8
black .
isort .
flake8

# Frontend (run inside frontend/)
npm run lint
npm run format

Testing

# Backend (run inside backend/; --cov requires the pytest-cov plugin)
pytest
pytest --cov=app tests/

# Frontend (run inside frontend/)
npm run test
npm run test:coverage

🐳 Docker Support

Development

# Development with hot-reload
docker-compose -f docker-compose.dev.yml up

Production

# Build images (Docker image names must be lowercase)
docker build -t docchat-backend -f backend/Dockerfile.prod backend/
docker build -t docchat-frontend -f frontend/Dockerfile.prod frontend/

# Run containers
docker-compose -f docker-compose.prod.yml up -d

πŸ” Troubleshooting

Common Issues

Backend Issues

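  1. MongoDB Connection Errors

    # Check that MongoDB is reachable (the ping should report ok: 1)
    mongosh "mongodb://localhost:27017" --eval "db.adminCommand({ ping: 1 })"
    # Verify MONGODB_URL in backend/.env: the host "mongodb" only resolves inside the Docker
    # network, so use localhost when running the backend outside Docker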
  2. Pinecone API Issues

    • Verify API key and environment
    • Check that the index name is correct and that the index dimension matches the embedding model's output size
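One way to rule out dimension problems before calling Pinecone is a local pre-flight check. `find_bad_vectors` is a hypothetical helper that assumes vectors are (id, values) pairs; the expected dimension is whatever the index was created with. It also flags NaN and infinite values, since standard JSON has no representation for them.

```python
# Hypothetical pre-flight check run before upserting vectors to Pinecone.
import math
from typing import Iterable, List, Sequence, Tuple


def find_bad_vectors(vectors: Iterable[Tuple[str, Sequence[float]]], dimension: int) -> List[str]:
    """Return ids whose length differs from `dimension` or that contain NaN/inf values."""
    bad = []
    for vec_id, values in vectors:
        if len(values) != dimension or not all(math.isfinite(v) for v in values):
            bad.append(vec_id)
    return bad
```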

Frontend Issues

  1. WebSocket Connection Failed

    • Verify backend is running
    • Check CORS settings
    • Confirm REACT_APP_WS_URL (e.g. ws://localhost:8000/ws) points at the running backend
  2. Build Failures

    # Clear node modules and reinstall
    rm -rf node_modules
    npm install

📈 Performance

Optimizations

  • Document chunking strategy
  • Vector store indexing
  • Response streaming
  • Frontend caching
  • API rate limiting
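The README does not describe the chunking strategy. A common approach is a fixed-size sliding window with overlap, so text near a boundary lands in two chunks. This word-based sketch (`chunk_words`, with illustrative defaults of 200 words and 40 words of overlap) is an assumption, not DocChat's implementation; LangChain's text splitters measure chunks in characters or tokens instead.

```python
# Hypothetical sliding-window chunker; sizes are illustrative, not DocChat's settings.
from typing import List


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Larger overlaps improve recall for answers that straddle a boundary at the cost of more vectors to store and search.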

Monitoring

  • Prometheus metrics
  • Grafana dashboards
  • Error tracking
  • Usage analytics

🔄 Updates & Migration

Version History

  • v1.0.0 - Initial release
  • v1.1.0 - Added streaming support
  • v1.2.0 - Multiple document handling
  • v2.0.0 - New UI and improved RAG

Migration Guides
