
NYP FYP CNC Chatbot

A chatbot to help staff identify and use correct sensitivity labels in communications. Built with Python, Gradio, Pandoc, Tesseract OCR, and OpenAI.

The software is in beta; expect bugs, quirks, and potentially broken code.

🚀 Quick Start

Recommended: Use Docker and Docker Compose for setup and running.

If you are unfamiliar with Docker, see the documentation at https://www.docker.com/

Prerequisites

  • Docker and Docker Compose (v2+)
  • OpenAI API key (add to .env); see https://platform.openai.com/api-keys
  • (For local dev: Python 3.11+ and Git)

Setup & Run (Docker, Docker Compose)

git clone https://github.com/chweekueh1/nyp-fyp-project
cd nyp-fyp-project
cp .env.dev .env   # Add your OpenAI API key to .env
python setup.py --docker-build
python setup.py --docker-run

setup.py is only a wrapper around Docker commands, so if you cannot run the setup script (for example on Windows), run the Docker commands directly.

Note that certain paths in the source code are hard-coded.
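Before building, it can help to confirm that .env actually contains a key. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the variable name OPENAI_API_KEY is an assumption based on the name the OpenAI Python client reads by default. Check .env.dev for the name this project really uses.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env-style file.

    Blank lines, comment lines, and lines without '=' are skipped.
    Inline comments after a value are NOT handled by this sketch.
    """
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path, key_name="OPENAI_API_KEY"):
    """Return True if the file exists and defines a non-empty key_name.

    key_name defaults to an assumed variable name; adjust as needed.
    """
    if not Path(path).is_file():
        return False
    return bool(read_env_file(path).get(key_name))
```

Calling has_api_key(".env") from the project root returns False when the file is missing or the key is empty, which is a cheap check to run before starting a Docker build.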

🐳 Docker & Multi-Container

Uses separate containers for dev, test, prod, and docs. Requires Docker Compose for multi-container workflows and benchmarks. See Docker Compose install.

Common commands:

python setup.py --docker-build         # Build dev container
python setup.py --docker-run           # Run app
python setup.py --docker-test          # Run tests
python setup.py --docs                 # Build & serve docs (http://localhost:8080)

Note that the app is currently served through an nginx reverse proxy (generated by Gradio), which is exposed on http://0.0.0.0:7680 -> site_url. The documentation and other Docker containers may use other ports.

🧪 Testing

To be implemented

📁 Data Storage

User data is stored in ~/.nypai-chatbot/ (local) or /home/appuser/.nypai-chatbot/ (Docker).

Because the project currently uses a volume mount, you need to create the following directories under the project root:

|-- data
|---- cache
|---- memory_persistence
|---- reports
|---- vector_store
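The directory layout above can be created in one step. This is a small sketch using only the standard library; the function name create_data_dirs is hypothetical and not part of the project.

```python
from pathlib import Path

# Subdirectories listed in the README under <project root>/data.
DATA_SUBDIRS = ("cache", "memory_persistence", "reports", "vector_store")


def create_data_dirs(project_root):
    """Create data/<subdir> for each expected subdirectory.

    Existing directories are left untouched. Returns the list of paths.
    """
    root = Path(project_root)
    created = []
    for name in DATA_SUBDIRS:
        path = root / "data" / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_data_dirs(".") from the project root creates the four directories and is safe to repeat.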

📚 Documentation

Build and serve docs:

python setup.py --docs

Docs available at http://127.0.0.1:8080

Technical detail: this simply collects the docstrings from the code and renders them with Sphinx.
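To illustrate what a docstring-driven build can pick up, here is a sketch of a function documented with reStructuredText field lists, a format that Sphinx's autodoc extension renders natively. The function normalise_label is hypothetical and not taken from this codebase, and whether the project uses autodoc with this exact docstring style is an assumption.

```python
def normalise_label(label):
    """Return a sensitivity label in a canonical form.

    Hypothetical helper, used only to show docstring layout.

    :param label: Raw label text, possibly with stray whitespace or mixed case.
    :type label: str
    :returns: The label stripped of surrounding whitespace and upper-cased.
    :rtype: str
    :raises ValueError: If the label is empty after stripping.
    """
    cleaned = label.strip()
    if not cleaned:
        raise ValueError("label must not be empty")
    return cleaned.upper()
```

Keeping parameters, return values, and exceptions in the docstring means the rendered documentation stays next to the code it describes.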

⏳️ Benchmarking

Benchmarks for various function and API calls in the codebase can be triggered via:

python setup.py --run-benchmarks

Once complete, the results are written to the <project root>/data directory as benchmark.md. This directory also holds a JSON file and a SQLite file that record Docker build details.

🔧 Code Quality

Pre-commit hooks use ruff for linting and formatting.

Note: The pre-commit flag in the setup script might not work, depending on the directory from which you invoke the script.

If that happens, follow these steps instead:

Create a Python virtual environment named .venv and activate it. See Python Docs for more on virtual environments.

pip install -r requirements/requirements-precommit.txt

You can then run git commit and git push from within the virtual environment, and the configured pre-commit hooks will run automatically.

You can also manually run the pre-commit hooks at any time. See here for details.

🐛 Troubleshooting

  • API Key Issues: Check .env and your OpenAI API key.
  • Port Conflicts: Default is 7860; Gradio will use the next available port.
  • Dependencies: Pandoc, ffmpeg, hyperfine (handled by Docker).
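When diagnosing a port conflict, it can be useful to see which port is actually free. The sketch below scans upward from 7860 using the standard socket module. It only mimics the "next available port" behaviour described above and is not Gradio's own implementation; the function names and the attempts limit are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Bind failed, typically because the port is already in use.
            return False
    return True


def first_free_port(start=7860, attempts=10, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    return None
```

If first_free_port() returns something other than 7860, another process is already holding the default port, and the app will likely be reachable on a different port than expected.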

📝 License

MIT License. See LICENSE for details.
