Aesthisia/LLMinator

LLMinator: Run & Test LLMs directly from HuggingFace

A Gradio-based tool with an integrated chatbot for running and testing LLMs locally, directly from HuggingFace.

An easy-to-use tool made with Gradio, LangChain, and Torch.

LLMinator chat tab

LLMinator models tab

⚡ Features

  • Context-aware Streaming Chatbot.
  • Inbuilt code syntax highlighting.
  • Load any LLM repo directly from HuggingFace.
  • Supports both CPU & CUDA modes.
  • Enable LLM inference with llama.cpp using llama-cpp-python.
  • Convert models (Safetensors, pt to gguf, etc.).
  • Customize LLM inference parameters (n_gpu_layers, temperature, max_tokens, etc.).
  • Real-time text generation via websockets, enabling seamless integration with different frontend frameworks.

🚀 Installation

To use LLMinator, follow these simple steps:

Clone the LLMinator repository from GitHub and install the requirements:

```
git clone https://github.com/Aesthisia/LLMinator.git
cd LLMinator
pip install -r requirements.txt
```

Build LLMinator with llama.cpp:

  • Using make:

    • On Linux or macOS:

      make
    • On Windows:

      1. Download the latest Fortran version of w64devkit.
      2. Extract w64devkit on your PC.
      3. Run w64devkit.exe.
      4. Use the cd command to reach the LLMinator folder.
      5. From here you can run:
        make
  • Using CMake:

    mkdir build
    cd build
    cmake ..

Launch LLMinator in the browser

  • Start LLMinator by running the command python webui.py.
  • Open http://127.0.0.1:7860 in your browser to access the web interface.
  • Start interacting with the chatbot and experimenting with LLMs!

Check out this YouTube video to follow the installation steps.

Command line arguments

| Argument | Default | Description |
| --- | --- | --- |
| --host | 127.0.0.1 | Host or IP address on which the server will listen for incoming connections |
| --port | 7860 | Launch Gradio with the given server port |
| --share | False | Generates a public shareable link that you can send to anybody |
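
To make the defaults concrete, here is a minimal sketch of how these three flags could be parsed with Python's standard argparse module. It is an illustration only: build_parser is a hypothetical helper, and treating --share as an on/off switch is an assumption, so the actual handling in webui.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Launch the LLMinator web UI")
    parser.add_argument("--host", type=str, default="127.0.0.1",
                        help="Host or IP address the server listens on")
    parser.add_argument("--port", type=int, default=7860,
                        help="Port for the Gradio server")
    # Assumption: --share is a switch (absent -> False, present -> True).
    parser.add_argument("--share", action="store_true",
                        help="Generate a public shareable link")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# easy to try in an interactive session.
args = build_parser().parse_args(["--port", "8080"])
print(f"host={args.host} port={args.port} share={args.share}")
```

With the documented flags, a launch on a different port would look like python webui.py --port 8080.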

Connect to WebSocket for generation

Connect to ws://localhost:7861/ for real-time text generation. Submit prompts and receive responses through the websocket connection.

Integration with Frontends:

The provided example/index.html demonstrates basic text generation over a websocket connection. You can integrate it with any frontend framework, such as React.js.
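
The same connection can also be exercised outside the browser. The sketch below uses the third-party websockets package (pip install websockets) to send one prompt and print whatever comes back. The message format is an assumption: it supposes the server accepts the prompt as a plain text frame and streams the reply as text frames, which this README does not specify. stream_reply and idle_timeout are hypothetical names; check example/index.html for the format the server actually expects.

```python
import asyncio

DEFAULT_URI = "ws://localhost:7861/"  # address given in this README


async def stream_reply(prompt: str, uri: str = DEFAULT_URI,
                       idle_timeout: float = 5.0) -> str:
    """Send one prompt and collect streamed frames (hypothetical helper).

    Stops when the server closes the connection or when no frame has
    arrived for `idle_timeout` seconds.
    """
    # Third-party dependency, imported here so the module loads without it.
    import websockets
    from websockets.exceptions import ConnectionClosed

    chunks = []
    async with websockets.connect(uri) as ws:
        # Assumption: the prompt is sent as a single plain-text frame.
        await ws.send(prompt)
        while True:
            try:
                message = await asyncio.wait_for(ws.recv(), timeout=idle_timeout)
            except (asyncio.TimeoutError, ConnectionClosed):
                break
            # Frames can be str or bytes; normalise to str before printing.
            text = message if isinstance(message, str) else message.decode("utf-8", "replace")
            print(text, end="", flush=True)
            chunks.append(text)
    return "".join(chunks)


# With LLMinator running locally:
# asyncio.run(stream_reply("Write a haiku about GPUs."))
```

The idle timeout is there because the sketch cannot know whether the server closes the socket when generation finishes; without it, the loop could wait indefinitely.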

Installation and Development Tips

Python Version

  • Compatible Versions: This project is compatible with Python versions 3.8 through 3.11. Ensure you have one of these versions installed on your system. You can check your Python version by running python --version or python3 --version in your terminal.
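
The version range can also be checked from inside Python. This is a small sketch, not part of the project: is_supported, MIN_VERSION, and MAX_VERSION are hypothetical names that simply encode the 3.8 to 3.11 range stated above.

```python
import sys

MIN_VERSION = (3, 8)   # lowest supported (major, minor)
MAX_VERSION = (3, 11)  # highest supported (major, minor)


def is_supported(version=None) -> bool:
    """Return True if (major, minor) lies within the documented range."""
    # Only major and minor matter, so 3.11.4 is compared as (3, 11).
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
      f"supported: {is_supported()}")
```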

CMake and C Compiler

  • CMake Dependency: If you plan to build the project using CMake, make sure you have CMake installed.
  • C Compiler: You'll also need a C compiler such as GCC, which is typically included with most Linux distributions. You can check whether it is installed by running gcc --version in your terminal. If it is missing, follow the installation instructions for your operating system.
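
To check all of the build tools in one go, a short script can look each of them up on the PATH. This is a sketch using only the standard library; find_build_tools is a hypothetical helper, and the tool names are the ones mentioned in this README (cmake, gcc, make).

```python
import shutil


def find_build_tools(tools=("cmake", "gcc", "make")):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_build_tools().items():
    print(f"{tool}: {path or 'not found'}")
```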

Visual Studio Code

  • Visual Studio Installer: If you're using Visual Studio Code for development on Windows, you'll need the C++ development workload installed. You can install it through the Visual Studio Installer.

GPU Acceleration (CUDA):

  • CUDA Installation: To leverage GPU acceleration, you'll need CUDA installed on your system. Download instructions are available on the NVIDIA website.
  • Torch Compatibility: After installing CUDA, confirm CUDA availability with torch.cuda.is_available(). When using a GPU, ensure you follow the project's specific llama-cpp-python installation configuration for CUDA support.
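
The availability check mentioned above can be wrapped in a few lines so that it also reports a missing torch installation instead of raising. This is a minimal sketch; cuda_status is a hypothetical helper and is not part of LLMinator.

```python
def cuda_status() -> str:
    """Describe whether PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "torch is installed, but CUDA is not available"
    # Report the name of the first visible device.
    return f"CUDA available: {torch.cuda.get_device_name(0)}"


print(cuda_status())
```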

Reporting Issues:

If you encounter any errors or issues, feel free to file a detailed report in the project's repository. We're always happy to help! When reporting an issue, please provide as much information as possible, including the error message, logs, the steps you took, and your system configuration. This makes it easier for us to diagnose and fix the problem quickly.

🤝 Contributions

We welcome contributions from the community to enhance LLMinator further. If you'd like to contribute, please follow these guidelines:

  • Fork the LLMinator repository on GitHub.
  • Create a new branch for your feature or bug fix.
  • Test your changes thoroughly.
  • Submit a pull request, providing a clear description of the changes you've made.

Reach out to us: [email protected]