
llmaz

Easy, advanced inference platform for large language models on Kubernetes

(Badges: stability-alpha · Go Report · Latest Release)

llmaz (pronounced /lima:z/) aims to provide a production-ready inference platform for large language models on Kubernetes. It integrates closely with state-of-the-art inference backends to bring leading-edge research to the cloud.

🌱 llmaz is currently alpha, so the API may change before graduating to beta.

Architecture

(Architecture diagram)

Features Overview

  • Ease of Use: users can quickly deploy an LLM service with minimal configuration (see the sketch after this list).
  • Broad Backend Support: llmaz supports a wide range of advanced inference backends for different scenarios, such as vLLM, SGLang, and llama.cpp. Find the full list of supported backends here.
  • Scaling Efficiency (WIP): llmaz works smoothly with autoscaling components like Cluster Autoscaler or Karpenter to support elastic scenarios.
  • Accelerator Fungibility (WIP): llmaz supports serving the same LLM with various accelerators to optimize cost and performance.
  • SOTA Inference: llmaz supports the latest cutting-edge research, such as Speculative Decoding and Splitwise (WIP), on Kubernetes.
  • Various Model Providers: llmaz supports a wide range of model providers, such as HuggingFace, ModelScope, and object stores (Aliyun OSS, with more on the way), and handles model loading automatically, requiring no effort from users.
  • Multi-Host Support: llmaz supports both single-host and multi-host scenarios with LWS from day one.
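
To give a taste of that minimal configuration, the sketch below declares a model and a serving workload with the OpenModel and Playground APIs (the CRDs installed below). The model name, model ID, and spec fields are illustrative, modeled on the project's published examples, and may differ across releases:

kubectl apply -f - <<EOF
# Register a model and where to pull it from (Hugging Face model hub here).
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct
---
# Serve the model with one replica; a backend can be set explicitly,
# otherwise the default backend runtime is used.
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b
EOF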

How to install

Prerequisites

  • Kubernetes version >= 1.27
  • Helm 3
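
Both prerequisites can be checked with the standard version commands before installing:

kubectl version   # the reported server version should be v1.27 or newer
helm version      # the client should be Helm 3.x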

Install a released version

Install

helm repo add inftyai https://inftyai.github.io/llmaz
helm repo update
helm install llmaz inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.3
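
Once the chart is installed, a quick way to verify it (using the llmaz-system namespace created above and the CRD group names listed in the uninstall section) is:

kubectl get pods -n llmaz-system   # controller pods should be Running
kubectl get crds | grep llmaz.io   # the llmaz CRDs should be listed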

Uninstall

helm uninstall llmaz
kubectl delete ns llmaz-system

If you want to delete the CRDs as well, run the following (any NotFound errors can be ignored):

kubectl delete crd \
    openmodels.llmaz.io \
    backendruntimes.inference.llmaz.io \
    playgrounds.inference.llmaz.io \
    services.inference.llmaz.io
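
The same grep used after installation can confirm the cleanup; once the CRDs are deleted, it should print nothing:

kubectl get crds | grep llmaz.io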