llmaz (pronounced /lima:z/) aims to provide a Production-Ready inference platform for large language models on Kubernetes. It closely integrates with state-of-the-art inference backends to bring leading-edge research to the cloud.
🌱 llmaz is alpha now, so the API may change before graduating to Beta.
- Ease of Use: People can quickly deploy an LLM service with minimal configuration; see the sketch after this list.
- Broad Backend Support: llmaz supports a wide range of advanced inference backends for different scenarios, like vLLM, SGLang, llama.cpp. Find the full list of supported backends here.
- Scaling Efficiency (WIP): llmaz works smoothly with autoscaling components like Cluster-Autoscaler or Karpenter to support elastic scenarios.
- Accelerator Fungibility (WIP): llmaz supports serving the same LLM with various accelerators to optimize cost and performance.
- SOTA Inference: llmaz supports the latest cutting-edge research, such as Speculative Decoding and Splitwise (WIP), on Kubernetes.
- Various Model Providers: llmaz supports a wide range of model providers, such as HuggingFace, ModelScope, and object stores (Aliyun OSS, with more on the way). llmaz handles model loading automatically, requiring no effort from users.
- Multi-Host Support: llmaz supports both single-host and multi-host scenarios with LWS from day 1.
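
As an illustration of the minimal-configuration workflow, the sketch below registers a model and serves it with a Playground. The kinds and API groups match the CRDs installed by llmaz (openmodels.llmaz.io, playgrounds.inference.llmaz.io), but the API versions and field names here are recalled from the project samples and may differ between releases, so treat this as a hedged sketch and check the official examples for your version.

```yaml
# Register a model pulled from a model hub (field names are illustrative).
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct
---
# Serve the registered model; llmaz wires up a default backend runtime.
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b
```

Once llmaz is installed (see below), such a manifest is applied with a plain `kubectl apply -f`.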
- Kubernetes version >= 1.27
- Helm 3
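
Both requirements can be checked with standard kubectl and Helm commands, nothing llmaz-specific:

```bash
kubectl version   # the server version should report v1.27 or newer
helm version      # the client should report Helm 3.x
```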
```bash
helm repo add inftyai https://inftyai.github.io/llmaz
helm repo update
helm install llmaz inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.3
```
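
Before creating any models, it can help to confirm the llmaz components in llmaz-system are running; the exact pod names depend on the chart version:

```bash
kubectl get pods -n llmaz-system
```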
```bash
helm uninstall llmaz --namespace llmaz-system
kubectl delete ns llmaz-system
```
If you want to delete the CRDs as well, run the following (errors about already-deleted resources can be ignored):
```bash
kubectl delete crd \
  openmodels.llmaz.io \
  backendruntimes.inference.llmaz.io \
  playgrounds.inference.llmaz.io \
  services.inference.llmaz.io
```
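
To confirm the cleanup, you can list any CRDs still carrying the llmaz groups; no output means everything was removed:

```bash
kubectl get crd | grep llmaz
```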