Commit 878203b

Merge pull request #30 from kerthcet/feat/image-support
Release v0.0.1
2 parents: 3110c05 + 865fdfc

File tree

4 files changed (+107 −18): .gitignore, Makefile, README.md, docs/installation.md

.gitignore (+1)

@@ -25,3 +25,4 @@ Dockerfile.cross
 *.swo
 *~
 .DS_Store
+artifacts

Makefile (+10)

@@ -267,3 +267,13 @@ $(CONTROLLER_GEN): $(LOCALBIN)
 envtest: $(ENVTEST) ## Download envtest-setup locally if necessary.
 $(ENVTEST): $(LOCALBIN)
 	test -s $(LOCALBIN)/setup-envtest || GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
+
+##@Release
+
+.PHONY: artifacts
+artifacts: kustomize
+	cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
+	if [ -d artifacts ]; then rm -rf artifacts; fi
+	mkdir -p artifacts
+	$(KUSTOMIZE) build config/default -o artifacts/manifests.yaml
+	@$(call clean-manifests)
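As a usage note, not part of the commit: the new target takes the controller image from the `IMG` variable and, per the recipe above, writes the rendered manifest to artifacts/manifests.yaml. A minimal sketch, reusing the placeholder image notation from docs/installation.md below:

```cmd
# Sketch only; <IMAGE_REPO>:<GIT_TAG> is a placeholder, as in the installation guide.
IMG=<IMAGE_REPO>:<GIT_TAG> make artifacts

# The kustomize-rendered install manifest should now exist:
cat artifacts/manifests.yaml
```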

README.md (+49 −18)

@@ -7,25 +7,31 @@
 [GoReport Widget]: https://goreportcard.com/badge/github.com/inftyai/llmaz
 [GoReport Status]: https://goreportcard.com/report/github.com/inftyai/llmaz

-llmaz, pronounced as `/lima:z/`, aims to provide a production-ready inference platform for large language models on Kubernetes. It tightly integrates with state-of-the-art inference backends, such as [vLLM](https://github.com/vllm-project/vllm).
+**llmaz** (pronounced `/lima:z/`) aims to provide a **Production-Ready** inference platform for large language models on **Kubernetes**. It closely integrates with state-of-the-art inference backends like [vLLM](https://github.com/vllm-project/vllm) to bring cutting-edge research to the cloud.

 ## Concept

 ![image](./docs/assets/overview.png)

 ## Feature Overview

-- **Easy to use**: People can deploy a production-ready LLM service with minimal configurations.
-- **High performance**: llmaz integrates with vLLM by default for high performance inference. Other backend supports are on the way.
-- **Autoscaling efficiency**: llmaz works smoothly with autoscaling components like [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) and [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic scenarios.
-- **Accelerator fungibility**: llmaz supports serving LLMs with different accelerators for the sake of cost and performance.
-- **SOTA inference technologies**: llmaz support the latest SOTA technologies like [speculative decoding](https://arxiv.org/abs/2211.17192) and [Splitwise](https://arxiv.org/abs/2311.18677).
+- **User Friendly**: People can quickly deploy an LLM service with minimal configurations.
+- **High Performance**: llmaz integrates with vLLM by default for high-performance inference. Support for other backends is on the way.
+- **Scaling Efficiency**: llmaz works smoothly with autoscaling components like [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic cases.
+- **Accelerator Fungibility**: llmaz supports serving the same LLMs with various accelerators to optimize cost and performance.
+- **SOTA Inference**: llmaz supports the latest cutting-edge research like [Speculative Decoding](https://arxiv.org/abs/2211.17192) and [Splitwise](https://arxiv.org/abs/2311.18677).

 ## Quick Start

-Once `Model`s (e.g. opt-125m) published, you can quick deploy a `Playground` for serving.
+### Installation

-### Model
+See [Installation](./docs/installation.md) for guidance.
+
+### Deploy
+
+Once `Model`s (e.g. facebook/opt-125m) are published, you can quickly deploy a `Playground` to serve the model.
+
+#### Model

 ```yaml
 apiVersion: llmaz.io/v1alpha1
@@ -37,12 +43,12 @@ spec:
   dataSource:
     modelID: facebook/opt-125m
   inferenceFlavors:
-  - name: t4
+  - name: t4 # GPU type
     requests:
       nvidia.com/gpu: 1
 ```

-### Inference Playground
+#### Inference Playground

 ```yaml
 apiVersion: inference.llmaz.io/v1alpha1
@@ -55,16 +61,41 @@ spec:
   modelName: opt-125m
 ```

-Refer to more **[Examples](/docs/examples/README.md)** for references.
+### Test
+
+#### Expose the service
+
+```cmd
+kubectl port-forward pod/opt-125m-0 8080:8080
+```
+
+#### See registered models
+
+```cmd
+curl http://localhost:8080/v1/models
+```
+
+#### Request a query
+
+```cmd
+curl http://localhost:8080/v1/completions \
+-H "Content-Type: application/json" \
+-d '{
+    "model": "facebook/opt-125m",
+    "prompt": "San Francisco is a",
+    "max_tokens": 10,
+    "temperature": 0
+}'
+```
+
+Refer to **[examples](/docs/examples/README.md)** to learn more.

 ## Roadmap

-- Metrics support
-- Autoscaling support
-- Gateway support
-- Serverless support
-- CLI tool
-- Model training, fine tuning in the long-term.
+- Gateway support for traffic routing
+- Serverless support for cloud-agnostic users
+- CLI tool support
+- Model training, fine tuning in the long-term

 ## Contributions

@@ -76,4 +107,4 @@ Refer to more **[Examples](/docs/examples/README.md)** for references.

 <a href="https://github.com/InftyAI/llmaz/graphs/contributors">
   <img src="https://contrib.rocks/image?repo=InftyAI/llmaz" />
-</a>
+</a>
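One step the Quick Start leaves implicit is applying the two specs before the Test section; a minimal sketch, assuming you save them locally (model.yaml and playground.yaml are hypothetical file names):

```cmd
# Hypothetical file names for the Model and Playground specs shown above.
kubectl apply -f model.yaml
kubectl apply -f playground.yaml

# Wait for the serving pod (named opt-125m-0 in the Test step) to come up.
kubectl get pods -w
```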

docs/installation.md (new file, +47)

@@ -0,0 +1,47 @@
+# Installation Guide
+
+## Prerequisites
+
+* Kubernetes version >= 1.27
+
+## Install a released version
+
+### Install
+
+```cmd
+# leaderworkerset runs in lws-system
+LWS_VERSION=v0.3.0
+kubectl apply --server-side -f https://github.com/kubernetes-sigs/lws/releases/download/$LWS_VERSION/manifests.yaml
+
+# llmaz runs in llmaz-system
+LLMAZ_VERSION=v0.0.1
+kubectl apply --server-side -f https://github.com/inftyai/llmaz/releases/download/$LLMAZ_VERSION/manifests.yaml
+```
+
+### Uninstall
+
+```cmd
+LWS_VERSION=v0.3.0
+kubectl delete -f https://github.com/kubernetes-sigs/lws/releases/download/$LWS_VERSION/manifests.yaml
+
+LLMAZ_VERSION=v0.0.1
+kubectl delete -f https://github.com/inftyai/llmaz/releases/download/$LLMAZ_VERSION/manifests.yaml
+```
+
+## Install from source
+
+### Install
+
+```cmd
+LWS_VERSION=v0.3.0
+kubectl apply --server-side -f https://github.com/kubernetes-sigs/lws/releases/download/$LWS_VERSION/manifests.yaml
+
+git clone https://github.com/inftyai/llmaz.git && cd llmaz
+IMG=<IMAGE_REPO>:<GIT_TAG> make image-push deploy
+```
+
+### Uninstall
+
+```cmd
+make undeploy
+```
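A quick post-install sanity check, assuming the controllers land in the namespaces noted in the comments above (lws-system for leaderworkerset, llmaz-system for llmaz):

```cmd
# Both controller pods should reach Running before deploying any Model.
kubectl get pods -n lws-system
kubectl get pods -n llmaz-system
```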
