Commit b4340c4

Add batcher module. (kserve#847)
* Add batcher module.
* Add batcher module.
* Add batcher e2e test timeout.
* Add batcher e2e test timeout.
* Some changes according to PR comments.
* Enlarge batcher resource.
* Change image name of batcher for CI.
* Change batcher resource.
* Changes according to comments of PR.
* Add vendor/github.com/astaxie/beego vendor/github.com/satori/go.uuid vendor/github.com/shiena/ansicolor.
* Add release/triggers/latest/batcher.yaml release/triggers/tagged/batcher.yaml.
* Change mode of test/scripts/build-batcher.sh.
* Change mode of test/scripts/build-batcher.sh.
* Change mode of test/scripts/build-batcher.sh.
* Fix import time issue and change the resource of batcher.
* Add dependencies.
* Add dependencies.
* Fix pkg/webhook/admission/pod/batcher_injector_test.go issue.
* delete vendor.
* Change batcher.Dockerfile.
* Change test/scripts/build-batcher.sh prow_config.yaml.
* Change test/scripts/build-batcher.sh prow_config.yaml.
* Change batcher README.md.
* Change batcher README.md.
* Change batcher container name to batcher.
* Change batcher README.md.
* Change batcher README.md.
* Change license.
* Change the type of maxBatchSize, maxLatency, timeout to int type.
* Fix addBatcherAnnotations issue.
1 parent 0dfe926 commit b4340c4

File tree: 42 files changed (+4908, -11 lines)


Makefile

Lines changed: 13 additions & 2 deletions

```diff
@@ -3,6 +3,7 @@ HAS_LINT := $(shell command -v golint;)
 # Image URL to use all building/pushing image targets
 IMG ?= kfserving-controller:latest
 LOGGER_IMG ?= logger:latest
+BATCHER_IMG ?= batcher:latest
 SKLEARN_IMG ?= sklearnserver:latest
 XGB_IMG ?= xgbserver:latest
 PYTORCH_IMG ?= pytorchserver:latest
@@ -17,7 +18,7 @@ KFSERVING_CONTROLLER_MEMORY_LIMIT ?= 300Mi
 $(shell perl -pi -e 's/cpu:.*/cpu: $(KFSERVING_CONTROLLER_CPU_LIMIT)/' config/default/manager_resources_patch.yaml)
 $(shell perl -pi -e 's/memory:.*/memory: $(KFSERVING_CONTROLLER_MEMORY_LIMIT)/' config/default/manager_resources_patch.yaml)
 
-all: test manager logger
+all: test manager logger batcher
 
 # Run tests
 test: fmt vet lint manifests
@@ -27,10 +28,14 @@ test: fmt vet lint manifests
 manager: generate fmt vet lint
 	go build -o bin/manager ./cmd/manager
 
-# Build manager binary
+# Build logger binary
 logger: fmt vet
 	go build -o bin/logger ./cmd/logger
 
+# Build batcher binary
+batcher: fmt vet
+	go build -o bin/batcher ./cmd/batcher
+
 # Run against the configured Kubernetes cluster in ~/.kube/config
 run: generate fmt vet lint
 	go run ./cmd/manager/main.go
@@ -138,6 +143,12 @@ docker-build-logger: test
 docker-push-logger:
 	docker push ${LOGGER_IMG}
 
+docker-build-batcher:
+	docker build -f batcher.Dockerfile . -t ${BATCHER_IMG}
+
+docker-push-batcher:
+	docker push ${BATCHER_IMG}
+
 docker-build-sklearn:
 	cd python && docker build -t ${KO_DOCKER_REPO}/${SKLEARN_IMG} -f sklearn.Dockerfile .
```
batcher.Dockerfile

Lines changed: 21 additions & 0 deletions

```dockerfile
# Build the inference-batcher binary
FROM golang:1.13.0 as builder

# Copy in the go src
WORKDIR /go/src/github.com/kubeflow/kfserving
COPY pkg/    pkg/
COPY cmd/    cmd/
COPY go.mod  go.mod
COPY go.sum  go.sum

RUN go mod download

# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o batcher ./cmd/batcher

# Copy the inference-batcher into a thin image
FROM gcr.io/distroless/static:latest
COPY third_party/ third_party/
WORKDIR /
COPY --from=builder /go/src/github.com/kubeflow/kfserving/batcher .
ENTRYPOINT ["/batcher"]
```

cmd/batcher/main.go

Lines changed: 66 additions & 0 deletions

```go
/*
Copyright 2020 kubeflow.org.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package main

import (
	"errors"
	"flag"
	"os"
	"strconv"

	"github.com/kubeflow/kfserving/pkg/batcher"
	"github.com/kubeflow/kfserving/pkg/batcher/controllers"
	logf "sigs.k8s.io/controller-runtime/pkg/runtime/log"
)

var (
	port          = flag.String("port", "9082", "Batcher port")
	componentHost = flag.String("component-host", "127.0.0.1", "Component host")
	componentPort = flag.String("component-port", "8080", "Component port")
	maxBatchSize  = flag.String("max-batchsize", "32", "Max Batch Size")
	maxLatency    = flag.String("max-latency", "5000", "Max Latency in milliseconds")
	timeout       = flag.String("timeout", "60", "Timeout of calling predictor service in seconds")
)

func main() {
	flag.Parse()

	logf.SetLogger(logf.ZapLogger(false))
	log := logf.Log.WithName("entrypoint")

	maxBatchSizeInt, err := strconv.Atoi(*maxBatchSize)
	if err != nil || maxBatchSizeInt <= 0 {
		log.Error(errors.New("Invalid max batch size"), *maxBatchSize)
		os.Exit(1)
	}

	maxLatencyInt, err := strconv.Atoi(*maxLatency)
	if err != nil || maxLatencyInt <= 0 {
		log.Error(errors.New("Invalid max latency"), *maxLatency)
		os.Exit(1)
	}

	timeoutInt, err := strconv.Atoi(*timeout)
	if err != nil || timeoutInt <= 0 {
		log.Error(errors.New("Invalid timeout"), *timeout)
		os.Exit(1)
	}

	controllers.Config(*port, *componentHost, *componentPort, maxBatchSizeInt, maxLatencyInt, timeoutInt)

	log.Info("Starting", "Port", *port)
	batcher.StartHttpServer()
}
```
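Neither pkg/batcher nor pkg/batcher/controllers appears in this commit view, so as a rough illustration only: the size-or-latency trigger that the maxBatchSize and maxLatency flags configure (described in the module README below) can be built from a channel and a timer. The request, collect, and flush names here are hypothetical, not the module's API:

```go
package main

import (
	"fmt"
	"time"
)

// request pairs one input instance with the channel its caller waits on.
type request struct {
	instance   interface{}
	resultChan chan interface{}
}

// collect drains reqChan into batches, flushing when a batch reaches
// maxBatchSize or when maxLatency has elapsed since its first request.
func collect(reqChan <-chan request, maxBatchSize int, maxLatency time.Duration, flush func([]request)) {
	var batch []request
	var timer <-chan time.Time // nil until the batch has its first item
	for {
		select {
		case req := <-reqChan:
			if len(batch) == 0 {
				timer = time.After(maxLatency) // latency clock starts at the first item
			}
			batch = append(batch, req)
			if len(batch) >= maxBatchSize {
				flush(batch)
				batch, timer = nil, nil
			}
		case <-timer:
			flush(batch) // timer is only armed while the batch is non-empty
			batch, timer = nil, nil
		}
	}
}

func main() {
	reqChan := make(chan request)
	go collect(reqChan, 32, 5000*time.Millisecond, func(batch []request) {
		fmt.Printf("flushing a batch of %d instance(s)\n", len(batch))
	})
	reqChan <- request{instance: "img0", resultChan: make(chan interface{}, 1)}
	time.Sleep(6 * time.Second) // the single request flushes via the 5s timer
}
```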

config/default/configmap/inferenceservice.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -103,3 +103,11 @@ data:
         "cpuRequest": "100m",
         "cpuLimit": "1"
     }
+  batcher: |-
+    {
+        "image" : "gcr.io/kfserving/batcher:latest",
+        "memoryRequest": "1Gi",
+        "memoryLimit": "1Gi",
+        "cpuRequest": "1",
+        "cpuLimit": "1"
+    }
```
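The webhook injector that consumes this configmap entry is not among the files shown here, but the JSON shape above maps naturally onto a small struct. A hypothetical sketch of parsing it (BatcherConfig and its field tags simply mirror the keys above; this is not the actual kfserving type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BatcherConfig mirrors the keys of the "batcher" configmap entry above.
// Illustrative only; the real injector code is not part of this diff view.
type BatcherConfig struct {
	Image         string `json:"image"`
	MemoryRequest string `json:"memoryRequest"`
	MemoryLimit   string `json:"memoryLimit"`
	CpuRequest    string `json:"cpuRequest"`
	CpuLimit      string `json:"cpuLimit"`
}

func main() {
	raw := `{
        "image" : "gcr.io/kfserving/batcher:latest",
        "memoryRequest": "1Gi",
        "memoryLimit": "1Gi",
        "cpuRequest": "1",
        "cpuLimit": "1"
    }`
	var cfg BatcherConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("injecting sidecar image %s with cpu limit %s\n", cfg.Image, cfg.CpuLimit)
}
```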
Lines changed: 1 addition & 0 deletions

```yaml
image: github.com/kubeflow/kfserving/cmd/batcher
```

config/overlays/test/configmap/inferenceservice.yaml

Lines changed: 8 additions & 0 deletions

```diff
@@ -103,3 +103,11 @@ data:
         "cpuRequest": "100m",
         "cpuLimit": "1"
     }
+  batcher: |-
+    {
+        "image" : "gcr.io/kubeflow-ci/kfserving/batcher",
+        "memoryRequest": "1Gi",
+        "memoryLimit": "1Gi",
+        "cpuRequest": "1",
+        "cpuLimit": "1"
+    }
```

docs/diagrams/batcher.jpg

Binary file added (100 KB)

docs/samples/batcher/README.md

Lines changed: 39 additions & 0 deletions

# Inference Batcher

We add this module to support batched prediction for any ML framework (TensorFlow, PyTorch, ...) without degrading performance.

The batcher module also supports custom images.

![Batcher](../../diagrams/batcher.jpg)

* We use a webhook to inject the batcher container into the InferenceService.
* We use the Beego web framework to accept user requests.
* We use channels to transfer data between goroutines.
* We use HTTP connections between containers. In the future, we may use RPC.
* Users can enable or disable the batcher by editing the InferenceService yaml.
* When the number of instances (for example, the number of pictures) reaches maxBatchSize, or the waiting time reaches maxLatency, a batch prediction is triggered.

```yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "pytorch-cifar10"
spec:
  default:
    predictor:
      minReplicas: 1
      batcher:
        maxBatchSize: 32
        maxLatency: 5000
        timeout: 60
      pytorch:
        storageUri: "gs://kfserving-samples/models/pytorch/cifar10/"
```

* port: the port of the batcher container.
* maxBatchSize: the maximum batch size for prediction.
* maxLatency: the maximum latency for prediction (in milliseconds).
* timeout: the timeout for calling the predictor service (in seconds).

All of the following fields have default values in the code, so you can configure them or leave them unset:
* maxBatchSize: 32.
* maxLatency: 5000.
* timeout: 60.
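The bullets above describe a fan-in/fan-out flow: instances from concurrent callers are merged into one predictor call, and the predictions are handed back to each caller over its channel. The actual pkg/batcher implementation is not shown on this page, so the following is only a minimal sketch of the fan-out step; dispatch, batchSizes, and resultChans are hypothetical names, and it assumes the predictor returns one prediction per instance, in input order:

```go
package main

import "fmt"

// dispatch returns each caller its share of the batched predictions.
// batchSizes[i] is the number of instances caller i contributed; the
// predictor is assumed to return predictions in merge order.
func dispatch(predictions []string, batchSizes []int, resultChans []chan []string) {
	offset := 0
	for i, n := range batchSizes {
		resultChans[i] <- predictions[offset : offset+n]
		offset += n
	}
}

func main() {
	chans := []chan []string{make(chan []string, 1), make(chan []string, 1)}
	// caller 0 sent 2 instances, caller 1 sent 1 instance
	dispatch([]string{"cat", "dog", "ship"}, []int{2, 1}, chans)
	fmt.Println(<-chans[0]) // [cat dog]
	fmt.Println(<-chans[1]) // [ship]
}
```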

docs/samples/batcher/basic/README.md

Lines changed: 88 additions & 0 deletions

# Inference Batcher Demo

We first create a PyTorch predictor with a batcher. "maxLatency" is set to a large value (5000 milliseconds) so that we can observe the batching process.

```yaml
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "pytorch-cifar10"
spec:
  default:
    predictor:
      minReplicas: 1
      batcher:
        maxBatchSize: 32
        maxLatency: 5000
        timeout: 60
      pytorch:
        storageUri: "gs://kfserving-samples/models/pytorch/cifar10/"
```

Let's apply this yaml:

```bash
kubectl create -f pytorch-batcher.yaml
```

We can now send requests to the pytorch model using hey, an HTTP load generator:

```bash
MODEL_NAME=pytorch-cifar10
INPUT_PATH=@./input.json
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice pytorch-cifar10 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
hey -z 10s -c 5 -m POST -host "${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -D ./input.json "http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict"
```

Each request goes to the batcher container first; the batcher then batches incoming requests and sends the batched request to the predictor container.

Notice: if the interval between two requests is less than "maxLatency", the returned "batchId" will be the same.

Expected output for each terminal tab:

```
Summary:
  Total:        10.6268 secs
  Slowest:      1.6477 secs
  Fastest:      0.0050 secs
  Average:      0.1006 secs
  Requests/sec: 48.1800

  Total data:   167424 bytes
  Size/request: 327 bytes

Response time histogram:
  0.005 [1]   |
  0.169 [447] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.334 [30]  |■■■
  0.498 [7]   |■
  0.662 [10]  |■
  0.826 [3]   |
  0.991 [6]   |■
  1.155 [5]   |
  1.319 [1]   |
  1.483 [1]   |
  1.648 [1]   |

Latency distribution:
  10% in 0.0079 secs
  25% in 0.0114 secs
  50% in 0.0398 secs
  75% in 0.0867 secs
  90% in 0.2029 secs
  95% in 0.5170 secs
  99% in 1.1428 secs

Details (average, fastest, slowest):
  DNS+dialup:  0.0000 secs, 0.0050 secs, 1.6477 secs
  DNS-lookup:  0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:   0.0002 secs, 0.0001 secs, 0.0004 secs
  resp wait:   0.1000 secs, 0.0046 secs, 1.6473 secs
  resp read:   0.0003 secs, 0.0000 secs, 0.0620 secs

Status code distribution:
  [200] 512 responses
```
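As a complement to hey, the shared batchId from the notice above can be observed by firing two concurrent requests within one maxLatency window. This is only a hedged sketch, not part of the sample: it assumes the batcher's JSON response carries a batchId field (as the notice implies) and that CLUSTER_IP and SERVICE_HOSTNAME are exported as in the shell snippet above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"sync"
)

// predict posts one request through the Istio ingress gateway and
// returns the batchId from the response body.
func predict(url, host string, body []byte) (string, error) {
	req, err := http.NewRequest("POST", url, bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Host = host // route by Host header, as hey does with -host
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out struct {
		BatchId string `json:"batchId"` // field name assumed from the notice above
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.BatchId, nil
}

func main() {
	url := fmt.Sprintf("http://%s/v1/models/pytorch-cifar10:predict", os.Getenv("CLUSTER_IP"))
	host := os.Getenv("SERVICE_HOSTNAME")
	body, _ := os.ReadFile("input.json")

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two requests inside the same maxLatency window
		wg.Add(1)
		go func() {
			defer wg.Done()
			id, err := predict(url, host, body)
			fmt.Println("batchId:", id, "err:", err)
		}()
	}
	wg.Wait() // both lines should print the same batchId
}
```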
