Commit c6cdacb

README: Update with bench instructions
1 parent 9b32dfa commit c6cdacb

2 files changed: 57 additions & 7 deletions

runner/README.md

Lines changed: 57 additions & 7 deletions
@@ -1,14 +1,30 @@

# runner

## Architecture

A high-level sketch of how the runner is used:

![Architecture](./images/architecture.png)

## Running with Docker

Make sure you have Docker installed, then either pull the pre-built image from DockerHub or build the image locally in this directory.

### Pull Docker image

```
docker pull livepeer/ai-runner:latest
```
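
If you intend to run on a GPU, it is worth confirming that Docker can reach it before pulling the runner image. A minimal check, assuming the NVIDIA Container Toolkit is installed (the CUDA image tag is only an example):

```
# Confirm Docker is installed and can see the GPU (requires the NVIDIA Container Toolkit)
docker --version
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```
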

### Build Docker image

```
docker build -t livepeer/ai-runner:latest .
```
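
After the build completes, you can confirm the image is available locally:

```
# List local images for the runner repository
docker images livepeer/ai-runner
```
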

### Models

The runner app within the container references models by their [HuggingFace](https://huggingface.co/) model ID and expects model checkpoints to be stored in a `/models` directory, which we can mount with a local `models` directory.

See the `dl-checkpoints.sh` script for how to download model checkpoints to a local `models` directory.

@@ -19,11 +35,45 @@ pip install "huggingface_hub[cli]"

```
pip install "huggingface_hub[cli]"
./dl-checkpoints.sh
```
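
To fetch a single checkpoint instead, a model can be downloaded by its HuggingFace model ID; a sketch using `huggingface-cli` (this assumes the runner reads the standard HuggingFace cache layout under `/models`):

```
# Download one model checkpoint by its HuggingFace model ID into ./models
huggingface-cli download stabilityai/sd-turbo --cache-dir ./models
```
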

### Optimizations

- Set the environment variable `SFAST=true` to enable dynamic compilation with [stable-fast](https://github.com/chengzeyi/stable-fast) to speed up inference for diffusion pipelines (initial requests will be slower because the model is compiled dynamically on first use); see the sketch after this list.
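
A sketch of how the variable might be passed when starting the runner container (the other flags mirror the Docker commands used elsewhere in this README; any pipeline-selection variables the container needs would be added alongside it):

```
# Enable stable-fast dynamic compilation inside the runner container
docker run --gpus device=0 -e SFAST=true -v ./models:/models livepeer/ai-runner:latest
```
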


### Run benchmarking script

```
docker run --gpus <GPU_IDs> -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline <PIPELINE> --model_id <MODEL_ID> --runs <RUNS> --batch_size <BATCH_SIZE>
```

Example command:

```
# Benchmark the text-to-image pipeline with the stabilityai/sd-turbo model over 3 runs using GPU 0
docker run --gpus device=0 -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline text-to-image --model_id stabilityai/sd-turbo --runs 3
```

Example output:

```
----AGGREGATE METRICS----

pipeline load time: 1.473s
pipeline load max GPU memory allocated: 2.421GiB
pipeline load max GPU memory reserved: 2.488GiB
avg inference time: 0.482s
avg inference time per output: 0.482s
avg inference max GPU memory allocated: 3.024GiB
avg inference max GPU memory reserved: 3.623GiB
```

For benchmarking script usage information:

```
docker run livepeer/ai-runner:latest python bench.py -h
```
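
The same template accepts `--batch_size` as well; for instance (a sketch reusing the example model, with an arbitrary batch size of 4):

```
# Benchmark text-to-image with a batch size of 4 on GPU 0
docker run --gpus device=0 -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline text-to-image --model_id stabilityai/sd-turbo --runs 3 --batch_size 4
```
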

### Run text-to-image container

Run container:

@@ -37,7 +87,7 @@ Query API:

Query API:

```
curl -X POST -H "Content-Type: application/json" localhost:8000/text-to-image -d '{"prompt":"a mountain lion"}'
```

### Run image-to-image container

Run container:
@@ -51,7 +101,7 @@ Query API:

Query API:

```
curl -X POST localhost:8000/image-to-image -F prompt="a mountain lion" -F image=@<IMAGE_FILE>
```
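
Here `<IMAGE_FILE>` is a path to a local image; for example, with a hypothetical `mountain.png` in the working directory:

```
curl -X POST localhost:8000/image-to-image -F prompt="a mountain lion" -F image=@mountain.png
```
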

### Run image-to-video container

Run container:

runner/images/architecture.png (427 KB)
