diff --git a/runner/README.md b/runner/README.md
index 9196b2c81..95cb0039a 100644
--- a/runner/README.md
+++ b/runner/README.md
@@ -1,14 +1,30 @@
 # runner
 
-## Build Docker image
+## Architecture
+
+A high-level sketch of how the runner is used:
+
+![Architecture](./images/architecture.png)
+
+## Running with Docker
+
+Make sure you have Docker installed, then either pull the pre-built image from Docker Hub or build the image locally in this directory.
+
+### Pull Docker image
+
+```
+docker pull livepeer/ai-runner:latest
+```
+
+### Build Docker image
 
 ```
 docker build -t livepeer/ai-runner:latest .
 ```
 
-## Download models
+### Models
 
-The runner app within the container expects model checkpoints to be stored in a `/models` directory which we can mount with a local `models` directory.
+The runner app within the container references models by their [HuggingFace](https://huggingface.co/) model ID and expects model checkpoints to be stored in a `/models` directory, which we can mount with a local `models` directory.
 
 See the `dl-checkpoints.sh` script for how to download model checkpoints to a local `models` directory.
 
@@ -19,11 +35,45 @@ pip install "huggingface_hub[cli]"
 ./dl-checkpoints.sh
 ```
 
-## Optimizations
+### Optimizations
 
 - Set the environment variable `SFAST=true` to enable dynamic compilation with [stable-fast](https://github.com/chengzeyi/stable-fast) to speed up inference for diffusion pipelines (the initial requests will be slower because the model will be dynamically compiled then).
 
-## Run text-to-image container
+### Run benchmarking script
+
+```
+docker run --gpus <GPU> -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline <PIPELINE> --model_id <MODEL_ID> --runs <RUNS> --batch_size <BATCH_SIZE>
+```
+
+Example command:
+
+```
+# Benchmark the text-to-image pipeline with the stabilityai/sd-turbo model over 3 runs using GPU 0
+docker run --gpus 0 -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline text-to-image --model_id stabilityai/sd-turbo --runs 3
+```
+
+Example output:
+
+```
+----AGGREGATE METRICS----
+
+
+pipeline load time: 1.473s
+pipeline load max GPU memory allocated: 2.421GiB
+pipeline load max GPU memory reserved: 2.488GiB
+avg inference time: 0.482s
+avg inference time per output: 0.482s
+avg inference max GPU memory allocated: 3.024GiB
+avg inference max GPU memory reserved: 3.623GiB
+```
+
+For benchmarking script usage information:
+
+```
+docker run livepeer/ai-runner:latest python bench.py -h
+```
+
+### Run text-to-image container
 
 Run container:
 
@@ -37,7 +87,7 @@ Query API:
 curl -X POST -H "Content-Type: application/json" localhost:8000/text-to-image -d '{"prompt":"a mountain lion"}'
 ```
 
-## Run image-to-image container
+### Run image-to-image container
 
 Run container:
 
@@ -51,7 +101,7 @@ Query API:
 curl -X POST localhost:8000/image-to-image -F prompt="a mountain lion" -F image=@<IMAGE_FILE>
 ```
 
-## Run image-to-video container
+### Run image-to-video container
 
 Run container
 
diff --git a/runner/images/architecture.png b/runner/images/architecture.png
new file mode 100644
index 000000000..69805413b
Binary files /dev/null and b/runner/images/architecture.png differ
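A note on the Models section: when only one checkpoint is needed rather than everything `dl-checkpoints.sh` fetches, the HuggingFace CLI installed above can download a single model ID directly. A minimal sketch, assuming the runner reads the standard HuggingFace cache layout that `--cache-dir` writes under the local `models` directory; the model ID is just the example used elsewhere in this README:

```
pip install "huggingface_hub[cli]"
huggingface-cli download stabilityai/sd-turbo --cache-dir models
```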
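The `SFAST=true` flag from the Optimizations section pairs naturally with the benchmarking script to measure the speedup on a given card. A sketch reusing only the commands shown above, with `-e` passing the environment variable into the container:

```
# Baseline: text-to-image on GPU 0, 3 runs
docker run --gpus 0 -v ./models:/models livepeer/ai-runner:latest python bench.py --pipeline text-to-image --model_id stabilityai/sd-turbo --runs 3

# Same benchmark with stable-fast dynamic compilation enabled
docker run --gpus 0 -v ./models:/models -e SFAST=true livepeer/ai-runner:latest python bench.py --pipeline text-to-image --model_id stabilityai/sd-turbo --runs 3
```

Since stable-fast compiles dynamically, the first inference in the `SFAST=true` run absorbs the compilation cost, so comparing averages over several runs gives a fairer picture.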
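The image-to-video section ends at starting the container. A hedged sketch of the query step, assuming the endpoint accepts multipart form data the same way image-to-image does (the `image-to-video` path and the field set are assumptions, not confirmed by this diff):

```
curl -X POST localhost:8000/image-to-video -F image=@<IMAGE_FILE>
```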