The benchmarks measure the performance of TorchServe on various models. TorchServe supports either a number of built-in models or a custom model passed in as a path or URL to a .mar file, and it runs various benchmarks using these models (see the benchmarks section below). The benchmarks are executed on the user's machine through a python3 script in the case of JMeter and a shell script in the case of Apache Bench. TorchServe runs on the same machine in a docker instance to avoid network latencies. The benchmark must be run from within the context of the full TorchServe repo (i.e. the benchmark tests reside inside the serve/benchmarks folder).
We currently support benchmarking with JMeter & Apache Bench. One can also profile backend code with snakeviz.
It assumes that you have followed the quick start/installation section and have the required pre-requisites installed, i.e. python3, java, and docker [if needed]. If not, please refer to the quick start for setup.
We have provided an install_dependencies.sh script to install everything needed to execute the benchmark on the user's Ubuntu environment. First, clone the TorchServe repository:
git clone https://github.com/pytorch/serve.git
Now execute this script as below.
On a CPU based instance, use ./install_dependencies.sh.
On a GPU based instance, use ./install_dependencies.sh True.
For Mac, you should have python3 and java installed. If you wish to run the default benchmarks featuring a docker-based instance of TorchServe, you will need to install docker as well. Finally, you will need to install jmeter with plugins, which can be accomplished by running mac_install_dependencies.sh.
The benchmarking script requires the following to run:
- python3
- JDK or OpenJDK
- jmeter installed through homebrew or linuxbrew with the plugin manager and the following plugins: jpgc-synthesis=2.1,jpgc-filterresults=2.1,jpgc-mergeresults=2.1,jpgc-cmd=2.1,jpgc-perfmon=2.1
- nvidia-docker
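If you need to set up JMeter and its plugins by hand (outside of the provided install scripts), the sketch below shows one possible approach, assuming Homebrew and the jmeter-plugins Plugins Manager; the download URLs, versions, and paths are assumptions, so verify them against jmeter-plugins.org before running:
# Manual JMeter setup sketch (normally handled by the install scripts); paths and versions are assumptions
brew install jmeter
JMETER_HOME="$(brew --prefix jmeter)/libexec"
# Fetch the Plugins Manager and the cmdrunner jar it needs
curl -L -o "$JMETER_HOME/lib/ext/jmeter-plugins-manager.jar" https://jmeter-plugins.org/get/
curl -L -o "$JMETER_HOME/lib/cmdrunner-2.2.jar" https://repo1.maven.org/maven2/kg/apc/cmdrunner/2.2/cmdrunner-2.2.jar
# Generate PluginsManagerCMD.sh and install the plugins listed above
java -cp "$JMETER_HOME/lib/ext/jmeter-plugins-manager.jar" org.jmeterplugins.repository.PluginManagerCMDInstaller
"$JMETER_HOME/bin/PluginsManagerCMD.sh" install jpgc-synthesis=2.1,jpgc-filterresults=2.1,jpgc-mergeresults=2.1,jpgc-cmd=2.1,jpgc-perfmon=2.1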
The pre-trained models for the benchmark can mostly be found in the TorchServe model zoo. We currently support resnet-18 (the default) and squeezenet1_1.
We support several basic benchmarks:
- throughput: Run inference with enough threads to occupy all workers and ensure full saturation of resources to find the throughput. The number of threads defaults to 100.
- latency: Run inference with a single thread to determine the latency
- ping: Test the throughput of pinging against the frontend
- load: Loads the same model many times in parallel. The number of loads is given by the "count" option and defaults to 16.
- repeated_scale_calls: Will scale the model up to "scale_up_workers"=16 then down to "scale_down_workers"=1 then up and down repeatedly.
- multiple_models: Loads and scales up three models at the same time (e.g. squeezenet and resnet), runs inference on them, and then scales them down. Use the options "urlN", "modelN_name", and "dataN" to specify the model url, model name, and the data to pass to the model respectively. data1 and data2 are of the format "'Some garbage data being passed here'" and data3 is the filesystem path to a file to upload.
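For example, assuming these per-benchmark options are passed as key/value pairs through the --options flag (the same mechanism shown for repeated_scale_calls later in this document), hypothetical invocations might look like:
cd serve/benchmarks
# Run the load benchmark with a custom number of parallel loads (the "count" option)
./benchmark.py load --options count 8
# Run multiple_models, overriding the first model's url, name, and data (values are placeholders)
./benchmark.py multiple_models --options url1 https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar model1_name squeezenet1_1 data1 "'Some garbage data being passed here'"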
We also support compound benchmarks:
- concurrent_inference: Runs the basic benchmark with different numbers of threads
- You can specify a docker image using the --docker option. You must create the docker image by following the steps given here.
cd serve/benchmarks
./benchmark.py latency -l 1 --docker pytorch/torchserve:0.1.1-cpu
- If you don't specify --ts or --docker, the latest TorchServe image on Docker Hub will be used and a container will be started with the name 'ts_benchmark_gpu' or 'ts_benchmark_cpu', depending on whether you have selected --gpus or not
cd serve/benchmarks
./benchmark.py latency -l 1
NOTE - '--docker' and '--ts' are mutually exclusive options
- Install TorchServe using the install guide
- Start TorchServe using the following command:
torchserve --start --model-store <path_to_your_model_store>
- To start benchmarking, execute the following commands:
cd serve/benchmarks
python benchmark.py throughput --ts http://127.0.0.1:8080
- Create and start a docker container for TorchServe (a sketch of one way to do this follows these steps).
- To start benchmarking, execute the following commands:
cd serve/benchmarks
python benchmark.py throughput --ts http://127.0.0.1:8080
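A minimal sketch of the docker step above (mirroring the docker run command used in the backend profiling section below; the image tag and port mappings are one possible choice, not a prescribed setup):
docker run --rm -d -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest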
Note:
- Refer to the examples below to run different benchmarking suites on TorchServe.
The benchmark reports are available at /tmp/TSBenchmark/
Run basic latency test on default resnet-18 model
./benchmark.py latency
Run basic throughput test on default resnet-18 model.
./benchmark.py throughput
Run all benchmarks
./benchmark.py --all
Run using the squeezenet1_1 model
./benchmark.py latency -m squeezenet1_1
Run on GPU (4 gpus)
./benchmark.py latency -g 4
Run with a custom image
./benchmark.py latency -i {imageFilePath}
Run with a custom model (works only for CNN based models that accept an image as input for now; support for more input types will be added to this command in the future)
./benchmark.py latency -c {modelUrl} -i {imageFilePath}
Run with custom options
./benchmark.py repeated_scale_calls --options scale_up_workers 100 scale_down_workers 10
Run against an already running instance of TorchServe
./benchmark.py latency --ts 127.0.0.1
(defaults to http, port 80, management port = port + 1)
./benchmark.py latency --ts 127.0.0.1:8080 --management-port 8081
Run with multiple models
./benchmark.py multiple_models
Run verbose with only a single loop
./benchmark.py latency -v -l 1
Using https instead of http as the protocol might not work properly at the moment; this is not a tested option.
./benchmark.py latency --ts https://127.0.0.1:8443
The full list of options can be found by running with the -h or --help flags.
Refer to adding a new jmeter test plan for torchserve.
It assumes that you have followed the quick start/installation section and have the required pre-requisites installed, i.e. python3, java, and docker [if needed]. If not, please refer to the quick start for setup.
pip install -r requirements-ab.txt
- Ubuntu
apt-get install apache2-utils
- macOS
Apache Bench is installed on macOS by default. You can verify it by running ab -h
This command will run the AB benchmark with default parameters. It will start a TorchServe instance locally, register the Resnet-18 model, and run 100 inference requests with a concurrency of 10. Refer to the parameters section for more details on configurable parameters.
python benchmark-ab.py
The benchmark comes with pre-configured test plans which can be used directly to set parameters. Refer to the available test plans for more details.
python benchmark-ab.py <test plan>
This command will run TorchServe locally and benchmark the VGG11 model with the soak test plan. The soak test plan has been configured with the default Resnet-18 model; here we override it by providing an extra parameter. Similarly, all parameters of a test plan can be customized.
python benchmark-ab.py soak --url https://torchserve.pytorch.org/mar_files/vgg11.mar
This command will run TorchServe inside a docker container and perform benchmarking with default parameters. The docker image used here is the latest CPU based torchserve image available on Docker Hub. A custom image can also be used via the --image parameter.
python benchmark-ab.py --exec_env docker
This command will run TorchServe inside a docker container with 4 GPUs and perform benchmarking with default parameters. The docker image used here is the latest GPU based torchserve image available on Docker Hub. A custom image can also be used via the --image parameter.
python benchmark-ab.py --exec_env docker --gpus 4
The config parameters can be provided using cmd line args as well as a config json file.
This command will use all the configuration parameters given in config.json file.
python benchmark-ab.py --config config.json
{
"url":"https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar",
"requests": 1000,
"concurrency": 10,
"input": "../examples/image_classifier/kitten.jpg",
"exec_env": "docker",
"gpus": "2"
}
The following parameters can be used to run the AB benchmark suite.
- url: Input model URL. Default: "https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar"
- device: Execution device type. Default: cpu
- exec_env: Execution environment. Default: docker
- concurrency: Concurrency of requests. Default: 10
- requests: Number of requests. Default: 100
- batch_size: The batch size of the model. Default: 1
- batch_delay: Max batch delay of the model. Default: 200
- workers: Number of worker thread(s) for model
- input: Input file for model
- content_type: Input file content type.
- image: Custom docker image to run Torchserve on. Default: Standard public Torchserve image
- docker_runtime: Specify docker runtime if required
- ts: Use an already running TorchServe instance. Default: False
- gpus: Number of gpus to run docker container with. By default it runs the docker container on CPU.
- backend_profiling: Enable backend profiling using CProfile. Default: False
- config: All the above params can be set using a config JSON file. When this flag is used, all other cmd line params are ignored.
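As an illustration, assuming each parameter above maps to a --&lt;name&gt; command line flag (as the examples in this document suggest), a combined run could look like the following; the model URL and input path are placeholders taken from elsewhere in this document:
python benchmark-ab.py \
    --url https://torchserve.pytorch.org/mar_files/vgg11.mar \
    --concurrency 100 \
    --requests 1000 \
    --batch_size 4 \
    --batch_delay 200 \
    --input ../examples/image_classifier/kitten.jpg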
Benchmark supports pre-defined, pre-configured params that can be selected based on the use case.
- soak: default model url with requests=100000 and concurrency=10
- vgg11_1000r_10c: vgg11 model with requests=1000 and concurrency=10
- vgg11_10000r_100c: vgg11 model with requests=10000 and concurrency=100
- resnet152_batch: Resnet-152 model with batch size=4, requests=1000 and concurrency=10
- resnet152_batch_docker: Resnet-152 model with batch size=4, requests=1000, concurrency=10 and execution env=docker
Note: These pre-defined parameters in test plan can be overwritten by cmd line args.
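For example, a hypothetical run that starts from the vgg11_1000r_10c test plan but overrides its concurrency on the command line (assuming the --concurrency flag matches the parameter name above) might be:
python benchmark-ab.py vgg11_1000r_10c --concurrency 50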
The reports are generated at location "/tmp/benchmark/"
- CSV report: /tmp/benchmark/ab_report.csv
- latency graph: /tmp/benchmark/predict_latency.png
- torchserve logs: /tmp/benchmark/logs/model_metrics.log
- raw ab output: /tmp/benchmark/result.txt
Benchmark | Model | Concurrency | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AB | https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar | 10 | 100 | 0 | 15.66 | 512 | 1191 | 2024 | 638.695 | 0 | 196.57 | 270.9 | 106.53 |
The benchmarks can be used in conjunction with standard profiling tools such as JProfiler to analyze the system performance. JProfiler can be downloaded from their website. Once downloaded, open up JProfiler and follow these steps:
- Run TorchServe directly through gradle (do not use docker). This can be done either on your machine or on a remote machine accessible through SSH.
- In JProfiler, select "Attach" from the ribbon and attach to the ModelServer. The process name in the attach window should be "org.pytorch.serve.ModelServer". If it is on a remote machine, select "On another computer" in the attach window and enter the SSH details. For the session startup settings, you can leave the defaults. At this point, you should see live CPU and Memory Usage data in JProfiler's Telemetries section.
- Select Start Recordings in JProfiler's ribbon
- Run the benchmark script targeting your running TorchServe instance. It might be a command like
./benchmark.py throughput --ts https://127.0.0.1:8443
It can be run on either your local machine or a remote machine (if you are profiling remotely), but we recommend running the benchmark on the same machine as the model server to avoid confounding network latencies.
- Once the benchmark script has finished running, select Stop Recordings in JProfiler's ribbon
Once you have stopped recording, you should be able to analyze the data. Useful sections to examine are CPU views > Call Tree and CPU views > Hot Spots, which show where the processor time is going.
The benchmarks can also be used to analyze backend performance using cProfile. To benchmark backend code:
- Install Torchserve
If using a local TorchServe instance:
- Install TorchServe using the install guide
If using an external docker container for TorchServe:
- Create a docker container for TorchServe.
- Set environment variable and start Torchserve
If using a local TorchServe instance:
export TS_BENCHMARK=TRUE
torchserve --start --model-store <path_to_your_model_store>
If using external docker container for TorchServe:
- Start docker with the /tmp directory mapped to local /tmp and set TS_BENCHMARK to True.
docker run --rm -it -e TS_BENCHMARK=True -v /tmp:/tmp -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest
- Register a model & perform inference to collect profiling data. This can be done with the benchmark script described in the previous section.
python benchmark.py throughput --ts http://127.0.0.1:8080
- Visualize SnakeViz results.
To visualize the profiling data using snakeviz, use the following commands:
pip install snakeviz
snakeviz /tmp/tsPythonProfile.prof
This should start up a web server on your machine and automatically open the page. Note that the above command will fail if executed on a server where no browser is installed. The backend profiling should generate a visualization similar to the screenshot shown above.
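If the profiling data lives on a remote or headless machine, one workaround is to run snakeviz in server mode and browse to it from another machine. The flags below are standard snakeviz options, but verify them with snakeviz --help on your install:
# Serve the profile without trying to open a local browser (-s); snakeviz prints the URL to open remotely
snakeviz -s -H 0.0.0.0 -p 9999 /tmp/tsPythonProfile.prof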