This repository provides Docker containers for TorchServe (an inference server for PyTorch models) for multiple hardware platforms.
For example usage, see TorchServe's examples. Instead of starting torchserve as shown in those examples, start it with a docker command for your platform, as listed below (your model should be stored as a .mar file in a model_store subdirectory of the directory from which you start torchserve).
As of torchserve v11, you will need to add setuptools==69.5.1 to your requirements.txt when generating a MAR file:
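For example (a sketch only; the model name, weights file, and handler below are placeholders for your own):

# requirements.txt
setuptools==69.5.1

# generate the MAR file, bundling requirements.txt
torch-model-archiver --model-name <model> --version 1.0 --serialized-file <model>.pt --handler <handler> --requirements-file requirements.txt --export-path model_store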
As of torchserve v11.1, token authorization is enabled by default. Add --disable-token-auth after the image name (alongside --models) to turn it off.
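For example, with the CPU-only image (the same flag placement applies to the other images below):

docker run -v $(pwd)/model_store:/model_store -p 8080:8080 --rm --name torchserve -d iqtlabs/torchserve --models <model>=<model>.mar --disable-token-auth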
- iqtlabs/torchserve: CPU only, for arm64 (includes Pi4 and Apple) and amd64.
docker run -v $(pwd)/model_store:/model_store -p 8080:8080 --rm --name torchserve -d iqtlabs/torchserve --models <model>=<model>.mar
- iqtlabs/cuda-torchserve: CUDA (12.5 or later) accelerated for amd64 only.
docker run --gpus all -v $(pwd)/model_store:/model_store -p 8080:8080 --rm --name torchserve -d iqtlabs/cuda-torchserve --models <model>=<model>.mar
- iqtlabs/orin-torchserve: Jetson Orin, JetPack 5.1.2 or later, arm64 only.
docker run --runtime nvidia -v $(pwd)/model_store:/model_store -p 8080:8080 --rm --name torchserve -d iqtlabs/orin-torchserve --models <model>=<model>.mar
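Once a container is running, you can check it and request an inference through TorchServe's standard inference API on port 8080 (here <input_file> stands for whatever input your model's handler expects; if token authorization is enabled, you will also need to pass the generated inference token):

curl http://localhost:8080/ping
curl http://localhost:8080/predictions/<model> -T <input_file>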
Currently, Docker does not support access to Apple MPS devices, so inference in these containers is CPU only on Apple hardware. However, PyTorch itself does support MPS, so TorchServe can be run with MPS support outside a Docker container.
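A minimal sketch of running it natively instead, assuming torchserve and torch-model-archiver are installed on the host with pip:

pip install torchserve torch-model-archiver
torchserve --start --model-store model_store --models <model>=<model>.mar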