This demo illustrates how to run inference using Triton Inference Server. It performs the following tasks:
- Preprocessing an input image (loading, resizing, normalizing, and reformatting it for model input).
- Inference by sending the preprocessed image to Triton using two models:
  - Inception-v3 (expects 299x299 images)
  - ResNet-50 (expects 224x224 images)
- Postprocessing of the model output to compute softmax probabilities and display the top-5 predictions.
This setup demonstrates how Triton can be integrated into a production-style pipeline for computer vision tasks.
To run the demo you need:
- Python 3.10
- Docker (for running the Triton Inference Server)
- The Python packages listed in `req.txt`
Create a Python 3.10 virtual environment by running:
```bash
python3.10 -m venv venv
```
Activate the virtual environment:
- On Unix or macOS: `source venv/bin/activate`
- On Windows: `venv\Scripts\activate`
Install the necessary dependencies using:
```bash
python3.10 -m pip install -r req.txt
```
Download or set up the models by running the following scripts. These scripts download and organize the models into the correct directory layout (typically in a folder named `models`):

```bash
python3.10 inception_v3.py
python3.10 resnet50.py
```
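For orientation, the sketch below shows roughly what such a setup script could do: export a pretrained model to ONNX and place it in Triton's model repository layout. The model name, paths, and config contents are illustrative assumptions, not the actual contents of `resnet50.py`.

```python
# Hypothetical sketch of a model-setup script: export a pretrained ResNet-50
# to ONNX and place it in a Triton model repository layout. Names, paths, and
# the config contents are assumptions made for illustration.
from pathlib import Path

import torch
import torchvision

model_dir = Path("models/resnet50/1")  # Triton layout: <repo>/<model>/<version>/
model_dir.mkdir(parents=True, exist_ok=True)

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
dummy_input = torch.randn(1, 3, 224, 224)  # ResNet-50 expects 224x224 RGB input

torch.onnx.export(
    model,
    dummy_input,
    str(model_dir / "model.onnx"),  # "model.onnx" is Triton's default ONNX filename
    input_names=["input"],
    output_names=["output"],
)

# A minimal config.pbtxt; for ONNX models Triton can auto-complete the
# input/output specifications when strict model configuration is disabled.
(Path("models/resnet50") / "config.pbtxt").write_text(
    'name: "resnet50"\n'
    'platform: "onnxruntime_onnx"\n'
    'max_batch_size: 0\n'
)
```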
Ensure Docker is installed, then start the Triton server:

```bash
docker run --platform linux/amd64 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v "$(pwd)/models":/models nvcr.io/nvidia/tritonserver:23.03-py3 \
  tritonserver --model-repository=/models
```
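Once the container is up, you can confirm that the server and the models are loaded before sending requests. A minimal check using Triton's Python HTTP client might look like this (the model names are assumptions based on the setup scripts):

```python
# Quick readiness check against Triton's HTTP endpoint (localhost:8000).
# The model names below are assumptions; use the directory names created
# by the setup scripts.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

print("server ready:", client.is_server_ready())
for model_name in ("inception_v3", "resnet50"):
    print(model_name, "ready:", client.is_model_ready(model_name))
```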
This command mounts your local `models` directory into the container at `/models`, which Triton uses as its model repository.
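For reference, the mounted directory should follow Triton's model repository layout. Assuming the setup scripts create `inception_v3` and `resnet50` entries in ONNX format, it would look roughly like this:

```
models/
├── inception_v3/
│   ├── config.pbtxt
│   └── 1/
│       └── model.onnx
└── resnet50/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```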
With the Triton server running, execute the inference script:
```bash
python3.10 script.py
```
Note: Make sure that the file `example.jpg` used in `script.py` exists in your working directory, or update the `image_path` variable in the script accordingly.
The demo in `script.py` covers the following (a condensed sketch appears after the list):
- Image Preprocessing: Converts images to the input format required by the models.
- Inference Communication: Uses Triton’s HTTP client to send inference requests.
- Postprocessing: Applies softmax to compute probabilities and extracts the top-5 predictions.
- Model Comparison: Demonstrates running inference with two different models (Inception-v3 and ResNet-50) using the same image input.
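Below is a minimal, hypothetical sketch of that flow using Triton's Python HTTP client. The model names, tensor names ("input"/"output"), and ImageNet normalization constants are assumptions and may differ from the actual `script.py`:

```python
# Hypothetical sketch: preprocess an image, run it through both models via
# Triton's HTTP API, then apply softmax and report the top-5 predictions.
# Model names and tensor names are assumptions, not taken from script.py.
import numpy as np
import tritonclient.http as httpclient
from PIL import Image


def preprocess(image_path: str, size: int) -> np.ndarray:
    """Load, resize, normalize (assumed ImageNet stats), and reshape to NCHW float32."""
    img = Image.open(image_path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW, add batch dimension


def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max())
    return e / e.sum()


client = httpclient.InferenceServerClient(url="localhost:8000")
image_path = "example.jpg"

for model_name, size in (("resnet50", 224), ("inception_v3", 299)):
    batch = preprocess(image_path, size)

    infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    requested_output = httpclient.InferRequestedOutput("output")

    response = client.infer(model_name, inputs=[infer_input], outputs=[requested_output])
    logits = response.as_numpy("output").squeeze()

    probs = softmax(logits)
    top5 = probs.argsort()[-5:][::-1]
    print(f"{model_name}: top-5 class indices {top5.tolist()}, probabilities {probs[top5]}")
```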
This example serves as a starting point for working with the Triton Inference Server in production environments, helping you integrate model serving into your own applications.