This demo illustrates how to run inference using Triton Inference Server. It performs the following tasks:
- Preprocessing an input image (loading, resizing, normalizing, and reformatting it for model input).
- Inference by sending the preprocessed image to Triton using two models:
  - Inception-v3 (expects 299x299 images)
  - ResNet-50 (expects 224x224 images)
- Postprocessing of the model output to compute softmax probabilities and display the top-5 predictions.
This setup demonstrates how Triton can be integrated into a production-style pipeline for computer vision tasks.
To run the demo you need:
- Python 3.10
- Docker (for running the Triton Inference Server)
- The Python packages listed in `req.txt`
Create a Python 3.10 virtual environment by running:
```bash
python3.10 -m venv venv
```
Activate the virtual environment:
- On Unix or macOS: `source venv/bin/activate`
- On Windows: `venv\Scripts\activate`
Install the necessary dependencies using:
```bash
python3.10 -m pip install -r req.txt
```
Download or set up the models by running the following scripts. These scripts download and organize the models into the correct directory layout (typically in a folder named `models`):

```bash
python3.10 inception_v3.py
python3.10 resnet50.py
```
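For orientation, the sketch below shows roughly what such a setup script could do: export a pretrained model to ONNX and place it in Triton's model repository layout. The model name, paths, and config contents are illustrative assumptions, not the actual contents of `resnet50.py`.

```python
# Hypothetical sketch of a model-setup script: export a pretrained ResNet-50
# to ONNX and place it in a Triton model repository layout. Names, paths, and
# the config contents are assumptions made for illustration.
from pathlib import Path

import torch
import torchvision

model_dir = Path("models/resnet50/1")  # Triton layout: <repo>/<model>/<version>/
model_dir.mkdir(parents=True, exist_ok=True)

model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()
dummy_input = torch.randn(1, 3, 224, 224)  # ResNet-50 expects 224x224 RGB input

torch.onnx.export(
    model,
    dummy_input,
    str(model_dir / "model.onnx"),  # "model.onnx" is Triton's default ONNX filename
    input_names=["input"],
    output_names=["output"],
)

# A minimal config.pbtxt; for ONNX models Triton can auto-complete the
# input/output specifications when strict model configuration is disabled.
(Path("models/resnet50") / "config.pbtxt").write_text(
    'name: "resnet50"\n'
    'platform: "onnxruntime_onnx"\n'
    'max_batch_size: 0\n'
)
```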
Ensure Docker is installed, then start the Triton server:

```bash
docker run --platform linux/amd64 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
  -v "$(pwd)/models":/models nvcr.io/nvidia/tritonserver:23.03-py3 \
  tritonserver --model-repository=/models
```
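Once the container is up, you can confirm that the server and the models are loaded before sending requests. A minimal check using Triton's Python HTTP client might look like this (the model names are assumptions based on the setup scripts):

```python
# Quick readiness check against Triton's HTTP endpoint (localhost:8000).
# The model names below are assumptions; use the directory names created
# by the setup scripts.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

print("server ready:", client.is_server_ready())
for model_name in ("inception_v3", "resnet50"):
    print(model_name, "ready:", client.is_model_ready(model_name))
```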
This command mounts your local `models` directory into the container at `/models`, which Triton uses as its model repository.
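For reference, the mounted directory should follow Triton's model repository layout. Assuming the setup scripts create `inception_v3` and `resnet50` entries in ONNX format, it would look roughly like this:

```
models/
├── inception_v3/
│   ├── config.pbtxt
│   └── 1/
│       └── model.onnx
└── resnet50/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```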
With the Triton server running, execute the inference script:
```bash
python3.10 script.py
```
Note: Make sure that the file `example.jpg` used in `script.py` exists in your working directory, or update the `image_path` variable in the script accordingly.
The demo in `script.py` covers the following (a condensed sketch appears after the list):
- Image Preprocessing: Converts images to the input format required by the models.
- Inference Communication: Uses Triton’s HTTP client to send inference requests.
- Postprocessing: Applies softmax to compute probabilities and extracts the top-5 predictions.
- Model Comparison: Demonstrates running inference with two different models (Inception-v3 and ResNet-50) using the same image input.
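Below is a minimal, hypothetical sketch of that flow using Triton's Python HTTP client. The model names, tensor names ("input"/"output"), and ImageNet normalization constants are assumptions and may differ from the actual `script.py`:

```python
# Hypothetical sketch: preprocess an image, run it through both models via
# Triton's HTTP API, then apply softmax and report the top-5 predictions.
# Model names and tensor names are assumptions, not taken from script.py.
import numpy as np
import tritonclient.http as httpclient
from PIL import Image


def preprocess(image_path: str, size: int) -> np.ndarray:
    """Load, resize, normalize (assumed ImageNet stats), and reshape to NCHW float32."""
    img = Image.open(image_path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW, add batch dimension


def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max())
    return e / e.sum()


client = httpclient.InferenceServerClient(url="localhost:8000")
image_path = "example.jpg"

for model_name, size in (("resnet50", 224), ("inception_v3", 299)):
    batch = preprocess(image_path, size)

    infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    requested_output = httpclient.InferRequestedOutput("output")

    response = client.infer(model_name, inputs=[infer_input], outputs=[requested_output])
    logits = response.as_numpy("output").squeeze()

    probs = softmax(logits)
    top5 = probs.argsort()[-5:][::-1]
    print(f"{model_name}: top-5 class indices {top5.tolist()}, probabilities {probs[top5]}")
```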
This example serves as a starting point for working with the Triton Inference Server in production environments, helping you integrate model serving into your own applications.