
Triton Inference Demo

This demo illustrates how to run inference using Triton Inference Server. It performs the following tasks:

  • Preprocessing an input image (loading, resizing, normalizing, and reformatting for model input).
  • Inference by sending the preprocessed image to Triton using two models:
    • Inception-v3 (expects 299x299 images)
    • ResNet-50 (expects 224x224 images)
  • Postprocessing of the model output to compute softmax probabilities and display the top-5 predictions.

This setup demonstrates how Triton can be integrated into a production-style pipeline for computer vision tasks.
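A minimal sketch of the preprocessing step, assuming Pillow and NumPy are available (both should be covered by req.txt). The normalization constants and tensor layout depend on how each model was exported, so treat this as illustrative rather than the exact code in script.py:

import numpy as np
from PIL import Image

def preprocess(image_path, size):
    # Load the image, force RGB, and resize to the model's expected input
    # size (299x299 for Inception-v3, 224x224 for ResNet-50).
    img = Image.open(image_path).convert("RGB").resize((size, size))
    # Scale pixel values to [0, 1]; the real script may instead apply
    # model-specific mean/std normalization.
    arr = np.asarray(img, dtype=np.float32) / 255.0
    # Reorder HWC -> CHW and add a batch dimension; the required layout
    # comes from each model's Triton configuration.
    return np.transpose(arr, (2, 0, 1))[np.newaxis, ...]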

Prerequisites

  • Python 3.10
  • Docker (for running Triton Inference Server)
  • The required Python packages are listed in req.txt

Setup Instructions

1. Create a Virtual Environment

Create a Python 3.10 virtual environment by running:

python3.10 -m venv venv

Activate the virtual environment:

  • On Unix or macOS:
    source venv/bin/activate
  • On Windows:
    venv\Scripts\activate

2. Install Required Packages

Install the necessary dependencies using:

python3.10 -m pip install -r req.txt

3. Download the Models

Download or set up the models by running the following scripts. These scripts should download and organize the models into the correct directory (typically a folder named models):

python3.10 inception_v3.py
python3.10 resnet50.py
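Once the scripts finish, the models folder should follow Triton's model repository layout: one subdirectory per model, each typically containing a config.pbtxt and a numbered version directory holding the model file. The names and formats below are illustrative; the actual ones are produced by the download scripts:

models/
  inception_v3/
    config.pbtxt
    1/
      model.onnx
  resnet50/
    config.pbtxt
    1/
      model.onnx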

4. Run the Triton Inference Server

Ensure Docker is installed, then start the Triton server using:

docker run --platform linux/amd64 --rm -p8000:8000 -p8001:8001 -p8002:8002 \
-v "$(pwd)/models":/models nvcr.io/nvidia/tritonserver:23.03-py3 \
tritonserver --model-repository=/models

This command mounts your local models directory into the container at /models, which Triton uses as its model repository.
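Once the container is up, you can check that the server is ready before running the demo by querying Triton's standard HTTP health endpoint on port 8000; an HTTP 200 response means the server has loaded its models and is accepting requests:

curl -v localhost:8000/v2/health/ready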

5. Run the Inference Demo

With the Triton server running, execute the inference script:

python3.10 script.py

Note: Make sure that the file example.jpg used in script.py exists in your working directory, or update the image_path variable in the script accordingly.

Overview of the Demo

The demo in script.py covers:

  • Image Preprocessing: Converts images to the input format required by the models.
  • Inference Communication: Uses Triton’s HTTP client to send inference requests.
  • Postprocessing: Applies softmax to compute probabilities and extracts the top-5 predictions.
  • Model Comparison: Demonstrates running inference with two different models (Inception-v3 and ResNet-50) using the same image input.

This example serves as a starting point for working with the Triton Inference Server in production environments, helping you integrate model serving into your own applications.
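As a rough, self-contained illustration of what script.py does end to end, the sketch below preprocesses an image, sends it through Triton's Python HTTP client, and prints the top-5 predictions for one model. The model name ("resnet50") and the input/output tensor names ("input" and "output") are assumptions for illustration; the real names are defined by the model configuration in the models directory:

import numpy as np
import tritonclient.http as httpclient
from PIL import Image

# Preprocess: resize to 224x224 (ResNet-50), scale to [0, 1], HWC -> NCHW.
img = Image.open("example.jpg").convert("RGB").resize((224, 224))
image = (np.asarray(img, dtype=np.float32) / 255.0).transpose(2, 0, 1)[np.newaxis]

# Build the request; tensor names here are placeholders, check config.pbtxt.
client = httpclient.InferenceServerClient(url="localhost:8000")
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)
infer_output = httpclient.InferRequestedOutput("output")

result = client.infer(model_name="resnet50",
                      inputs=[infer_input],
                      outputs=[infer_output])
logits = result.as_numpy("output")[0]

# Postprocess: softmax over the logits, then report the five best classes.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
for idx in np.argsort(probs)[::-1][:5]:
    print(f"class {idx}: {probs[idx]:.4f}")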
