pyro-eval

Library to evaluate Pyronear ML models 🔥

Context

This module aims at providing an evaluation pipeline to measure and compare the performance of Pyronear algorithms. It is split into two parts:

  • Dataset management
  • Metrics computation

Installation

Python dependencies

Make sure you have uv installed, then clone this repo and install dependencies:

git clone git@github.com:earthtoolsmaker/pyro-eval.git
uv sync

Note:

This repo uses the pyro-engine repo as a dependency: make sure to run uv sync to retrieve changes made to that repo.

Data dependencies

To get the data dependencies, one can use DVC. To fully use this repository, you need access to our DVC remote storage, which is currently reserved for Pyronear members. On request, you will be provided with AWS credentials to access our remote storage.

Pull all the data files tracked by DVC using this command:

dvc pull

Scaffolding

One can use the default ./data folder to store datasets and models to run evaluation on:

  • Models: One can use the ./data/models/ folder to store models to evaluate.
  • Datasets: One can use the ./data/datasets/ folder to store the datasets.
  • Evaluation Results: By default, the results of the evaluation runs are stored under ./data/evaluation/.

Example of files under ./data/:

$ tree -L 3
.
├── datasets
│   ├── gitkeep
│   ├── wildfire_test
│   │   ├── data.yaml
│   │   ├── images
│   │   ├── labels
│   │   └── wildfire_test.csv
│   └── wildfire_test_temporal
│       ├── data.yaml
│       ├── images
│       ├── labels
│       └── wildfire_test_temporal.csv
├── evaluation
│   ├── gitkeep
│   └── runs
│       └── run-20250522-1457-7552
└── models
    ├── artistic-alpaca_v1.1.0_fe129f2.onnx
    ├── artistic-alpaca_v1.1.0_fe129f2.pt
    └── gitkeep

Usage

run_evaluation.py

This script runs the evaluation of the models on the provided test dataset.

uv run python ./scripts/run_evaluation.py \
  --dir-models ./data/models/ \
  --dir-dataset ./data/datasets/wildfire_test/ \
  --dir-temporal-dataset ./data/datasets/wildfire_test_temporal/ \
  --dir-save ./data/evaluation/runs/ \
  --device cuda \
  --loglevel info

Evaluation Pipeline Design

The evaluation pipeline is composed of two steps: data preparation and metrics computation, respectively managed by the EvaluationDataset and EvaluationPipeline classes.

EvaluationDataset

The EvaluationDataset class helps create a custom dataset object suited for metric computation.

The object is instantiated from an existing image folder or a Hugging Face repo. A dataset ID can be passed as input; by default, the ID is computed from the current date and a custom hash of the dataset. When instantiating from a local folder, the following rules must be followed to ensure the class works properly:

  • The root folder must contain one subfolder named images and one named labels
  • The images folder must contain the image files, named with the following convention: *_Y-m-dTH-M-S.jpg, for example: seq_44_sdis-07_brison-200_2024-02-16T16-38-22.jpg (see the timestamp-parsing sketch below)
  • The labels folder must contain a label .txt file in the YOLOv8 TXT format for each image, with the coordinates of the ground-truth bounding box
dataset
├── images
│   ├── image1.jpg
│   └── image2.jpg
├── labels
│   ├── image1.txt
│   └── image2.txt
datapath = "path/to/dataset"
dataset_ID = "dataset_v0"
dataset = EvaluationDataset(datapath, dataset_ID=dataset_ID)
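
The timestamp embedded in each file name follows the strftime pattern %Y-%m-%dT%H-%M-%S. Below is a minimal sketch of how such a name can be parsed with the Python standard library; it only illustrates the naming convention and is not the parser used internally by the class.

from datetime import datetime
from pathlib import Path

# Example file name following the *_Y-m-dTH-M-S.jpg convention.
path = Path("seq_44_sdis-07_brison-200_2024-02-16T16-38-22.jpg")

# The timestamp is the last underscore-separated token of the file stem.
timestamp_str = path.stem.rsplit("_", 1)[-1]
timestamp = datetime.strptime(timestamp_str, "%Y-%m-%dT%H-%M-%S")
print(timestamp)  # 2024-02-16 16:38:22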

--dir-dataset is used to evaluate the model, while --dir-temporal-dataset is used to evaluate the engine on sequences of images.

EvaluationPipeline

The EvaluationPipeline class helps launch the evaluation on a given dataset. The evaluation is launched as follows:

evaluation = EvaluationPipeline(dataset=dataset)
evaluation.run()
evaluation.save_metrics()

The complete evaluation is composed of two parts: ModelEvaluator, which provides metrics on the model performance alone, and EngineEvaluator, which provides metrics on the whole detection pipeline in the PyroEngine.

The object can be instantiated with the following parameters as input (a minimal instantiation sketch follows the list):

  • self.dataset: EvaluationDataset object
  • self.config: config dictionary as described below
  • self.run_id: ID of the run, generated if not specified
  • self.use_existing_predictions: if True, we check for existing model predictions in the prediction folder; each prediction is saved in a JSON file named after the model hash. The model hash is also saved in a hash file next to the weights file.
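
A minimal instantiation sketch, assuming the keyword names for run_id and use_existing_predictions mirror the attribute names listed above (they may differ in the actual constructor); the device argument follows the launcher example further down.

# Illustrative sketch only: the run_id / use_existing_predictions keyword
# names are assumed to match the attribute names above.
evaluation = EvaluationPipeline(
    dataset=dataset,                # EvaluationDataset object
    config=config,                  # run configuration dictionary (see below)
    run_id="my-custom-run",         # optional: generated if not specified
    use_existing_predictions=True,  # reuse cached predictions when available
    device="cuda",
)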

config is a dictionary that describes the run configuration. If a parameter is not in the dictionary, it takes the default value from the Engine and Classifier classes in pyro-engine (a minimal example relying on defaults is shown after the key list below).

{
    "model_path" : "path/to/model.pt",
    "model" : {
        "iou" : 0,
        "conf" : 0.15,
        "imgsz" : 1024,
    },
    "engine" : {
        "conf_thresh" : 0.15,
        "max_bbox_size" : 0.4,
        "nb_consecutive_frames" : 8,
    },
    "eval" : ["model", "engine"]
}

With the following keys:

  • nb_consecutive_frames (int): Number of consecutive frames taken into account in the Engine
  • conf_thresh (float in [0., 1.]): Confidence threshold used in the Engine, below which detections are filtered out
  • conf (float in [0., 1.]): Confidence threshold used in the Classifier, below which detections are filtered out
  • max_bbox_size (float in [0., 1.]): Bounding box size above which detections are filtered out
  • iou (float in [0., 1.]): IoU threshold to compute matches between detected bboxes
  • eval (array of strs): Parts of the evaluation pipeline to run ("model", "engine", or both)
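
Since unspecified keys fall back to the pyro-engine defaults, a configuration can be as small as a model path. The dictionary below is an illustrative sketch relying entirely on those defaults, not an exhaustive schema.

# Minimal illustrative config: only the model path is set explicitly; every
# model and engine parameter falls back to the defaults of the Engine and
# Classifier classes in pyro-engine.
minimal_config = {
    "model_path": "path/to/model.pt",
    "eval": ["model", "engine"],
}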

Launcher configuration

The evaluation can be launched on several configurations at once. launcher.py is used to configure the runs:

configs = [
    {
        "model_path": "path/to/model_1.pt",
        "engine": {
            "conf_thresh": 0.1,
        },
    },
    {
        "model_path": "path/to/model_2.onnx",
        "engine": {
            "max_bbox_size": 0.12,
        },
        "eval": ["engine"],
    },
    {
        "model_path": "path/to/model_3.pt",
        "model": {
            "iou": 0,
        },
        "eval": ["engine"],
    },
]

for config in configs:
    evaluation = EvaluationPipeline(dataset=dataset, config=config, device="mps")
    evaluation.run()
    evaluation.save_metrics()

Results

Metrics are saved in the results folder, in a subdirectory named after the run_id. The data is stored in a JSON file with the following content (a loading sketch follows the key list).

The file contains:

  • model_metrics: result of ModelEvaluator
  • engine_metrics: result of EngineEvaluator
  • config: run configuration
  • dataset: dataset information
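
The results file can be inspected directly with the standard library. The sketch below assumes a metrics JSON file inside the run directory from the scaffolding example above; the exact file name is an assumption and may differ.

import json
from pathlib import Path

# Hypothetical path: the run directory is named after the run_id and the
# metrics file name may differ in your checkout.
results_path = Path("./data/evaluation/runs/run-20250522-1457-7552/metrics.json")

with results_path.open() as f:
    results = json.load(f)

print(results["config"])          # run configuration
print(results["dataset"])         # dataset information
print(results["model_metrics"])   # ModelEvaluator output
print(results["engine_metrics"])  # EngineEvaluator output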

Useful definitions

EvaluationDataset()

dataset = EvaluationDataset(datapath):

  • dataset.sequences: list of image Sequence objects within the dataset
  • dataset.hash: hash of the dataset
  • dataset.dataframe: pandas DataFrame describing the dataset

Sequence()

Sequence: object that represents a sequence of images.

  • sequence.images: list of CustomImage objects, corresponding to the images belonging to a single sequence
  • sequence.id: name of the sequence (name of the first image without extension)
  • sequence.sequence_start: timestamp of the first image of the sequence

CustomImage()

CustomImage: object describing an image (a traversal sketch follows the attribute list):

  • image.path: file path
  • image.sequence_id: name of the sequence the image belongs to
  • image.timedelta: time elapsed between the start of the sequence and this image
  • image.boxes: ground truth coordinates
  • image.prediction: placeholder to store a prediction
  • image.timestamp: capture date of the image
  • image.hash: image hash
  • image.label: boolean label, True if a wildfire is present, False otherwise
  • image.name: image name
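
These objects can be traversed together. The sketch below is only an illustration, assuming dataset is an EvaluationDataset instance built as shown earlier.

# Illustrative traversal of the dataset structure described above.
for sequence in dataset.sequences:
    print(sequence.id, sequence.sequence_start, len(sequence.images))
    for image in sequence.images:
        # Each CustomImage exposes its path, timestamp, ground truth and label.
        print(image.name, image.timestamp, image.label, image.boxes)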
