Add a thresholding API. #632

GeorgePearse · 2023-12-01T11:36:58Z

Search before asking

I have searched the Supervision issues and found no similar feature requests.

Description

Create a simple API to find the best thresholds to maximise some metric (f1-score, precision, recall), given an annotated dataset and a model.

At the minute I use the below, because it's the only repo that I've found to calculate what I need, in a reasonable time frame.

https://github.com/yhsmiley/fdet-api

Use case

Anyone wanting to deploy models without manual thresholding (or viewing graphs).

Additional

No response

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!

RigvedRocks · 2023-12-19T06:49:51Z

I'd like to help by submitting a PR

SkalskiP · 2023-12-28T12:37:48Z

Hi @GeorgePearse and @RigvedRocks 👋🏻 ! Thanks for your interest in supervision. I am sorry that I have not been responsive for the last few days. Before Christmas, I was busy with duties unrelated to supervision, and I was off for the last few days.

The idea looks interesting. @RigvedRocks could you share some initial ideas regarding implementation?

RigvedRocks · 2023-12-29T02:06:14Z

I was thinking of using the basic techniques used in ML such as using the roc curve or using Youden J's statistic but the above approach outlined by @GeorgePearse works for me. I guess I can collaborate with @GeorgePearse to work on this issue if he insists so.

GeorgePearse · 2024-02-01T09:18:56Z

I'd really like to do what I can to keep this ticking over, @SkalskiP do you also think it's valuable? I'm always surprised by the lack of open-source implementations for this, and assume that every company just has their own fix.

@RigvedRocks we could do something like I try to create a branch with a "workable" solution from fdet-api, but starting from the supervision format, and you take it from there? Let me know if that might interest you?

@josephofiowa also curious to hear your thoughts, I used to do it with some voxel51 code (they have a method from which you can get all of the matching predictions for a given IoU), but it was painfully slow.

I keep assuming a "good" solution must exist, but think that the emphasis on threshold agnostic metrics (map etc.) in academia means that it's not given much attention.

SkalskiP · 2024-02-01T09:43:28Z

Hi @GeorgePearse 👋🏻 I like the idea and I'd love to look at your initial implementation. If possible, I want the solution:

As much as possible, used the Metrics API already found in Supervision
If possible, it did not use external libraries. One of the main principles of Supervision is to limit external dependencies.

Such a solution requires a lot of steps, so I need to understand how we can combine it with what we have and how to design the next elements to be as reusable as possible. We will also need to come up with a better name for this task and a better name for the feature. Haha

GeorgePearse · 2024-02-01T10:07:52Z

Yeah all makes sense, tbh, the reason I want it to be integrated into supervision is to solve those very problems, at the minute I'm dealing with a lot of opaque code, I only trust the outputs from having visually inspected the predictions from lots of model/threshold combos that have used it.

As for API questions.

Just something like

# Ideally target_metric could also be a callback so that you a user could customise exactly what they want 
# to optimize for
per_class_thresholds: dict = optimize_thresholds(
    predictions_dataset,
    annotations_dataset, 
    target_metric='f1_score', 
    per_class=True,
    minimum_iou=0.75,
)

SkalskiP · 2024-02-01T10:29:46Z

And what is stored inside per_class_thresholds? Dict[int, float] - class id to optimal IoU mapping?

What's inside optimize_thresholds? I'd appreciate any pseudocode.

GeorgePearse · 2024-02-01T10:34:30Z

class id to optimal score, the minimum IoU to classify a prediction and annotation as a match is set upfront by the user. Is that not the far more common use case for shipping ML products? The minimum IoU is defined by business/product requirements / can be done easily enough visually on a handful of examples. Maybe I'm biased by having mostly trained models where localisation is of secondary importance to classification, and a much much easier problem.

GeorgePearse · 2024-02-01T10:44:01Z

Complete pseudocode:

metrics = []
for class_name in class_list: 
    for threshold in range(0, 1, 100): 
          current_metric = calculate_metric(grid_of_matched_predictions_and_their_scores, metric='f1_score')
          metrics.append({
              'threshold': threshold,
              'class_name': class_name,
              'metric': metric,
          })

metrics_df = pd.DataFrame(metrics)

# but so that you get a row per class, whatever the .groupby() kind of query 
# would be to achieve that 
best_metrics = metrics_df[metrics_df['metric'] == max(metrics_df['metric'])

But everything probably needs to calculated in numpy to not make it painfully slow.

There's a decent chance that this is where most people get this data from currently https://github.com/rafaelpadilla/Object-Detection-Metrics, but the repo is as you'd expect of something 5/6 years old, and doesn't have the useability/documentation of a modern open-core project.

GeorgePearse · 2024-02-01T10:47:23Z

This is what using the fdet-api looks like for me at the minute

thresholds = []
thresholds_dict = {}
f1_score_dict = {}

for counter, class_name in enumerate(annotation_class_names):
      (
          class_name,
          fscore,
          conf,
          precision,
          recall,
          support,
      ) = cocoEval.getBestFBeta(
          beta=1, iouThr=0.5, classIdx=counter, average="macro"
      )
      class_threshold_dict = {
          "class_name": class_name,
          "fscore": fscore,
          "conf": conf,
          "precision": precision,
          "recall": recall,
          "support": support,
      }
      f1_score_dict[class_name] = fscore
      thresholds.append(class_threshold_dict)
      thresholds_dict[class_name] = conf

thresholds_df = pd.DataFrame(thresholds)
print(thresholds_df)

So I end up with both the threshold to achieve the metric I care about, and the metrics that that threshold achieves

SkalskiP · 2024-02-01T11:53:52Z

Understood. This sounds interesting to me. I'm worried about scope, especially if we want to reimplement all metrics.

We need to divide work into smaller chunks. I am plodding with reviews when I need to go through 2k lines of code. On top of this we could assign different tasks to different external contributors and speed up the work.
We need to develop MVP - the shortest path demonstrating the value of the potential solution. With only 1 metric for example.

RigvedRocks · 2024-02-04T06:34:22Z

@GeorgePearse Fine by me. You can create a new branch called and then I can refine your initial solution.

GeorgePearse · 2024-02-04T16:50:56Z

From a look through the metric functionality already implemented it looks like it wouldn't be too painful to add in. The object that comes out of this looks like it's already done most of the upfront maths needed.

Hard for me to tell just from a look, does the output structure contain the scores?

SkalskiP · 2024-02-05T17:08:22Z

@GeorgePearse, could you be a bit more specific? What do you mean by scores?

GeorgePearse added the enhancement New feature or request label Dec 1, 2023

GeorgePearse changed the title ~~At the minute I use this slightly hacky repo in order to calculate the thresholds I want to deploy.~~ Add a thresholding API. Dec 1, 2023

LinasKo mentioned this issue Jul 16, 2024

Metrics API #1366

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a thresholding API. #632

Add a thresholding API. #632

GeorgePearse commented Dec 1, 2023 •

edited

Loading

RigvedRocks commented Dec 19, 2023

SkalskiP commented Dec 28, 2023

RigvedRocks commented Dec 29, 2023

GeorgePearse commented Feb 1, 2024 •

edited

Loading

SkalskiP commented Feb 1, 2024 •

edited

Loading

GeorgePearse commented Feb 1, 2024 •

edited

Loading

SkalskiP commented Feb 1, 2024

GeorgePearse commented Feb 1, 2024

GeorgePearse commented Feb 1, 2024

GeorgePearse commented Feb 1, 2024 •

edited

Loading

SkalskiP commented Feb 1, 2024

RigvedRocks commented Feb 4, 2024

GeorgePearse commented Feb 4, 2024 •

edited

Loading

SkalskiP commented Feb 5, 2024

Add a thresholding API. #632

Add a thresholding API. #632

Comments

GeorgePearse commented Dec 1, 2023 • edited Loading

Search before asking

Description

Use case

Additional

Are you willing to submit a PR?

RigvedRocks commented Dec 19, 2023

SkalskiP commented Dec 28, 2023

RigvedRocks commented Dec 29, 2023

GeorgePearse commented Feb 1, 2024 • edited Loading

SkalskiP commented Feb 1, 2024 • edited Loading

GeorgePearse commented Feb 1, 2024 • edited Loading

SkalskiP commented Feb 1, 2024

GeorgePearse commented Feb 1, 2024

GeorgePearse commented Feb 1, 2024

GeorgePearse commented Feb 1, 2024 • edited Loading

SkalskiP commented Feb 1, 2024

RigvedRocks commented Feb 4, 2024

GeorgePearse commented Feb 4, 2024 • edited Loading

SkalskiP commented Feb 5, 2024

GeorgePearse commented Dec 1, 2023 •

edited

Loading

GeorgePearse commented Feb 1, 2024 •

edited

Loading

SkalskiP commented Feb 1, 2024 •

edited

Loading

GeorgePearse commented Feb 1, 2024 •

edited

Loading

GeorgePearse commented Feb 1, 2024 •

edited

Loading

GeorgePearse commented Feb 4, 2024 •

edited

Loading