DiceScore uses average='micro' by default, while other methods use average='macro' #3031

Closed
ZachParent opened this issue Mar 27, 2025 · 4 comments · Fixed by #3041 · May be fixed by #3042
Assignees
Labels
bug / fix (Something isn't working) · help wanted (Extra attention is needed) · v1.7.x

Comments

@ZachParent

🐛 Bug

I noticed that the DiceScore metric in segmentation uses an averaging strategy of 'micro' by default:

average: Optional[Literal["micro", "macro", "weighted", "none"]] = "micro",

which differs from the typical multiclass averaging default, as seen in the MulticlassStatScores class:

average: Optional[Literal["micro", "macro", "weighted", "none"]] = "macro",

Combined with the default of include_background=True, this makes the DiceScore quite optimistic (>80% Dice with a pretrained segmentation model after one epoch of fine-tuning), because segmentation models tend to be biased towards predicting background.
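To make the difference concrete, here is a minimal sketch (my own illustration, not the torchmetrics implementation) of per-class versus pooled Dice on a tiny background-dominated example:

import torch

def dice_per_class(pred, target, num_classes):
    # One Dice score per class: 2 * |pred ∩ target| / (|pred| + |target|)
    scores = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        denom = p.sum() + t.sum()
        scores.append(2.0 * (p & t).sum() / denom if denom > 0 else torch.tensor(1.0))
    return torch.stack(scores)

# 8 pixels: the 6 background pixels are correct, both foreground pixels are wrong
pred = torch.tensor([0, 0, 0, 0, 0, 0, 1, 2])
target = torch.tensor([0, 0, 0, 0, 0, 0, 2, 1])

per_class = dice_per_class(pred, target, num_classes=3)
print(per_class)         # tensor([1., 0., 0.])
print(per_class.mean())  # "macro" averages the per-class scores: ~0.33
# "micro" pools intersections and supports over classes before dividing,
# so the 6 correct background pixels dominate: 2 * 6 / (8 + 8) = 0.75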

To Reproduce

Steps to reproduce the behavior...

Code sample

This demo shows several DiceScore initializations, applied to output and target tensors that are randomly initialized but biased towards background (class 0).

import torch
import torchmetrics.segmentation
import pandas as pd

num_classes = 20
batch_size = 16
dice_score_default = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
)
dice_score_no_bg = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
    include_background=False,
)
dice_score_macro = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
    average="macro",
)
dice_score_realistic = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
    include_background=False,
    average="macro",
)

# Create an example output tensor where roughly 75% of the pixels are background
example_output = torch.randint(0, num_classes, (batch_size, 10, 10))
output_background_mask = torch.rand(example_output.shape) < 0.75
example_output[output_background_mask] = 0

# Create an example target tensor where roughly 75% of the pixels are background
example_target = torch.randint(0, num_classes, (batch_size, 10, 10))
target_background_mask = torch.rand(example_target.shape) < 0.75
example_target[target_background_mask] = 0

dice_score_default.update(example_output, example_target)
dice_score_no_bg.update(example_output, example_target)
dice_score_macro.update(example_output, example_target)
dice_score_realistic.update(example_output, example_target)

scores = {
    "include_background": ["True", "False"],
    "average='micro'": [
        dice_score_default.compute().item(),
        dice_score_no_bg.compute().item(),
    ],
    "average='macro'": [
        dice_score_macro.compute().item(),
        dice_score_realistic.compute().item(),
    ],
}
scores_df = pd.DataFrame(scores)
print(scores_df)
#   include_background  average='micro'  average='macro'
# 0               True         0.575000         0.042818
# 1              False         0.007878         0.005482

Expected behavior

The default initialization of DiceScore should be a sensible choice that gives realistic results. With the current defaults, entirely random outputs and targets that both have a typical distribution of roughly 75% background yield a DiceScore above 50%. This is not representative; the expected DiceScore should be below 1%, since these are essentially random guesses.
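
A quick back-of-envelope estimate (a rough calculation of my own, not from torchmetrics) shows where the ~0.575 micro score with background comes from:

# Each pixel is forced to background with p=0.75 and is otherwise uniform over
# 20 classes, so P(pixel == 0) ≈ 0.7625 and P(pixel == c) ≈ 0.0125 for c != 0.
# For independent output and target, the pooled ("micro") Dice is roughly
# 2 * sum_c P(c)^2 / (2 * sum_c P(c)) = sum_c P(c)^2, dominated by background.
p_bg = 0.75 + 0.25 / 20
p_fg = 0.25 / 20
print(p_bg**2 + 19 * p_fg**2)  # ≈ 0.584, close to the 0.575 observed above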

Environment

  • Python & PyTorch Version (e.g., 1.0):
    • Python 3.12.9
    • PyTorch 2.6.0
    • torchmetrics 1.7.0
  • Any other relevant information such as OS (e.g., Linux):
    • Mac

Additional context

These defaults may have been chosen for a reason I'm not familiar with, but it seems to me that torchmetrics should use a consistent averaging default across metrics, and that segmentation metrics should ignore the background by default.

I understand one reason not to make this change: updating defaults may lead to unexpected behaviour changes for users who have not explicitly specified the averaging strategy.

I would be happy to make this change and add/update any relevant tests, if the community agrees.

@ZachParent added the bug / fix and help wanted labels on Mar 27, 2025

Hi! Thanks for your contribution! Great first issue!

@Isalia20
Contributor

Isalia20 commented Apr 4, 2025

Seems like an easy fix. @SkafteNicki @Borda I can take it, if there are no objections.

@SkafteNicki
Member

@Isalia20 I am okay with you taking a stab at this, but the fix is not just changing the default arguments. Because we promise some backwards compatibility, we need to first raise a deprecation warning (that the default will change for one or more arguments) for one release, and then we can make the change in the release afterwards.
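
A minimal sketch of that deprecation path (the sentinel and wording here are illustrative only, not the code from the linked PRs):

import warnings

_UNSET = "unset"  # hypothetical sentinel marking "user did not pass average"

class DiceScore:  # illustrative stand-in, not the real torchmetrics class
    def __init__(self, num_classes, include_background=True, average=_UNSET):
        if average == _UNSET:
            warnings.warn(
                "The default `average` of DiceScore will change from 'micro' to 'macro' "
                "in a future release; pass `average` explicitly to silence this warning.",
                DeprecationWarning,
            )
            average = "micro"  # keep the old behaviour for one more release
        self.num_classes = num_classes
        self.include_background = include_background
        self.average = average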

@Isalia20
Contributor

Isalia20 commented Apr 4, 2025

Sure, I'll add a warning for now and keep this ticket open, and we can change the default in the next version.
