🐛 Bug
I noticed that the DiceScore metric in segmentation defaults to an average strategy of 'micro' (torchmetrics/src/torchmetrics/segmentation/dice.py, line 113 at 2c28e25), which differs from the typical multiclass averaging handling, as shown in the MulticlassStatScores class (torchmetrics/src/torchmetrics/classification/stat_scores.py, line 312 at 2c28e25).
Combined with the default of include_background=True, this makes the DiceScore quite optimistic (>80% Dice with a pretrained segmentation model after one epoch of fine-tuning), because a segmentation model will tend to be biased towards predicting background.
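To see why micro averaging plus an included background inflates the score, here is a toy hand computation (plain Python, not torchmetrics internals) on eight pixels where every background pixel is "correct" but no foreground pixel is:

```python
# 6 of 8 pixels are background (class 0); both foreground predictions are wrong.
pred   = [0, 0, 0, 0, 0, 0, 1, 2]
target = [0, 0, 0, 0, 0, 0, 2, 1]
num_classes = 3

def per_class_dice(pred, target, cls):
    """Dice = 2*|P ∩ T| / (|P| + |T|) for a single class."""
    inter = sum(p == cls and t == cls for p, t in zip(pred, target))
    size = sum(p == cls for p in pred) + sum(t == cls for t in target)
    return 2 * inter / size if size else 0.0

# 'micro': pool the counts over all classes before dividing
inter = sum(p == t for p, t in zip(pred, target))
micro = 2 * inter / (len(pred) + len(target))  # 0.75, driven entirely by background

# 'macro': average the per-class scores
macro = sum(per_class_dice(pred, target, c) for c in range(num_classes)) / num_classes  # ~0.33
macro_no_bg = sum(per_class_dice(pred, target, c)
                  for c in range(1, num_classes)) / (num_classes - 1)  # 0.0

print(micro, macro, macro_no_bg)
```

The micro score of 0.75 comes entirely from the background class; excluding it and averaging per class reports 0, which matches how badly the foreground is actually segmented.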
To Reproduce
Steps to reproduce the behavior...
Code sample
This demo shows various DiceScore initializations, applied to target and output tensors that are randomly initialized but biased towards background (class 0).
import torch
import torchmetrics
import pandas as pd

num_classes = 20
batch_size = 16
dice_score_default = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
)
dice_score_no_bg = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
    include_background=False,
)
dice_score_macro = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
    average="macro",
)
dice_score_realistic = torchmetrics.segmentation.DiceScore(
    input_format="index",
    num_classes=num_classes,
    include_background=False,
    average="macro",
)
# Create example output tensor, where the background is 75% of the output
example_output = torch.randint(0, num_classes, (batch_size, 10, 10))
output_background_mask = torch.rand(example_output.shape) < 0.75
example_output[output_background_mask] = 0

# Create example target tensor, where the background is 75% of the target
example_target = torch.randint(0, num_classes, (batch_size, 10, 10))
target_background_mask = torch.rand(example_target.shape) < 0.75
example_target[target_background_mask] = 0

dice_score_default.update(example_output, example_target)
dice_score_no_bg.update(example_output, example_target)
dice_score_macro.update(example_output, example_target)
dice_score_realistic.update(example_output, example_target)
scores = {
    "include_background": ["True", "False"],
    "average='micro'": [
        dice_score_default.compute().item(),
        dice_score_no_bg.compute().item(),
    ],
    "average='macro'": [
        dice_score_macro.compute().item(),
        dice_score_realistic.compute().item(),
    ],
}
scores_df = pd.DataFrame(scores)
print(scores_df)
#   include_background  average='micro'  average='macro'
# 0               True         0.575000         0.042818
# 1              False         0.007878         0.005482
Expected behavior
The default initialization of DiceScore should be a sensible choice that gives realistic results. With the current defaults, entirely random outputs and targets that follow a typical distribution (75% background in both) yield a DiceScore above 50%. This is not representative; the expected DiceScore should be below 1%, since these are nearly random guesses.
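As a sanity check on the expected magnitude, here is a seeded simulation in plain Python (same element count and 75% background rate as the repro above; the `dice` helper is defined here, not a torchmetrics API), computing the background-excluded macro Dice for fully random labels:

```python
import random

random.seed(0)
n, num_classes, p_bg = 16 * 10 * 10, 20, 0.75  # matches the repro's shape and bias

def sample():
    # Each element is background with probability 0.75, else a uniform foreground class.
    return [0 if random.random() < p_bg else random.randrange(1, num_classes)
            for _ in range(n)]

pred, target = sample(), sample()

def dice(cls):
    """Dice = 2*|P ∩ T| / (|P| + |T|) for a single class."""
    inter = sum(p == cls and t == cls for p, t in zip(pred, target))
    size = sum(p == cls for p in pred) + sum(t == cls for t in target)
    return 2 * inter / size if size else 0.0

macro_no_bg = sum(dice(c) for c in range(1, num_classes)) / (num_classes - 1)
print(macro_no_bg)  # on the order of 1%, far below the 0.575 micro score above
```

This lands near the ~0.5% figure in the table, confirming that the low score, not the >50% one, is the realistic description of random guessing.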
Environment
Python & PyTorch Version (e.g., 1.0):
Python 3.12.9
PyTorch 2.6.0
torchmetrics 1.7.0
Any other relevant information such as OS (e.g., Linux):
Mac
Additional context
Maybe these defaults were chosen for a particular reason that I'm not familiar with, but it seems to me that the torchmetrics metrics should choose a consistent averaging method, and that for segmentation tasks, we should ignore the background by default.
I understand one reason not to make this change is that updating defaults may lead to unexpected changes for users who have not specified the averaging strategy.
I would be happy to make this change and add/update any relevant tests, if the community agrees.
@Isalia20 I am okay with you taking a stab at this but the fix is not just changing the default arguments. Because we promise some backwards compatibility, we need to first raise a deprecation warning (that the default will change for one or more arguments) for 1 release and then we can make the change in the release afterwards.
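One common pattern for such a deprecation cycle (a sketch under my own naming, not torchmetrics' actual deprecation machinery) is a sentinel default: warn only when the caller relies on the old default, and keep explicit arguments silent.

```python
import warnings

_UNSET = object()  # sentinel: distinguishes "argument not passed" from an explicit value

def resolve_dice_defaults(average=_UNSET, include_background=_UNSET):
    """Hypothetical helper showing a one-release warning before changing defaults."""
    if average is _UNSET:
        warnings.warn(
            "The default `average` will change from 'micro' to 'macro' in a future "
            "release; pass `average` explicitly to retain the current behaviour.",
            FutureWarning,
        )
        average = "micro"  # current default, kept until the change lands
    if include_background is _UNSET:
        warnings.warn(
            "The default `include_background` will change from True to False in a "
            "future release; pass it explicitly to retain the current behaviour.",
            FutureWarning,
        )
        include_background = True
    return average, include_background
```

Calling it with no arguments emits both FutureWarnings and returns the current defaults; passing both arguments explicitly is warning-free, so existing users who already pin their settings see no change.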