New feature: adding a parameter to control the number of pseudo-validation cases #2061

Open
wants to merge 1 commit into master

Conversation

ancestor-mithril
Contributor

This feature allows running fewer pseudo-validation iterations than the fixed 50, enabling either faster training or a more confident pseudo-validation score, depending on the needs of each experiment. It also works together with #2053 by making it possible to spawn no additional processes for pseudo-validation when -val_iters is set to 1.

While this value could also be adjusted by creating a custom trainer, it does not influence the training results, so there should be no need to write a custom trainer just for it.
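For context, the custom-trainer workaround looks roughly like the following; this is a minimal sketch, assuming the trainer keeps the count in an attribute such as num_val_iterations_per_epoch (import path, class, and attribute names follow the current nnU-Net v2 layout but should be treated as assumptions here):

```python
# Minimal sketch of the existing workaround: a custom trainer that overrides the
# pseudo-validation iteration count. Names are assumptions, not the PR's diff.
from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer


class nnUNetTrainer_FewerValIters(nnUNetTrainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Run 1 pseudo-validation batch per epoch instead of the default 50.
        self.num_val_iterations_per_epoch = 1
```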

@FabianIsensee FabianIsensee self-assigned this Apr 1, 2024
@FabianIsensee
Member

Hey, I just did a quick benchmark of the time the validation takes:

2024-04-09 20:11:50.785168: Epoch 0
2024-04-09 20:11:50.785251: Current learning rate: 0.01
train time 8.048835515975952
val time 0.5883698463439941
2024-04-09 20:11:59.424580: train_loss -0.3002
2024-04-09 20:11:59.424691: val_loss -0.7542
2024-04-09 20:11:59.424762: Pseudo dice [0.8257, 0.8178]
2024-04-09 20:11:59.424825: Epoch time: 8.64 s
2024-04-09 20:11:59.424881: Yayy! New best EMA pseudo Dice: 0.8218
2024-04-09 20:12:00.269000:
2024-04-09 20:12:00.269072: Epoch 1
2024-04-09 20:12:00.269136: Current learning rate: 0.00999
train time 6.751651048660278
val time 0.6082313060760498
2024-04-09 20:12:07.630979: train_loss -0.7611
2024-04-09 20:12:07.631112: val_loss -0.7954
2024-04-09 20:12:07.631186: Pseudo dice [0.8494, 0.846]
2024-04-09 20:12:07.631252: Epoch time: 7.36 s
2024-04-09 20:12:07.631310: Yayy! New best EMA pseudo Dice: 0.8244
2024-04-09 20:12:08.717944:
2024-04-09 20:12:08.718058: Epoch 2
2024-04-09 20:12:08.718133: Current learning rate: 0.00998
train time 6.752732515335083
val time 0.5877139568328857
2024-04-09 20:12:16.060827: train_loss -0.7902
2024-04-09 20:12:16.060989: val_loss -0.815
2024-04-09 20:12:16.061144: Pseudo dice [0.8695, 0.855]
2024-04-09 20:12:16.061276: Epoch time: 7.34 s
2024-04-09 20:12:16.061400: Yayy! New best EMA pseudo Dice: 0.8282

Validation in its default setting is less than 10% of the runtime (roughly 0.6 s of a 7.3–8.6 s epoch in the log above), which seems very reasonable to me. Do you really think that this is an urgently needed feature? I would like to avoid changing the parametrization of the trainers' __init__ as much as possible, because then everyone would have to go through all their code and modify their custom trainers accordingly. That would affect hundreds of trainers in our case.

@ancestor-mithril
Contributor Author

Do you really think that this is an urgently needed feature?

No, I would not say this is urgently needed. It would allow customizing the number of pseudo-validation steps, but it is not critical. Several trainers (e.g. StuNet v1) already use a single validation step per epoch.

Validation in its default setting is less than 10% of the runtime.

This is not always the case. When training from an HDD, or with any other I/O bottleneck, validation takes a lot of time.
Here I train from compressed data (npz) on an HDD, with 50 validation steps. The first value after "Epoch time" is the training time and the second is the validation time:
[screenshot: epoch timing log with 50 validation steps]
Here I reduce the number of validation steps to 1:
[screenshot: epoch timing log with 1 validation step]

I would like to avoid changing the parametrization of the trainers' __init__ as much as possible, because then everyone would have to go through all their code and modify their custom trainers accordingly.

Yes, I agree with this; that's why I added the new parameter at the end of the signature, with a default value. This way, all trainers inheriting from existing nnU-Net trainers should pick up the default value, and any further configuration after the super().__init__() call will not interfere with the new parameter. Would you prefer this parameter to be an environment variable?
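To illustrate the compatibility argument, here is a simplified sketch (not the actual diff; the constructor signature is abbreviated and the environment-variable name is hypothetical):

```python
import os


class nnUNetTrainer:
    # The new argument is appended last and defaulted, so existing subclasses that
    # call super().__init__(plans, configuration, fold, dataset_json, ...) without
    # it keep working and silently fall back to 50 iterations.
    def __init__(self, plans, configuration, fold, dataset_json,
                 num_val_iterations_per_epoch: int = 50):
        self.num_val_iterations_per_epoch = num_val_iterations_per_epoch


# Environment-variable alternative: no change to the signature at all.
# "nnUNet_val_iters" is a hypothetical variable name used only for illustration.
num_val_iters = int(os.environ.get("nnUNet_val_iters", 50))
```

The trailing defaulted argument keeps existing super().__init__() calls in custom trainers intact, while the environment-variable route avoids touching the signature entirely.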
