Description
The `DeepIce` model contains a method called `no_weight_decay()`, which is intended to specify that the `cls_token` parameter should not be subject to weight decay during training:
```python
from typing import Set

@torch.jit.ignore
def no_weight_decay(self) -> Set:
    """cls_token should not be subject to weight decay during training."""
    return {"cls_token"}
```
However, the training code never builds `optimizer_grouped_parameters`, so this method is never consulted and has no effect: `cls_token` receives the same weight decay as every other parameter.
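For context, here is a minimal sketch of how such a marker is conventionally consumed when constructing AdamW parameter groups. This is not the repository's actual training code; `build_param_groups` and its arguments are illustrative names:

```python
import torch

def build_param_groups(model: torch.nn.Module, weight_decay: float):
    # Collect parameter names the model marks as exempt, if any.
    skip = set()
    if hasattr(model, "no_weight_decay"):
        skip = set(model.no_weight_decay())

    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Exempt marked parameters (e.g. cls_token) from weight decay;
        # match either the full dotted name or the attribute name.
        if name in skip or name.split(".")[-1] in skip:
            no_decay.append(param)
        else:
            decay.append(param)

    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Usage:
# optimizer = torch.optim.AdamW(build_param_groups(model, 0.05), lr=1e-3)
```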
I believe that in the original 2nd-place solution code, FastAI's wrapper around AdamW handled this grouping automatically.