- [x] Training and validation code
- [x] SPANet checkpoints with demo
- [ ] Visualization of features out of SPAM
torch>=1.7.0; torchvision>=0.8.0; pyyaml; timm (`pip install timm==0.6.11`)
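For reference, the dependencies listed above can be installed with pip, for example:

```bash
# Install the required packages; timm is pinned to 0.6.11
pip install "torch>=1.7.0" "torchvision>=0.8.0" pyyaml
pip install timm==0.6.11
```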
Data preparation: ImageNet with the following folder structure; you can extract ImageNet using this script.
```
│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```
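A quick way to sanity-check the layout (the path is a placeholder; ImageNet-1k has 1000 class folders in each split):

```bash
# Count the class folders in each split; both commands should print 1000
ls /path/to/imagenet/train | wc -l
ls /path/to/imagenet/val | wc -l
```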
| Model | Resolution | Params | MACs | Top-1 Acc | Download |
| --- | --- | --- | --- | --- | --- |
| SPANet-S | 224 | 29M | 4.6G | 83.1 | link |
| SPANet-M | 224 | 42M | 6.8G | 83.5 | link |
| SPANet-MX | 224 | 55M | 9.0G | 83.8 | link |
| SPANet-B | 224 | 76M | 12.0G | 84.0 | link |
| SPANet-BX | 224 | 100M | 15.8G | 84.4 | link |
To evaluate our SPANet models, run:
```bash
DATA_PATH=/path/to/imagenet
MODEL=spanet_medium
CKPT=/path/to/checkpoint
BATCH_SIZE=128

python validate.py $DATA_PATH --model $MODEL -b $BATCH_SIZE --checkpoint $CKPT
```
You can find an example in `val.sh`.
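For instance, a fully spelled-out invocation for the `spanet_medium` model (the dataset path and checkpoint file name are placeholders) would look like:

```bash
# Evaluate a SPANet checkpoint on the ImageNet validation set (paths are placeholders)
python validate.py /path/to/imagenet \
    --model spanet_medium \
    -b 128 \
    --checkpoint /path/to/spanet_medium.pth
```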
We use a batch size of 1024 by default and train models with 4 GPUs. For multi-node training, adjust `--grad-accum-steps` according to your setup.
To train or fine-tune the models, run:
```bash
bash ./scripts/spanet/train_spanet_small.sh
```
You can find more details in the `scripts` directory.
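The effective batch size is the product of the number of GPUs, the per-GPU batch size, and the gradient-accumulation steps. A minimal sketch of keeping the default total of 1024 when per-GPU memory is limited (the per-GPU values below are illustrative, not necessarily the values used in our scripts):

```bash
# Effective batch size = GPUS * BATCH_SIZE * GRAD_ACCUM_STEPS
# Default setup: 4 GPUs with a total batch size of 1024, e.g. 4 * 256 * 1 = 1024.
# With less GPU memory, keep the total constant by accumulating gradients:
GPUS=4
BATCH_SIZE=128       # per-GPU batch size (illustrative)
GRAD_ACCUM_STEPS=2   # 4 * 128 * 2 = 1024
```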
Our implementation is mainly based on the metaformer baseline. We would like to thank the authors for sharing their nice work!
```bibtex
@inproceedings{yun2023spanet,
  title={SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation},
  author={Yun, Guhnoo and Yoo, Juhan and Kim, Kijung and Lee, Jeongho and Kim, Dong Hwan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6113--6124},
  year={2023}
}
```