Official PyTorch implementation of the IEEE Transactions on Multimedia 2023 paper “DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition”. [paper] [Project Page]
We currently release the PyTorch code for:
- ImageNet-1K training
- ImageNet-1K pre-trained weights
Baidu Netdisk link: [ckpt] (extraction code: q4mu)
Google Drive link: [ckpt]
Our repository is built on the DeiT repository, with some useful additions:
- Calculating accurate FLOPs and parameters with fvcore (see check_model.py).
- Auto-resuming.
- Saving best models and backup models.
- Generating training curve (see generate_tensorboard.py).
- Install PyTorch 1.7.0+ and torchvision 0.8.1+:
  conda install -c pytorch pytorch torchvision
- Install other packages:
  pip install timm==0.5.4
  pip install fvcore
Simply run the training script as follows, taking dilateformer_tiny as an example:
bash dist_train.sh dilateformer_tiny [other params]
If training is interrupted abnormally, you can simply rerun the script to auto-resume. If the latest checkpoint was not saved properly, set the checkpoint to resume from explicitly via --resume ${work_path}/ckpt/backup.pth.
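The fallback described above can be sketched in a few lines of Python. This is only an illustration, not the logic used by the training script: the function name `pick_resume_checkpoint` and the file name `checkpoint.pth` are hypothetical, only `ckpt/backup.pth` comes from this README, and a zero-byte file is used as a crude stand-in for "not saved properly".

```python
from pathlib import Path
from typing import Optional


def pick_resume_checkpoint(
    work_path: str,
    latest_name: str = "checkpoint.pth",  # hypothetical name, not confirmed by the repo
    backup_name: str = "backup.pth",      # matches ${work_path}/ckpt/backup.pth above
) -> Optional[Path]:
    """Return the first usable checkpoint, preferring the latest over the backup."""
    ckpt_dir = Path(work_path) / "ckpt"
    for name in (latest_name, backup_name):
        candidate = ckpt_dir / name
        # Assumption: a missing or empty file means the save did not complete.
        if candidate.is_file() and candidate.stat().st_size > 0:
            return candidate
    return None  # nothing to resume from; training would start from scratch
```

The returned path is what you would pass to --resume; a more careful check would try to load the file rather than only looking at its size.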
You can generate the training curves as follows:
python3 generate_tensorboard.py
Note that you need to install tensorboardX.
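For readers who want to inspect the curves without the script, here is a minimal sketch of collecting per-epoch metrics. It assumes a DeiT-style log file with one JSON object per line containing an `epoch` key; whether the curve-generation script reads this exact format is an assumption, and `read_curves` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def read_curves(lines: Iterable[str]) -> Dict[str, List[Tuple[int, float]]]:
    """Group numeric metrics from JSON log lines into (epoch, value) series."""
    curves: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        epoch = int(record["epoch"])
        for key, value in record.items():
            # Keep only real numbers; bool is a subclass of int, so exclude it.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if key != "epoch" and is_number:
                curves[key].append((epoch, float(value)))
    return dict(curves)
```

Each `(epoch, value)` pair can then be written with tensorboardX, for example `SummaryWriter(logdir).add_scalar(tag, value, epoch)`.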
You can calculate the FLOPs and parameters via:
python3 check_model.py
This repository is built using the timm library and the DeiT repository.
If you use this code for a paper, please cite:
DilateFormer
@article{jiao2023dilateformer,
title = {DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition},
author = {Jiao, Jiayu and Tang, Yu-Ming and Lin, Kun-Yu and Gao, Yipeng and Ma, Jinhua and Wang, Yaowei and Zheng, Wei-Shi},
journal = {{IEEE} Transactions on Multimedia},
year = {2023}
}