Zhi Chen, Jiang Duan, Yu Xiong, Cheng Yang and Guoping Qiu
This repository is the official PyTorch implementation of the paper *CADEL: Long-tailed Classification via CAscaded Deep Ensemble Learning*.
- pytorch >= 1.8.0
- timm == 0.3.2
- If your PyTorch is 1.8.0 or newer, timm 0.3.2 needs a small fix to work with it.
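The fix commonly applied for this version mismatch is a sketch like the following (the usual culprit is `timm/models/layers/helpers.py`, which imports `container_abcs` from `torch._six`, removed in newer PyTorch; verify against your installed timm):

```python
# Compatibility shim for timm 0.3.2 with PyTorch >= 1.8:
# torch._six.container_abcs no longer exists, so fall back to the
# standard library's collections.abc, which provides the same ABCs.
try:
    from torch._six import container_abcs  # works on older PyTorch
except ImportError:
    import collections.abc as container_abcs  # PyTorch >= 1.8
```

With this shim in place, timm's helpers can keep using `container_abcs.Iterable` unchanged.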
You can download the original datasets as follows:
- ImageNet_LT and Places_LT: download ImageNet_2014 and Places_365.
- iNaturalist 2018: download the dataset following here.

Then change the `data_root` in `main_CNN.py` and `main_ViT.py` accordingly.
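As an illustration, the `data_root` edit might look like this (the variable name comes from this repo; the dictionary keys and paths below are placeholders, so check the actual definition in `main_CNN.py` / `main_ViT.py`):

```python
# Hypothetical data_root mapping; replace the placeholder paths with the
# directories where you extracted each dataset.
data_root = {
    'ImageNet': '/path/to/ImageNet-LT',
    'Places': '/path/to/Places-LT',
    'iNat18': '/path/to/iNaturalist18',
}
```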
After preparation, the file structures are as follows:
```
/path/to/ImageNet-LT/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
  train.txt
  val.txt
  test.txt
  num_shots.txt
```
train.txt, val.txt and test.txt list the file names, and num_shots.txt gives the number of training images in each class. All these data files have been uploaded to this repo.
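A minimal sketch of loading `num_shots.txt`, assuming one class per line with the shot count as the last whitespace-separated field (this format is an assumption; check the actual file uploaded to this repo):

```python
# Hypothetical parser for num_shots.txt: one line per class, count last.
def load_num_shots(text):
    return [int(line.split()[-1]) for line in text.splitlines() if line.strip()]

# Toy example, not real data from the repo:
sample = "0 1280\n1 5\n2 73\n"
shots = load_num_shots(sample)
print(shots)  # [1280, 5, 73]
```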
- You can see all our settings in `./config/`.
- Typically, 2 GPUs with >= 24 GB of memory each are sufficient. Training ViT-B-16 at a resolution of 384, however, requires more GPU memory.
For stage one, you can train the model with either DataParallel or DistributedDataParallel. Specifically, the commands are:

```
# Stage one (use main_ViT.py instead if you want to train ViT)
python main_CNN.py
# or, with DistributedDataParallel:
python -m torch.distributed.launch --nproc_per_node=n main_CNN.py
```

where n is the number of GPUs in your server. When using DistributedDataParallel, divide the default batch_size in our configs by n.

```
# Stage two
python main_CNN_PC.py
```
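The batch-size division works out as follows (256 is a placeholder; look up the actual default batch_size in `./config/`):

```python
# Per-GPU batch size under DistributedDataParallel: the config's default
# batch_size is global, so each of the n GPU processes should get 1/n of it.
default_batch_size = 256  # hypothetical value from a config file
n_gpus = 4
per_gpu_batch_size = default_batch_size // n_gpus
print(per_gpu_batch_size)  # 64
```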
Results with CNN backbones:

| Datasets | Many | Medium | Few | All | Model |
|---|---|---|---|---|---|
| ImageNet-LT | 67.5 | 55.6 | 43.2 | 58.5 | ResNet50 |
| ImageNet-LT | 68.8 | 55.8 | 44.0 | 59.2 | ResNeXt50 |
| iNat18 | --- | --- | --- | 73.5 | ResNet50 |
| Places-LT | --- | --- | --- | 41.4 | ResNet152 |
Results with ViT-B-16:

| Dataset | Resolution | Many | Med. | Few | Acc | Pretrained ckpt |
|---|---|---|---|---|---|---|
| ImageNet-LT | 224×224 | 70.3 | 59.8 | 47.5 | 61.7 | Res_224 |
| ImageNet-LT | 384×384 | 73.0 | 62.2 | 50.3 | 64.7 | |
| iNat18 | 224×224 | 77.7 | 76.3 | 75.1 | 76.2 | Res_128 |
| iNat18 | 384×384 | 75.0 | 81.8 | 85.4 | 82.7 | |
| Places-LT | 224×224 | 46.6 | 46.7 | 46.5 | 46.6 | Image-1K-224 |
| Places-LT | 384×384 | 47.9 | 50.2 | 38.5 | 47.1 | Image-1K-384 |
If you find our idea or code inspiring, please cite our paper:
```
@article{CADEL,
  title={Long-tailed Classification via CAscaded Deep Ensemble Learning},
  author={Chen, Zhi and Duan, Jiang and Xiong, Yu and Yang, Cheng and Qiu, Guoping},
  year={2023},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```
This code is partially based on cRT and LiVT. If you use our code, please also cite:
```
@inproceedings{kang2019decoupling,
  title={Decoupling representation and classifier for long-tailed recognition},
  author={Kang, Bingyi and Xie, Saining and Rohrbach, Marcus and Yan, Zhicheng and Gordo, Albert and Feng, Jiashi and Kalantidis, Yannis},
  booktitle={Eighth International Conference on Learning Representations (ICLR)},
  year={2020}
}

@inproceedings{LiVT,
  title={Learning Imbalanced Data with Vision Transformers},
  author={Xu, Zhengzhuo and Liu, Ruikang and Yang, Shuo and Chai, Zenghao and Yuan, Chun},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
```