Swin Transformer

A PaddlePaddle reimplementation of Microsoft's Swin Transformer repository, which accompanies the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Swin Transformer (Swin stands for Shifted windows) serves as a general-purpose backbone for computer vision. It is a hierarchical Transformer whose representation is computed with shifted windows. Limiting self-attention to non-overlapping local windows improves efficiency (computation grows linearly with image size), and shifting the window partition between consecutive layers still allows connections across windows.
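
To make the shifted-window idea concrete, here is a minimal pure-Python sketch that only tracks which tokens share a window; it computes no attention. It assumes a toy 4x4 token grid with 2x2 windows and implements the shift as a cyclic roll by half a window toward the top-left. The real model additionally masks attention so that tokens wrapped around by the roll do not attend to each other; this sketch omits that mask.

```python
def window_partition(grid, window):
    """Split an H x W grid into non-overlapping window x window blocks (row-major)."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be a multiple of the window size")
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append([grid[r][c]
                            for r in range(top, top + window)
                            for c in range(left, left + window)])
    return windows


def cyclic_shift(grid, shift):
    """Cyclically roll the grid up and to the left by `shift` rows/columns."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


if __name__ == "__main__":
    size, window = 4, 2
    # Token ids 0..15 laid out row by row on a 4x4 grid.
    grid = [[r * size + c for c in range(size)] for r in range(size)]
    print(window_partition(grid, window))
    # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
    print(window_partition(cyclic_shift(grid, window // 2), window))
    # [[5, 6, 9, 10], [7, 4, 11, 8], [13, 14, 1, 2], [15, 12, 3, 0]]
```

After the shift, the first window groups tokens 5, 6, 9 and 10, which sat in four different windows of the regular partition; this is the cross-window connection described above.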


Requirements

Some features require a recent version of PaddlePaddle. For installation instructions, refer to installation.md.

How to Train

export PADDLE_NNODES=1
export PADDLE_MASTER="127.0.0.1:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

python -m paddle.distributed.launch \
    --nnodes=$PADDLE_NNODES \
    --master=$PADDLE_MASTER \
    --devices=$CUDA_VISIBLE_DEVICES \
    plsc-train \
    -c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o2.yaml
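
The config file name appears to encode the training setup. The helper below is a hypothetical sketch that reads the name under an assumed field layout (variant, patch size, window size, input resolution, dataset, nodes and cards per node, parallelism, precision) and derives Swin's per-stage token grid. `decode_swin_config_name` and `stage_layout` are illustrative names, and the field meanings are an interpretation of the name, not a documented PLSC convention.

```python
import os


def decode_swin_config_name(path):
    """Split a name like swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o2.yaml.

    Field meanings are assumed from the name itself, not from PLSC docs.
    """
    stem = os.path.basename(path)
    if stem.endswith(".yaml"):
        stem = stem[: -len(".yaml")]
    (_, variant, patch, window, res,
     dataset, topology, parallel, precision) = stem.split("_")
    nodes, cards = topology.rstrip("c").split("n")
    return {
        "variant": variant,                          # e.g. "base"
        "patch_size": int(patch[len("patch"):]),     # "patch4" -> 4
        "window_size": int(window[len("window"):]),  # "window7" -> 7
        "resolution": int(res),                      # "224" -> 224x224 input
        "dataset": dataset,                          # "in1k", assumed ImageNet-1k
        "nodes": int(nodes),                         # "1n8c" -> 1 node (assumed)
        "cards_per_node": int(cards),                # "1n8c" -> 8 GPUs per node (assumed)
        "parallel": parallel,                        # "dp", assumed data parallel
        "precision": precision,                      # "fp16o2", assumed FP16 AMP level O2
    }


def stage_layout(cfg, num_stages=4):
    """(tokens per side, windows per side) for each of Swin's four stages.

    Stage 1 sees resolution / patch_size tokens per side; each later stage
    halves that via patch merging, while the window size stays fixed.
    """
    side = cfg["resolution"] // cfg["patch_size"]
    layout = []
    for _ in range(num_stages):
        layout.append((side, side // cfg["window_size"]))
        side //= 2
    return layout


if __name__ == "__main__":
    cfg = decode_swin_config_name(
        "./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o2.yaml")
    print(cfg["nodes"] * cfg["cards_per_node"], "GPUs in total")  # 8 GPUs in total
    print(stage_layout(cfg))  # [(56, 8), (28, 4), (14, 2), (7, 1)]
```

Under these assumptions, the 1 node x 8 cards reading matches the eight devices listed in CUDA_VISIBLE_DEVICES. A 224x224 input with patch size 4 gives a 56x56 token grid, which a 7x7 window tiles into 8x8 windows; each of the three patch-merging steps halves the grid until the last stage is a single 7x7 window.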

How to Evaluate

# [Optional] Download checkpoint
mkdir -p pretrained/
wget -O ./pretrained/swin_base_patch4_window7_224_fp16o2.pdparams https://plsc.bj.bcebos.com/models/swin/v2.5/swin_base_patch4_window7_224_fp16o2.pdparams
export PADDLE_NNODES=1
export PADDLE_MASTER="127.0.0.1:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch \
  --nnodes=$PADDLE_NNODES \
  --master=$PADDLE_MASTER \
  --devices=$CUDA_VISIBLE_DEVICES \
  plsc-eval \
  -c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o2.yaml \
  -o Global.pretrained_model=pretrained/swin_base_patch4_window7_224_fp16o2 \
  -o Global.finetune=False
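
Each -o KEY=VALUE flag in the evaluation command appears to override one entry of the YAML config, with dots walking into nested sections (here the Global section). The function below is a hypothetical sketch of that idea, not PLSC's actual override parser: `apply_override` and the starting config dict are illustrative, and values are parsed with `ast.literal_eval`, falling back to a plain string.

```python
import ast


def apply_override(config, override):
    """Set a nested config entry from a dotted 'Section.key=value' string."""
    key, sep, raw = override.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {override!r}")
    *sections, leaf = key.split(".")
    node = config
    for name in sections:
        node = node.setdefault(name, {})  # create missing sections on the way down
    try:
        value = ast.literal_eval(raw)  # "False" -> False, "3" -> 3
    except (ValueError, SyntaxError, TypeError):
        value = raw  # paths and other bare words stay strings
    node[leaf] = value
    return config


if __name__ == "__main__":
    # Hypothetical starting config; the real YAML has many more keys.
    config = {"Global": {"pretrained_model": None, "finetune": True}}
    for override in (
        "Global.pretrained_model=pretrained/swin_base_patch4_window7_224_fp16o2",
        "Global.finetune=False",
    ):
        apply_override(config, override)
    print(config)
```

Note that the sketch only recognizes Python-style literals, so a YAML-style lowercase false would stay a string here.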

Other Configurations

More ready-to-run configurations are provided; see Swin Configurations.

Models

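| Model | DType | Pretrain | Resolution | Configs | GPUs | Img/sec | Top1 Acc | Official Top1 Acc | Checkpoint | Log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-B | FP16 O1 | ImageNet2012 | 224x224 | config | A100*N1C8 | 2155 | 0.83362 | 0.835 | download | log |
| Swin-B | FP16 O2 | ImageNet2012 | 224x224 | config | A100*N1C8 | 3006 | 0.83223 | 0.835 | download | log |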
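
The two Swin-B rows can be compared directly. This snippet recomputes, from the table's own numbers, the throughput ratio of each run relative to FP16 O1 and each run's Top-1 gap to the official 0.835, in percentage points. It is plain arithmetic on the published figures, not a new measurement; `ROWS`, `OFFICIAL_TOP1` and `compare` are illustrative names.

```python
OFFICIAL_TOP1 = 0.835  # official Swin-B 224x224 Top-1, as listed in the table

ROWS = {
    "FP16 O1": (2155, 0.83362),  # (Img/sec, Top1 Acc) copied from the table
    "FP16 O2": (3006, 0.83223),
}


def compare(rows, baseline="FP16 O1"):
    """Throughput ratio vs. baseline and Top-1 gap to the official result (points)."""
    base_ips, _ = rows[baseline]
    summary = {}
    for name, (ips, top1) in rows.items():
        speedup = ips / base_ips
        gap_pts = (top1 - OFFICIAL_TOP1) * 100
        summary[name] = (round(speedup, 3), round(gap_pts, 3))
    return summary


if __name__ == "__main__":
    for name, (speedup, gap) in compare(ROWS).items():
        print(f"{name}: {speedup:.3f}x throughput, {gap:+.3f} pts vs. official Top-1")
```

By these numbers, O2 delivers roughly 1.39x the O1 throughput (3006 / 2155) while its Top-1 is about 0.14 points lower (0.83223 vs. 0.83362).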

Citations

@inproceedings{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}