# MobileNetV3

An implementation of MobileNetV3 in PyTorch.

## Theory

You can find the MobileNetV3 paper at [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244).

## Prepare data

- CIFAR-10
- CIFAR-100
- SVHN
- Tiny-ImageNet
- ImageNet: please move the validation images into labeled subfolders; you can use the script here (see the sketch after this list for the idea).
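
For the ImageNet step, a minimal sketch of the reorganization, assuming a mapping file `val_labels.txt` (hypothetical name and format) whose lines pair each validation image with its class label; the linked script does the same thing:

```python
import os
import shutil

val_dir = "val"  # directory containing the flat validation images

# Each line: "<image_filename> <class_label>" (hypothetical format;
# adapt to whatever mapping the devkit or the linked script provides).
with open("val_labels.txt") as f:
    for line in f:
        filename, label = line.split()
        class_dir = os.path.join(val_dir, label)
        os.makedirs(class_dir, exist_ok=True)
        shutil.move(os.path.join(val_dir, filename), class_dir)
```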

## Train

- Train from scratch:

  ```bash
  CUDA_VISIBLE_DEVICES=3 python train.py --batch-size=128 --mode=small \
    --print-freq=100 --dataset=CIFAR100 --ema-decay=0 --label-smoothing=0.1 \
    --lr=0.3 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
    --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 --width-multiplier=1 \
    -nbd -zero-gamma -mixup
  ```

where the parameters have the following meanings:

- `batch-size`: batch size.
- `mode`: use MobileNetV3-Small (if set to `small`) or MobileNetV3-Large (if set to `large`).
- `dataset`: which dataset to use (CIFAR10, CIFAR100, SVHN, TinyImageNet, or ImageNet).
- `ema-decay`: decay rate of the EMA; if set to 0, EMA is not used.
- `label-smoothing`: the $\epsilon$ used in label smoothing; if set to 0, label smoothing is not used (see the sketch after this list).
- `lr-decay`: learning rate decay schedule, `step` or `cos` (see the schedule sketch after this list).
- `lr-min`: minimum learning rate in cosine decay.
- `warmup-epochs`: number of warmup epochs used with cosine decay.
- `num-epochs`: total number of training epochs.
- `nbd`: no bias decay.
- `zero-gamma`: initialize the $\gamma$ of the last BN in each block to zero.
- `mixup`: use Mixup.
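
For reference, a minimal sketch of the label-smoothing cross-entropy that `--label-smoothing` controls, in the standard formulation (the implementation in this repository may differ in details): with smoothing $\epsilon$, the target distribution puts $1-\epsilon$ on the true class and spreads $\epsilon$ uniformly over all classes.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, epsilon=0.1):
    # Cross-entropy against a smoothed target distribution:
    # (1 - epsilon) on the true class, epsilon / num_classes elsewhere.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)  # average negative log-prob over classes
    return ((1 - epsilon) * nll + epsilon * uniform).mean()
```

Similarly, `--warmup-epochs`, `--lr-decay=cos`, and `--lr-min` typically combine into a warmup-plus-cosine schedule like the one below (a common scheme, not necessarily the exact one used here):

```python
import math

def lr_at_epoch(epoch, lr=0.3, lr_min=0.0, warmup_epochs=5, num_epochs=200):
    # Linear warmup to lr, then cosine decay from lr down to lr_min.
    if epoch < warmup_epochs:
        return lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (num_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr - lr_min) * (1 + math.cos(math.pi * progress))
```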

## Pretrained models

We provide a pretrained MobileNetV3-Small model in the `pretrained` folder.
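
A minimal loading sketch, assuming the checkpoint stores a `state_dict` and that the import path, class name, and constructor argument below match the repository (all three are assumptions; check the source):

```python
import torch
# Hypothetical import path and constructor; adjust to the actual
# module and class names in this repository.
from mobilenetv3 import MobileNetV3

model = MobileNetV3(mode="small")
state = torch.load("pretrained/mobilenetv3_small.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()  # inference mode: fixes BN statistics and disables dropout
```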

## Experiments

### Training settings

#### on ImageNet

```bash
CUDA_VISIBLE_DEVICES=5 python train.py --batch-size=128 --mode=small --print-freq=2000 --dataset=imagenet \
  --ema-decay=0.99 --label-smoothing=0.1 --lr=0.1 --save-epoch-freq=50 --lr-decay=cos --lr-min=0 --warmup-epochs=5 \
  --weight-decay=1e-5 --num-epochs=250 --num-workers=2 --width-multiplier=1 -dali -nbd -mixup -zero-gamma -save
```

#### on CIFAR-10

```bash
CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR10 \
  --ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1
```

#### on CIFAR-100

```bash
CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR100 \
  --ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1
```

Using more tricks:

```bash
CUDA_VISIBLE_DEVICES=1 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=CIFAR100 \
  --ema-decay=0.999 --label-smoothing=0.1 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=400 --num-workers=2 --width-multiplier=1 \
  -zero-gamma -nbd -mixup
```

#### on SVHN

```bash
CUDA_VISIBLE_DEVICES=3 python train.py --batch-size=128 --mode=small --print-freq=1000 --dataset=SVHN \
  --ema-decay=0 --label-smoothing=0 --lr=0.35 --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 \
  --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=20 --num-workers=2 --width-multiplier=1
```

#### on Tiny-ImageNet

```bash
CUDA_VISIBLE_DEVICES=7 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=tinyimagenet \
  --data-dir=/media/data2/chenjiarong/ImageData/tiny-imagenet --ema-decay=0 --label-smoothing=0 --lr=0.15 \
  --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 \
  --num-workers=2 --width-multiplier=1 -dali
```

Using more tricks:

```bash
CUDA_VISIBLE_DEVICES=7 python train.py --batch-size=128 --mode=small --print-freq=100 --dataset=tinyimagenet \
  --data-dir=/media/data2/chenjiarong/ImageData/tiny-imagenet --ema-decay=0.999 --label-smoothing=0.1 --lr=0.15 \
  --save-epoch-freq=1000 --lr-decay=cos --lr-min=0 --warmup-epochs=5 --weight-decay=6e-5 --num-epochs=200 \
  --num-workers=2 --width-multiplier=1 -dali -nbd -mixup
```

### MobileNetV3-Large

#### on ImageNet

|              | Madds   | Parameters | Top1-acc | Top5-acc |
| ------------ | ------- | ---------- | -------- | -------- |
| Official 1.0 | 219 M   | 5.4 M      | 75.2%    | -        |
| Ours 1.0     | 216.6 M | 5.47 M     | -        | -        |

#### on CIFAR-10

|          | Madds   | Parameters | Top1-acc | Top5-acc |
| -------- | ------- | ---------- | -------- | -------- |
| Ours 1.0 | 66.47 M | 4.21 M     | -        | -        |

#### on CIFAR-100

|          | Madds   | Parameters | Top1-acc | Top5-acc |
| -------- | ------- | ---------- | -------- | -------- |
| Ours 1.0 | 66.58 M | 4.32 M     | -        | -        |

### MobileNetV3-Small

#### on ImageNet

|              | Madds   | Parameters | Top1-acc | Top5-acc |
| ------------ | ------- | ---------- | -------- | -------- |
| Official 1.0 | 56.5 M  | 2.53 M     | 67.4%    | -        |
| Ours 1.0     | 56.51 M | 2.53 M     | 67.52%   | 87.58%   |

The pretrained model with 67.52% top-1 accuracy is provided in the `pretrained` folder.

#### on CIFAR-10 (Average accuracy of 5 runs)

|          | Madds   | Parameters | Top1-acc | Top5-acc |
| -------- | ------- | ---------- | -------- | -------- |
| Ours 1.0 | 17.51 M | 1.52 M     | 92.97%   | -        |

#### on CIFAR-100 (Average accuracy of 5 runs)

|             | Madds   | Parameters | Top1-acc | Top5-acc |
| ----------- | ------- | ---------- | -------- | -------- |
| Ours 1.0    | 17.60 M | 1.61 M     | 73.69%   | 92.31%   |
| More tricks | same    | same       | 76.24%   | 92.58%   |

#### on SVHN (Average accuracy of 5 runs)

|          | Madds   | Parameters | Top1-acc | Top5-acc |
| -------- | ------- | ---------- | -------- | -------- |
| Ours 1.0 | 17.51 M | 1.52 M     | 97.92%   | -        |

#### on Tiny-ImageNet (Average accuracy of 5 runs)

|             | Madds   | Parameters | Top1-acc | Top5-acc |
| ----------- | ------- | ---------- | -------- | -------- |
| Ours 1.0    | 51.63 M | 1.71 M     | 59.32%   | 81.38%   |
| More tricks | same    | same       | 62.62%   | 84.04%   |

## Dependency

This project uses Python 3.7 and PyTorch 1.1.0. The FLOPs and parameter counts are measured using torchsummaryX.
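
For example, a minimal measurement sketch with torchsummaryX (the model import and constructor are hypothetical, as above):

```python
import torch
from torchsummaryX import summary
# Hypothetical import; see the repository source for the actual class.
from mobilenetv3 import MobileNetV3

model = MobileNetV3(mode="small")
# Prints a per-layer table including Mult-Adds and parameter counts.
summary(model, torch.zeros(1, 3, 224, 224))
```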