MAE pre-training of ViT-Base, ViT-Small, and ViT-Tiny models on 270K AffectNet images for static facial expression recognition (SFER).
MAE ViT-Base pre-training on 270K AffectNet with a single 3090 GPU:
python -m torch.distributed.launch main_pretrain.py \
--model mae_vit_base_patch16 \
--batch_size 32 \
--accum_iter 4 --mask_ratio 0.75 \
--blr 1.5e-4 \
--epochs 300 \
--warmup_epochs 40 --weight_decay 0.05 \
--data_path /data/tao/fer/dataset/AffectNetdataset/Manually_Annotated_Images \
--output_dir /path/to/./out_dir_base
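For reference, the standard MAE recipe derives the absolute learning rate from --blr with the linear scaling rule over the effective batch size. A minimal sketch of that arithmetic, using the values from the command above and assuming a single GPU (the repo's main_pretrain.py may differ in detail):

```python
# Linear lr scaling in the standard MAE recipe (sketch, values from the command above).
batch_size, accum_iter, world_size = 32, 4, 1
blr = 1.5e-4

eff_batch_size = batch_size * accum_iter * world_size    # 128 images per optimizer step
lr = blr * eff_batch_size / 256                          # absolute lr = 7.5e-05
print(f"effective batch size: {eff_batch_size}, absolute lr: {lr:.2e}")
```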
ViT-Small and ViT-Tiny follow the same MAE pre-training settings as ViT-Base, except for --model and --output_dir:
- ViT-Small: --model mae_vit_small_patch16 and --output_dir /path/to/./out_dir_small
- ViT-Tiny: --model mae_vit_tiny_patch16 and --output_dir /path/to/./out_dir_tiny
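All three MAE backbones use --mask_ratio 0.75: each 224x224 image is split into 196 patches (14x14 for patch size 16) and only a random 25% of them are passed to the encoder. A minimal sketch of this per-sample random masking (illustrative, not the repo's exact implementation):

```python
import torch

def random_masking(x, mask_ratio=0.75):
    """Per-sample random patch masking in the MAE style (sketch)."""
    N, L, D = x.shape                     # batch, number of patches, embedding dim
    len_keep = int(L * (1 - mask_ratio))  # 196 patches -> keep 49
    noise = torch.rand(N, L)              # random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)   # low score = keep
    ids_keep = ids_shuffle[:, :len_keep]
    # Gather only the visible patches; these are all the encoder sees.
    return torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

tokens = torch.randn(2, 196, 768)         # dummy patch embeddings for ViT-Base
print(random_masking(tokens).shape)       # torch.Size([2, 49, 768])
```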
ConvNeXt V2-Base pre-training on 270K AffectNet with a single 3090 GPU:
python -m torch.distributed.launch main_pretrain_convnextv2.py \
--model convnextv2_base \
--batch_size 64 --update_freq 8 \
--blr 1.5e-4 \
--epochs 400 \
--warmup_epochs 40 \
--data_path /data/tao/fer/dataset/AffectNetdataset/Manually_Annotated_Images \
--output_dir /path/to/./out_dir_base_1
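Here --update_freq 8 plays the same role as --accum_iter above: gradients are accumulated over 8 mini-batches, so each optimizer step effectively sees 64 x 8 = 512 images. A self-contained toy sketch of that pattern (the repo's actual FCMAE training loop may differ):

```python
import torch
from torch import nn

# Gradient accumulation as implied by --update_freq 8 (toy example).
update_freq = 8
model = nn.Linear(16, 1)                       # stand-in for ConvNeXt V2 + FCMAE decoder
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4)

for it in range(32):                           # 32 mini-batches -> 4 optimizer steps
    x = torch.randn(64, 16)                    # one mini-batch of 64 samples
    loss = model(x).pow(2).mean()              # stand-in for the reconstruction loss
    (loss / update_freq).backward()            # scale so gradients average over the large batch
    if (it + 1) % update_freq == 0:
        optimizer.step()
        optimizer.zero_grad()
```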
Fine-tuning MAE ViT-Base on RAF-DB with a single GPU:
python -m torch.distributed.launch --nproc_per_node=1 main_rafdb.py \
--learning-rate 1e-5 \
--epoch 120 \
--model-name vit_base_fixedpe_patch16_224 \
--resume \
--checkpoint-whole checkpoint/vit-base-checkpoint-300.pth \
--mixup
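Fine-tuning starts from the MAE encoder weights only. A hedged sketch of how the pre-trained checkpoint can be loaded into a classification ViT (main_rafdb.py handles this internally; the key layout below assumes the standard MAE checkpoint format):

```python
import torch
import timm

# Initialize a ViT-Base classifier from the MAE pre-training checkpoint (sketch).
ckpt = torch.load("checkpoint/vit-base-checkpoint-300.pth", map_location="cpu")
state = ckpt.get("model", ckpt)

# Drop decoder weights and the mask token -- they are only used during pre-training.
state = {k: v for k, v in state.items()
         if not k.startswith("decoder") and k != "mask_token"}

model = timm.create_model("vit_base_patch16_224", num_classes=7)  # RAF-DB has 7 basic expressions
msg = model.load_state_dict(state, strict=False)
print(msg.missing_keys)  # the classification head is expected to be newly initialized
```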
Fine-tuning MAE ViT-Base on 270K AffectNet with a single 3090 GPU:
python -m torch.distributed.launch main_finetune_affectnet.py \
--model mae_vit_base_patch16 \
--batch_size 16 \
--accum_iter 2 \
--blr 5e-4 --layer_decay 0.65 --weight_decay 0.05 \
--drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--epochs 10 \
--finetune '/path/out_dir_base_1/vit_base_checkpoint-299.pth'
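--layer_decay 0.65 applies layer-wise learning-rate decay: the last transformer block and the head get the largest learning rate, and each earlier block is scaled down by a factor of 0.65. A small sketch of the resulting schedule, following the common MAE/BEiT fine-tuning recipe (the exact parameter grouping in the repo may differ slightly):

```python
# Layer-wise lr decay for a 12-block ViT-Base (illustrative; --blr is additionally
# scaled by the effective batch size before use, as in the pre-training sketch).
depth = 12
base_lr = 5e-4
layer_decay = 0.65

for layer_id in range(depth + 1):          # 0 = patch/pos embeddings, depth = last block
    scale = layer_decay ** (depth - layer_id)
    print(f"layer {layer_id:2d}: lr scale {scale:.3f} -> lr {base_lr * scale:.2e}")
```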
Hint: a small number of training epochs is recommended on AffectNet to avoid overfitting its noisy labels.
In addition, data augmentation tricks such as horizontal flip, color jitter, affine transformation, random erasing, and mixup can significantly improve fine-tuning performance.
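One possible torchvision pipeline covering those tricks (parameter values are illustrative, not the repo's exact settings; mixup is applied per batch at training time, e.g. with timm.data.Mixup):

```python
from torchvision import transforms

# Example training-time augmentation for 224x224 face crops (illustrative values).
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomAffine(degrees=10, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25),      # corresponds to --reprob 0.25
])
```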
ViT-Small and ViT-Tiny follow the same MAE fine-tuning settings as ViT-Base, except for --model and --finetune:
- ViT-Small: --model mae_vit_small_patch16 and --finetune /path/out_dir_small_1/vit_small_checkpoint-300.pth
- ViT-Tiny: --model mae_vit_tiny_patch16 and --finetune /path/out_dir_tiny_1/vit_tiny_checkpoint-300.pth
ConvNeXt V2-Base fine-tuning on RAF-DB with a single 3090 GPU:
python -m torch.distributed.launch main_finetune.py \
--model convnextv2_base \
--batch_size 32 --update_freq 4 \
--blr 6.25e-4 --epochs 100 --warmup_epochs 20 \
--layer_decay_type 'group' --layer_decay 0.6 --weight_decay 0.05 \
--drop_path 0.1 --reprob 0.25 \
--mixup 0.8 --cutmix 1.0 --smoothing 0.1 \
--model_ema True --model_ema_eval True \
--use_amp True \
--data_path /path/to/Dataset/RAF-DB/basic \
--finetune '/path/out_dir_base/convnextv2_base_checkpoint-320.pth'
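The --mixup 0.8 --cutmix 1.0 --smoothing 0.1 flags are typically wired up through timm. A hedged sketch of that pattern (main_finetune.py may pass additional arguments):

```python
import torch
from timm.data import Mixup
from timm.loss import SoftTargetCrossEntropy

# Batch-level mixup/cutmix with label smoothing, as implied by the flags above.
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0,
                 label_smoothing=0.1, num_classes=7)    # 7 expression classes on RAF-DB
criterion = SoftTargetCrossEntropy()                    # targets become soft after mixup

images = torch.randn(8, 3, 224, 224)                    # dummy batch (must be even-sized)
targets = torch.randint(0, 7, (8,))
images, soft_targets = mixup_fn(images, targets)        # mixed images, soft labels
print(soft_targets.shape)                               # torch.Size([8, 7])
```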
| name | resolution | RAF-DB Acc(%) | AffectNet-7 Acc(%) | AffectNet-8 Acc(%) | FERPlus Acc(%) | #params | model |
|---|---|---|---|---|---|---|---|
| MAE ViT-Base | 224x224 | 91.07 | 66.09 | 62.42 | 90.18 | 86.5M | model |
| MAE ViT-Small | 224x224 | 90.03 | 65.53 | 62.06 | 89.35 | 21.9M | model |
| MAE ViT-Tiny | 224x224 | 88.72 | 64.25 | 61.45 | 88.67 | 5.6M | model |
| ConvNeXt V2-B | 224x224 | 89.52 | - | - | - | 89M | model |
With data augmentation tricks, the accuracy of MAE ViT-Base increases to 91.79% on RAF-DB, 63.81% on AffectNet-8, and 90.82% on FERPlus.
Additional weights pre-trained for more epochs are available for ViT-Small (600 epochs) and ViT-Tiny (600/800/1000 epochs).
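A hedged sketch of running one of the released fine-tuned checkpoints for inference; the checkpoint filename below is hypothetical, and the key layout and class order should be checked against the actual release:

```python
import torch
import timm
from PIL import Image
from torchvision import transforms

# Load a fine-tuned ViT-Base classifier (sketch; "checkpoint/vit_base_rafdb.pth" is a placeholder name).
model = timm.create_model("vit_base_patch16_224", num_classes=7)
ckpt = torch.load("checkpoint/vit_base_rafdb.pth", map_location="cpu")
model.load_state_dict(ckpt.get("model", ckpt), strict=False)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("face.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    pred = model(img).argmax(dim=1)
print(pred.item())   # index into the dataset's 7 expression classes
```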
If you find this repo helpful, please consider citing:
@article{li2024emotion,
  title={Emotion separation and recognition from a facial expression by generating the poker face with vision transformers},
  author={Li, Jia and Nie, Jiantao and Guo, Dan and Hong, Richang and Wang, Meng},
  journal={IEEE Transactions on Computational Social Systems},
  year={2024},
  publisher={IEEE}
}

@article{chen2024static,
  title={From static to dynamic: Adapting landmark-aware image models for facial expression recognition in videos},
  author={Chen, Yin and Li, Jia and Shan, Shiguang and Wang, Meng and Hong, Richang},
  journal={IEEE Transactions on Affective Computing},
  year={2024},
  publisher={IEEE}
}