This work was carried out as part of my Master's thesis, "Bias and Fairness in Low Resolution Image Recognition", under the guidance of Dr. Mayank Vatsa and Dr. Richa Singh.
Clone the repository
git clone https://github.com/ksasi/fairDL.git
Install using pip
pip install -r requirements.txt
Dataset | Description |
---|---|
FFHQ | Flickr-Faces-HQ (FFHQ) is a dataset of 70,000 high-resolution (1024x1024) human face images that covers considerable diversity and variation. |
CMU Multi-PIE | A constrained dataset of face images of 337 subjects with variation in pose, illumination and expression. Of these, over 44K frontal face images of 336 subjects, with illumination and expression variations, are selected. |
BFW | Balanced Faces in the Wild (BFW) is balanced across eight subgroups. It consists of 800 subjects, 100 per subgroup, each with 25 face samples. The subgroups combine ethnicity (Asian (A), Black (B), Indian (I) and White (W)) and gender (Female (F) and Male (M)), as shown in Figure 2.2.1 (b). The metadata for this dataset includes a list of pairs for face verification, so the dataset can be used to investigate bias in automatic face recognition (FR) systems under the verification protocol. |
GAN Bias Estimator
Model | Description |
---|---|
StyleGAN2-ADA | The StyleGAN2 generator with adaptive discriminator augmentation (ADA), trained on the FFHQ dataset, is used to generate synthetic face images. |
FairFace | A pretrained FairFace attribute classifier, trained on the FairFace face attribute dataset, which is balanced for race, gender, and age. |
Bias Estimation in Face Verification System
Model | Description |
---|---|
VGGFace2 | A ResNet-50 backbone trained on MS-Celeb-1M and then fine-tuned on the VGGFace2 dataset. |
DiscoFaceGAN | A pretrained model that generates faces of non-existent people with variations in pose, expression and illumination. The model is trained using imitative-contrastive learning to learn disentangled representations. The version trained on the FFHQ dataset is used for the analysis. |
Clone the repository
git clone https://github.com/ksasi/fairDL.git
Clone StyleGAN2-ADA repository
git clone https://github.com/NVlabs/stylegan2-ada-pytorch.git
Clone FairFace repository
git clone https://github.com/dchen236/FairFace.git
Install using pip
pip install -r requirements.txt
Obtain the 336 subjects of the CMU Multi-PIE dataset and split them in a 70:30 ratio for training and testing.
Place the training split in the MultiPie51train folder.
Example: <root_folder>/fairDL/data/MultiPie51train/id_34
Place the testing split in the MultiPie51test folder.
Example: <root_folder>/fairDL/data/MultiPie51test/id_299
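
The 70:30 split described above could be scripted along the following lines. This is only a minimal sketch, not the procedure used in the repository: it assumes the split is made by subject (so each `id_*` folder lands entirely in either the training or the testing folder), and the helper names `split_subjects` and `copy_split`, the fixed seed and the rounding rule are all hypothetical choices.

```python
import random
import shutil
from pathlib import Path


def split_subjects(subject_ids, train_fraction=0.7, seed=0):
    """Shuffle subject IDs reproducibly and split them into (train, test) lists."""
    ids = sorted(subject_ids)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)     # local RNG, global random state is untouched
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def copy_split(source_dir, train_dir, test_dir, train_fraction=0.7, seed=0):
    """Copy every subject folder under source_dir into train_dir or test_dir."""
    source = Path(source_dir)
    subjects = [p.name for p in source.iterdir() if p.is_dir()]
    train_ids, test_ids = split_subjects(subjects, train_fraction, seed)
    for ids, target in ((train_ids, Path(train_dir)), (test_ids, Path(test_dir))):
        target.mkdir(parents=True, exist_ok=True)
        for subject in ids:
            # dirs_exist_ok requires Python 3.8 or newer
            shutil.copytree(source / subject, target / subject, dirs_exist_ok=True)
    return train_ids, test_ids
```

With 336 subjects this yields 235 training and 101 testing identities. The target folders would then be `<root_folder>/fairDL/data/MultiPie51train` and `<root_folder>/fairDL/data/MultiPie51test`.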
Clone DiscoFaceGAN repository
Execute the following to generate synthetic faces
python generate_images.py --subject=2500 --variation=4
Execute the following to move 2000 subjects to synthface folder and 500 subjects to synthfacetest
cd <root>/fairDL
python process_data.py --indir "<root>/DiscoFaceGAN/generate_images" --traindir "<root>/fairDL/data/synthface" --testdir "<root>/fairDL/data/synthfacetest"
Preprocess the synthetic dataset to extract faces and resize
python preprocess.py --source_path <root>/fairDL/data/synthface --target_path <root>/fairDL/data/synthface_processed
python preprocess.py --source_path <root>/fairDL/data/synthfacetest --target_path <root>/fairDL/data/synthfacetest_processed
Obtain the BFW dataset and set it up under the data directory.
Example: <root>/fairDL/data/bfw
Generate synthetic faces with StyleGAN2-ADA by executing stylegan2_generator.py in the src folder under fairDL:
python stylegan2_generator.py --num=2000 --outdir=../data/stylegan2
python generate_csv.py --imgdir=../data/stylegan2 --outdir=../results
Navigate to the FairFace folder and execute the scripts below:
rm -rf detected_faces
python predict.py --csv ../fairDL/results/test_imgs.csv
cp test_outputs.csv ../fairDL/results/test_outputs_1.csv
rm -rf test_outputs.csv
Navigate to the src folder under fairDL and execute the following to generate the plots:
python generate_plots_attrib.py --src=../results/test_outputs_1.csv --outdir=../results
The plots plot_race.pdf, plot_race4.pdf, plot_gender.pdf and plot_age.pdf are generated in the results folder.
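
To inspect the numbers behind these plots without opening the PDFs, the attribute predictions can be tallied directly. The sketch below is an illustration rather than the logic of generate_plots_attrib.py: the helper name `attribute_shares` is hypothetical, and it assumes the FairFace output CSV has one row per face and a column per predicted attribute (for example `race`, `race4`, `gender` and `age`, which should be checked against the actual file header).

```python
import csv
from collections import Counter


def attribute_shares(csv_path, column):
    """Return {label: fraction of rows} for one predicted attribute column."""
    with open(csv_path, newline="") as handle:
        counts = Counter(row[column] for row in csv.DictReader(handle))
    total = sum(counts.values())
    # most_common() orders labels from the most to the least frequent
    return {label: count / total for label, count in counts.most_common()}
```

For example, `attribute_shares("../results/test_outputs_1.csv", "age")` would return the share of generated faces assigned to each age group, which makes a skew towards a single group easy to spot.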
- Fine-tune on CMU Multi-PIE
python -u <root>/fairDL/src/fine_tune.py --save_path=<root>/fairDL/checkpoints/VGGFace2_CMU_ --model="VGGFace2" --dataset="CMU" --num_classes=1180 --arch="VGGFace2" --epochs=10 --batch_size=128 --learning_rate=1e-4 --weight_decay=1e-4 --momentum=0.9 >> <root>/fairDL/results/VGGFace2_MultiPie51_out.log
- Predict
python <root>/fairDL/src/predict.py --model="VGGFace2" --state="finetuned" --file="<root>/fairDL/data/bfw/bfw-v0.1.5-datatable.csv" --root_path="<root>/fairDL/data/bfw/Users/jrobby/bfw/bfw-cropped-aligned/" --output_file="<root>/fairDL/results/fine_tuned_cmu_pred.csv" --model_checkpoint="<root>/fairDL/checkpoints/VGGFace2_CMU_model_10_checkpoint.pth.tar"
- Evaluate
python -u <root>/fairDL/src/evaluate.py --state="cmu_finetuned" --predfile="<root>/fairDL/results/fine_tuned_cmu_pred.csv" --outdir="<root>/fairDL/results" >> <root>/fairDL/results/out_eval_finetuned_cmu.log
- Fine-tune on synthetic faces
python -u <root>/fairDL/src/fine_tune.py --save_path=<root>/fairDL/checkpoints/VGGFace2_Synth_ --model="VGGFace2" --dataset="Synth" --num_classes=1180 --arch="VGGFace2" --epochs=10 --batch_size=128 --learning_rate=1e-4 --weight_decay=1e-4 --momentum=0.9 >> <root>/fairDL/results/VGGFace2_Synth_out.log
- Predict
python <root>/fairDL/src/predict.py --model="VGGFace2" --state="finetuned" --file="<root>/fairDL/data/bfw/bfw-v0.1.5-datatable.csv" --root_path="<root>/fairDL/data/bfw/Users/jrobby/bfw/bfw-cropped-aligned/" --output_file="<root>/fairDL/results/fine_tuned_synth_pred.csv" --model_checkpoint="<root>/fairDL/checkpoints/VGGFace2_Synth_model_10_checkpoint.pth.tar"
- Evaluate
python -u <root>/fairDL/src/evaluate.py --state="synth_finetuned" --predfile="<root>/fairDL/results/fine_tuned_synth_pred.csv" --outdir="<root>/fairDL/results" >> <root>/fairDL/results/out_eval_finetuned_synth.log
DoB_fv, i.e. Std(GAR @ FAR), for ethnicity, gender and attributes with CMU Multi-PIE and synthetic faces (a smaller value indicates less bias) can be obtained from the "Plots_DoB_fv.ipynb" notebook.
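
As a rough illustration of the Std(GAR @ FAR) quantity, the sketch below computes the genuine accept rate (GAR) at a fixed false accept rate (FAR) for each subgroup and takes the standard deviation across subgroups. It is not the notebook's code. The function names are hypothetical, and two choices are assumptions: the threshold is picked separately for each subgroup from its own impostor scores, and the population standard deviation is used.

```python
import math
from statistics import pstdev


def gar_at_far(genuine, impostor, far=0.01):
    """GAR at a threshold chosen so that the FAR does not exceed `far`.

    `genuine` and `impostor` are similarity scores (higher means more similar)
    for same-identity and different-identity pairs respectively.
    """
    ranked = sorted(impostor, reverse=True)
    allowed = math.floor(far * len(ranked))   # impostor pairs that may be accepted
    # Accepting only scores strictly above this value lets through at most
    # `allowed` impostor pairs, even when scores are tied.
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    return sum(score > threshold for score in genuine) / len(genuine)


def degree_of_bias(scores_by_group, far=0.01):
    """Population standard deviation of GAR @ FAR across subgroups.

    `scores_by_group` maps a subgroup name to a (genuine, impostor) pair of
    score lists. Assumption: each subgroup gets its own threshold.
    """
    gars = [gar_at_far(gen, imp, far) for gen, imp in scores_by_group.values()]
    return pstdev(gars)
```

A value of 0 would mean every subgroup reaches the same GAR at the chosen FAR. Larger values mean the verification performance differs more between subgroups.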
- GANs are biased towards the age group "20-29"
- GANs are biased towards "white" faces
- Face verification models trained or fine-tuned with synthetic faces exhibit bias for the "race" attribute
For questions and clarifications, please contact @ksasi or raise an issue on GitHub.
The code is adapted from the following repositories:
- VGGFace2 Dataset for Face Recognition
- PyTorch Metric Learning
- stylegan2-ada-pytorch
- DiscoFaceGAN
- FairFace
https://arxiv.org/abs/2208.13061
If you use this repository in your work, please cite the paper as below:
@article{kotti2022biased,
title={On Biased Behavior of GANs for Face Verification},
author={Kotti, Sasikanth and Vatsa, Mayank and Singh, Richa},
journal={arXiv preprint arXiv:2208.13061},
year={2022}
}