This model generates the masked faces of comic characters, conditioned on the preceding sequential frames.
This repository is not yet complete!
- Golden Age Comics: includes US comics published between 1938 and 1956. The extracted panel images are used, which were retrieved through the study *The Amazing Mysteries of the Gutter*.
The whole panel dataset is processed with a cartoon Face Detector model (which can be found here) using `mixed_r50` weights, with the confidence threshold set to 0.55 and the NMS threshold set to 0.2. The following statistics are derived from the detector's outputs (a sketch of how the size buckets below can be computed is given after the list).
- **Total files:** 1229664
- **Total files with found faces:** 684885
- **Total faces:** 1063804
- **Faces above 64px:** 309079 / 521089 (min(width, height) >= 64 / max(width, height) >= 64)
- **Faces above 128px:** 75111 / 158988 (min(width, height) >= 128 / max(width, height) >= 128)
- **Faces above 256px:** 13214 / 27471 (min(width, height) >= 256 / max(width, height) >= 256)
- **Panel Height:** mean=510.0328 / median=475 / mode=445
- **Panel Width:** mean=508.4944 / median=460 / mode=460
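The size buckets above can be reproduced from the detector's bounding boxes by filtering on the shorter and longer box sides. The sketch below is illustrative only (the box format is an assumption), not the script used in this repository.

```python
# Illustrative only (not the repository's script): given detector boxes as
# (x1, y1, x2, y2) tuples, count faces whose shorter / longer side reaches
# each size threshold, mirroring the 64/128/256px buckets above.
from typing import Dict, List, Tuple

def face_size_buckets(boxes: List[Tuple[float, float, float, float]],
                      thresholds=(64, 128, 256)) -> Dict[int, Tuple[int, int]]:
    counts = {t: [0, 0] for t in thresholds}  # t -> [min-side count, max-side count]
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        for t in thresholds:
            if min(w, h) >= t:
                counts[t][0] += 1
            if max(w, h) >= t:
                counts[t][1] += 1
    return {t: (c[0], c[1]) for t, c in counts.items()}
```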
- Face detection (Siamese) on iCartoonDataface (~86% test accuracy) link
- Google Sheet for recording Experiment Results
- In order to run the module, a 'golden_age_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
# For the direct face generation task
faces_path: /userfiles/comics_grp/golden_age/faces_128/
face_train_test_ratio: 0.9
# For panel face reconstruction task
panel_path: /datasets/COMICS/raw_panel_images/
sequence_path: /userfiles/comics_grp/golden_age/panel_face_areas.json
annot_path: /userfiles/comics_grp/golden_age/face_annots/
mask_val: 1
mask_all: False
return_mask: False
return_mask_coordinates: False
train_test_ratio: 0.95
train_mode: True
panel_dim:
- 300
- 300
```
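As a reference for how such a config might be consumed, the snippet below is a minimal sketch that loads the YAML with PyYAML and splits the face files according to `face_train_test_ratio`; the repository's own loader and dataset classes may differ.

```python
# Minimal sketch assuming plain PyYAML (the repository's loader may differ):
# read the config and split the face images using face_train_test_ratio.
import os
import yaml

with open("configs/golden_age_config.yaml") as f:
    cfg = yaml.safe_load(f)

face_files = sorted(os.listdir(cfg["faces_path"]))
split = int(len(face_files) * cfg["face_train_test_ratio"])
train_files, test_files = face_files[:split], face_files[split:]
```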
- To train the PlainSSuperVAE network, you have to specify the following parameters in the 'ssupervae_config.yaml' file under the configs folder. To use the LSTM structure, simply set the flag `use_lstm` to `True`.
```yaml
# Encoder Parameters
backbone: "efficientnet-b5"
embed_dim: 256
latent_dim: 256
use_lstm: False
# Plain Encoder Parameters
seq_size: 3
# LSTM Encoder Parameters
lstm_hidden: 256
lstm_dropout: 0
lstm_bidirectional: False
fc_hidden_dims: []
fc_dropout: 0
num_lstm_layers: 1
masked_first: True
# Decoder Parameters
decoder_channels:
- 64
- 128
- 256
- 512
image_dim: 64
# Training Parameters
batch_size: 4
train_epochs: 100
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
g_clip: 100
```
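For readers wondering how the LSTM-related fields relate to a concrete module, the following is an illustrative mapping onto a plain `torch.nn.LSTM` over per-panel embeddings of size `embed_dim`; it is not the repository's actual encoder implementation.

```python
# Illustrative mapping of the LSTM encoder parameters onto torch.nn.LSTM,
# applied to a batch of per-panel embeddings of size embed_dim.
import torch
import torch.nn as nn

embed_dim, lstm_hidden = 256, 256
num_lstm_layers, lstm_dropout, lstm_bidirectional = 1, 0.0, False

lstm = nn.LSTM(
    input_size=embed_dim,
    hidden_size=lstm_hidden,
    num_layers=num_lstm_layers,
    dropout=lstm_dropout,               # only applied between stacked layers
    bidirectional=lstm_bidirectional,
    batch_first=True,
)

x = torch.randn(4, 3, embed_dim)        # (batch, seq_size, embed_dim)
out, (h_n, c_n) = lstm(x)               # out: (4, 3, lstm_hidden)
```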
- In order to run the module, a 'vae_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
num_training_samples: 30000
num_test_samples: 10240
test_samples_range:
- 10240
- 10640
image_dim: 64
batch_size: 64
train_epochs: 100
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
latent_dim_z: 256
g_clip: 100
channels:
- 64
- 128
- 256
- 512
```
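The `channels` list describes the decoder's feature-map widths. Purely as an illustration (layer types, kernel sizes, and the 4x4 starting resolution are assumptions), a decoder consistent with these values and `image_dim: 64` could look like this:

```python
# A decoder consistent with channels=[64, 128, 256, 512], latent_dim_z=256
# and image_dim=64; layer types and the 4x4 starting resolution are assumptions.
import torch
import torch.nn as nn

latent_dim_z, channels = 256, [64, 128, 256, 512]
chs = list(reversed(channels))                       # [512, 256, 128, 64]

layers = [nn.Linear(latent_dim_z, chs[0] * 4 * 4), nn.Unflatten(1, (chs[0], 4, 4))]
for c_in, c_out in zip(chs, chs[1:] + [chs[-1]]):    # 4x4 -> 8 -> 16 -> 32 -> 64
    layers += [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
               nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
layers += [nn.Conv2d(chs[-1], 3, 3, padding=1), nn.Tanh()]
decoder = nn.Sequential(*layers)

img = decoder(torch.randn(2, latent_dim_z))          # -> (2, 3, 64, 64)
```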
- In order to run the module, an 'intro_vae_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
face_image_folder_train_path: /home/gsoykan20/Desktop/ffhq_thumbnails/thumbnails128x128/
face_image_folder_test_path: /home/gsoykan20/Desktop/ffhq_thumbnails/thumbnails128x128/
num_training_samples: 100
test_samples_range:
- 10240
- 10640
image_dim: 64
batch_size: 32
train_epochs: 200
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
latent_dim_z: 256
g_clip: 100
channels:
- 64
- 128
- 256
- 512
# Check the paper for the meaning of these params: https://arxiv.org/abs/1807.06358
adversarial_alpha: 0.25
ae_beta: 5
adversarial_margin: 110
```
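The last three parameters correspond to the α, β, and margin m of the IntroVAE paper (https://arxiv.org/abs/1807.06358). As a rough, non-authoritative sketch of how these weights typically enter the encoder and generator objectives:

```python
# Rough sketch of how IntroVAE's weights are commonly used: kl_real, kl_recon,
# kl_sampled are KL regularization terms and recon_loss is the AE term
# (all placeholders here, not this repository's exact loss code). In the paper,
# the reconstructed/sampled images are detached when computing the encoder loss.
import torch

adversarial_alpha, ae_beta, adversarial_margin = 0.25, 5.0, 110.0

def encoder_loss(kl_real, kl_recon, kl_sampled, recon_loss):
    hinge = torch.clamp(adversarial_margin - kl_recon, min=0) + \
            torch.clamp(adversarial_margin - kl_sampled, min=0)
    return kl_real + adversarial_alpha * hinge + ae_beta * recon_loss

def generator_loss(kl_recon, kl_sampled, recon_loss):
    return adversarial_alpha * (kl_recon + kl_sampled) + ae_beta * recon_loss
```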
- In order to run the module, a 'dcgan_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
# For the direct face generation task
dataroot : "data/celeba"
# Number of workers for dataloader
workers : 4
# Batch size during training
batch_size : 128
# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_size : 64
# Number of channels in the training images. For color images this is 3
nc : 3
# Size of z latent vector (i.e. size of generator input)
nz : 100
# Size of feature maps in generator
ngf : 64
# Size of feature maps in discriminator
ndf : 64
# Number of training epochs
num_epochs : 150
# Learning rate for optimizers
lr : 0.0002
# Beta1 hyperparam for Adam optimizers
beta1 : 0.5
# Number of GPUs available. Use 0 for CPU mode.
ngpu : 1
# Dataset path
#dataset_path : "/userfiles/ckoksal20/img_align_dataset"
dataset_path : "/kuacc/users/ckoksal20/img_align_dataset"
```
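These parameters follow the standard DCGAN setup, so a generator consistent with `nz`, `ngf`, `nc`, and `image_size: 64` would look roughly like the generic sketch below (not necessarily the exact module used in this repository):

```python
# Generic DCGAN generator consistent with nz=100, ngf=64, nc=3 and
# image_size=64 (a standard architecture sketch, not necessarily this repo's).
import torch
import torch.nn as nn

nz, ngf, nc = 100, 64, 3

netG = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),       # 1x1 -> 4x4
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # 4 -> 8
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # 8 -> 16
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # 16 -> 32
    nn.BatchNorm2d(ngf), nn.ReLU(True),
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),           # 32 -> 64
    nn.Tanh(),
)

fake = netG(torch.randn(16, nz, 1, 1))                          # (16, 3, 64, 64)
```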
- In order to run the module, an 'ssuper_dcgan_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
# Encoder Parameters
backbone: "efficientnet-b5"
embed_dim: 256
latent_dim: 100
use_lstm: False
# Plain Encoder Parameters
seq_size: 3
# LSTM Encoder Parameters
lstm_hidden: 256
lstm_dropout: 0
fc_hidden_dims: []
fc_dropout: 0
num_lstm_layers: 1
masked_first: True
# Training Parameters
batch_size: 8
train_epochs: 300
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
g_clip: 100
#
# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_dim : 64
# Number of channels in the training images. For color images this is 3
nc : 3
# Size of z latent vector (i.e. size of generator input)
nz : 100
# Size of feature maps in generator
ngf : 64
# Size of feature maps in discriminator
ndf : 64
# Number of GPUs available. Use 0 for CPU mode.
ngpu : 1
```
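The training parameters shared across these configs (`lr`, `beta_1`, `beta_2`, `weight_decay`, `g_clip`) map naturally onto an Adam optimizer plus gradient-norm clipping. The snippet below only illustrates that mapping; `model` and the loss are placeholders, not the repository's trainer.

```python
# Illustration of how the shared training parameters map onto an Adam
# optimizer and gradient-norm clipping; `model` and the loss are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002,
                             betas=(0.5, 0.999), weight_decay=0.000025)

loss = model(torch.randn(8, 10)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=100)  # g_clip
optimizer.step()
```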
- In order to run the module, a 'vae_context_attn_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
# Encoder Parameters
backbone: "efficientnet-b5"
seq_size: 3
embed_dim: 256
# Decoder Parameters
latent_dim: 256
decoder_channels:
- 64
- 128
- 256
- 512
image_dim: 64
# Training Parameters
batch_size: 1
train_epochs: 100
lr: 0.0001
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.9
g_clip: 100
# contextual attention related
compute_g_loss: True
coarse_l1_alpha: 1.2
l1_loss_alpha: 1.2
ae_loss_alpha: 1.2
global_wgan_loss_alpha: 1.
gan_loss_alpha: 0.001
wgan_gp_lambda: 10
netG:
input_dim: 3
ngf: 16
netD:
input_dim: 3
ndf: 32
```
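`wgan_gp_lambda` weights a standard WGAN-GP gradient penalty. A generic sketch of that penalty is given below; the discriminator interface used in this repository may differ.

```python
# Generic WGAN-GP gradient penalty, weighted by wgan_gp_lambda; the actual
# discriminator interface in this repository may differ.
import torch

def gradient_penalty(disc, real, fake, gp_lambda=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(outputs=disc(interp).sum(), inputs=interp,
                                create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return gp_lambda * ((grad_norm - 1) ** 2).mean()
```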
- In order to run the module, a 'global_local_disc_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
global_wgan_loss_alpha: 1.
gan_loss_alpha: 0.001
wgan_gp_lambda: 10
```
- In order to run the module, an 'ssuper_msggan_config.yaml' file should be created under the configs folder.
- Example Config:
```yaml
# Encoder Parameters
backbone: "efficientnet-b5"
embed_dim: 256
latent_dim: 512
use_lstm: False
# Plain Encoder Parameters
seq_size: 3
# LSTM Encoder Parameters
lstm_hidden: 256
lstm_dropout: 0
fc_hidden_dims: []
fc_dropout: 0
num_lstm_layers: 1
masked_first: True
image_dim : 64
# Training Parameters
batch_size: 4
train_epochs: 100
lr: 0.0002
weight_decay: 0.000025
beta_1: 0.5
beta_2: 0.999
g_clip: 100
depth : 5
use_eql : False
use_ema : False
ema_decay : 0.999
g_lr : 0.003
d_lr : 0.001
loss_function : "relativistic-hinge"
```
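`loss_function: "relativistic-hinge"` refers to the relativistic average hinge loss used with MSG-GAN. A short sketch of that loss (with `r_logits`/`f_logits` being discriminator outputs on real and generated images; not the repository's exact implementation):

```python
# Sketch of the relativistic average hinge loss; r_logits / f_logits are
# discriminator outputs on real and generated images.
import torch.nn.functional as F

def rahinge_d_loss(r_logits, f_logits):
    r_f = r_logits - f_logits.mean()
    f_r = f_logits - r_logits.mean()
    return F.relu(1 - r_f).mean() + F.relu(1 + f_r).mean()

def rahinge_g_loss(r_logits, f_logits):
    r_f = r_logits - f_logits.mean()
    f_r = f_logits - r_logits.mean()
    return F.relu(1 + r_f).mean() + F.relu(1 - f_r).mean()
```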
One should check and update 'configs/base_config' for global config parameters such as the base project directory.