This project is based on Scott Reed's article Generative Adversarial Text-to-Image Synthesis and builds on Alaa El-Nouby's implementation of it. The originality of our work lies in two points:
- the implementation of a test function that was not present in Alaa El-Nouby's original code;
- the creation of embeddings based on InferSent, a language model different from the one used by Scott Reed.
A report for this project is available here, and the PDF file used for the class presentation on 03/28/2019 is also available here.
Link to the pre-trained discriminators and generators for flowers and birds.
To reproduce the results, you can download the Caltech-UCSD Birds 200 and Flowers datasets, which contain the images, their captions, and the corresponding embeddings.
However, we will use the embeddings that we generated with InferSent, which are available here (files in h5 format):
Link to the language models needed to run the tests yourself: birds and flowers.
If you want to generate the embeddings yourself, the files used are in the InferSent folder; refer to the InferSent GitHub repository to make them work.
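If you do generate them yourself, the encoding step follows the usage documented in the InferSent repository; below is a minimal sketch of it (the model version, checkpoint path, and word-vector path are placeholders, and the project's actual scripts are those in the InferSent folder).

```python
# Minimal sketch of encoding captions with InferSent, following the usage
# documented in the InferSent repository (run from inside the cloned repo so
# that models.py is importable). Paths and the model version are placeholders.
import torch
from models import InferSent

V = 2  # InferSent version (1 = GloVe word vectors, 2 = fastText)
params_model = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
                'pool_type': 'max', 'dpout_model': 0.0, 'version': V}
infersent = InferSent(params_model)
infersent.load_state_dict(torch.load('encoder/infersent2.pkl'))  # pre-trained encoder
infersent.set_w2v_path('fastText/crawl-300d-2M.vec')             # word vectors

captions = ["this flower has large white petals and a yellow center"]
infersent.build_vocab(captions, tokenize=True)          # needs nltk's 'punkt' tokenizer
embeddings = infersent.encode(captions, tokenize=True)  # one 4096-d vector per caption
```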
Once the datasets have been downloaded and the embeddings created, simply use `convert_cub_to_hd5_script.py` and `convert_flowers_to_hd5_script.py` (fill in `config.yaml` correctly for the scripts to work).
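To check that a conversion script produced a usable file, you can inspect the resulting hd5 file with h5py; the file name below is a placeholder for the output path set in `config.yaml`.

```python
# Quick sanity check of a converted dataset (assumes h5py is installed;
# "birds.hdf5" is a placeholder for the file produced by the script).
import h5py

with h5py.File("birds.hdf5", "r") as f:
    f.visit(print)  # print the names of all groups/datasets in the file
```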
First of all, you must open `config.yaml` and fill in the corresponding paths for each field. Only the fields `model_path` and `dataset_path` are required if you already have the pre-trained files; if you only want to run the test phase, only `model_path` is required.
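For reference, here is a minimal sketch of how such a YAML file is typically read in Python (this assumes PyYAML; the project's own loading code may differ), showing only the two fields named above.

```python
# Minimal sketch: reading config.yaml with PyYAML (the project's own loader
# may differ); only the two fields discussed above are shown.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

model_path = config["model_path"]      # required for the test phase (pre-trained models)
dataset_path = config["dataset_path"]  # required for converting/using the datasets
```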
To train the model, you must:
- select the relevant arguments in `runtime.py` (a rough sketch of these flags follows this list):
  - `--inference`: `default=False` for training, `default=True` for testing;
  - `--cls`: select the desired value;
  - `--pre_trained_disc` and `--pre_trained_gen`: `default=None` for training, or the paths to the pre-trained models for testing;
  - `--dataset`: `birds` or `flowers`;
  - `--num_workers`: `default=0`; change the value if using multiprocessing;
  - `--epochs`: `default=200` is the recommended value;
  - you do not need to change the other values;
- launch `visdom` and open your browser at the indicated address to follow the model in real time (image generation per batch and plotting of the generator and discriminator loss functions);
- run `python runtime.py`;
- checkpoints will appear in the `checkpoints` folder every 10 epochs.
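For orientation, here is a rough argparse sketch of the flags listed above; it is not the project's exact code, and the defaults shown for `--cls` and `--dataset` are only illustrative, so check `runtime.py` for the real declarations.

```python
# Rough sketch of the flags discussed above (not the project's exact code);
# the defaults for --cls and --dataset are illustrative only.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--inference", default=False)          # False = training, True = testing
parser.add_argument("--cls", default=False)                # select the desired value
parser.add_argument("--pre_trained_disc", default=None)    # None for training, checkpoint path for testing
parser.add_argument("--pre_trained_gen", default=None)     # None for training, checkpoint path for testing
parser.add_argument("--dataset", default="flowers")        # "birds" or "flowers"
parser.add_argument("--num_workers", default=0, type=int)  # increase when using multiprocessing
parser.add_argument("--epochs", default=200, type=int)     # 200 is the recommended value
args = parser.parse_args()
```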
To test the model, you must:
- select the relevant arguments in `runtime.py`:
  - `--inference`: `default=True` for the test;
  - `--pre_trained_disc` and `--pre_trained_gen`: the paths to the pre-trained models, e.g. `default=/my/path/disc_190.pth`;
  - `--dataset`: `birds` or `flowers`;
  - `--save_path`: the name of the folder in which the results will be generated;
  - the other values do not matter;
- run `python runtime.py` (no need for `visdom`);
- the generated images will appear in the folder indicated by `--save_path`.
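As a quick check after the test run, you can list what was written to the results folder; the folder name below is a placeholder for the value given to `--save_path`.

```python
# List the files generated during the test phase ("results" is a placeholder
# for the folder name passed via --save_path).
from pathlib import Path

for f in sorted(Path("results").iterdir()):
    print(f.name)
```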
For this project we did not use mathematical metrics to evaluate the model. Since choosing appropriate evaluation metrics for GANs is still an active research topic, we preferred a simpler evaluation: checking that the generator respects the meaning of the text. In the flower images below you can see that the relevant vocabulary is well understood (white vs. yellow vs. purple, big vs. large vs. thin, etc.). The best results are obtained for flowers, whose distribution is less complex to learn than that of birds, for example. The results can be reproduced on the website provided.
It is also possible to interpolate the embeddings and generate the associated images. Below is an example of the results obtained.
It is possible to reproduce the interpolated results using the website provided.
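Interpolating embeddings simply means feeding the generator a sequence of convex combinations of two caption embeddings. Below is a minimal sketch of that blending step (generic NumPy, not the project's exact code; `emb_a` and `emb_b` stand for two InferSent vectors).

```python
# Linear interpolation between two caption embeddings; each blended vector is
# then passed to the trained generator to produce one image of the sequence
# (emb_a and emb_b are stand-ins for two InferSent sentence vectors).
import numpy as np

def interpolate(emb_a, emb_b, steps=8):
    """Return `steps` vectors blended linearly from emb_a to emb_b."""
    return [(1 - t) * emb_a + t * emb_b for t in np.linspace(0.0, 1.0, steps)]
```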
A website (local mode) allows you to generate images from your own descriptions using the pre-trained models provided above. Below are some examples of possible generations. See the `Website` folder for more information on launching it in local mode.
- Antoine ABHAY
- Paola de PERTHUIS
- Pascal GODBILLOT
- Théo LACOUR
Special thanks to Mr. Liming Chen for being our tutor for this project, and to Daniel Muller and Mohsen Ardabilian for their constructive criticism.