This repository mainly covers classical GANs that focus on stabilizing the training process and generating high-quality images.
*********************
All code has been tested with Python 2.7+ and TensorFlow 1.0+ on Linux.
* Samples: saves the generated data; each folder contains a figure showing the results.
* nets.py: the Generator and Discriminator are defined here.
For research purposes:
**Network architecture**: all GANs use the same network architecture (the Discriminators of EBGAN and BEGAN are a combination of the traditional D and G).
**Learning rate**: all initialized to 1e-4 and decayed by a factor of 2 every 5000 epochs (this may be unfair to some GANs, but the influence is small, so I ignored it).
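
For reference, a minimal sketch of that schedule in TensorFlow 1.x; the `epoch` counter and the optimizer choice are illustrative, not necessarily the exact code in this repo:

```python
import tensorflow as tf

# Learning-rate schedule described above: start at 1e-4, halve every 5000 epochs.
# `epoch` is assumed to be a counter fed in by the training loop.
epoch = tf.placeholder(tf.int32, [], name='epoch')
learning_rate = tf.train.exponential_decay(
    learning_rate=1e-4,
    global_step=epoch,   # interpreted here as the epoch counter
    decay_steps=5000,    # decay every 5000 epochs
    decay_rate=0.5,      # by a factor of 2
    staircase=True)

# Optimizer choice is illustrative only.
optimizer = tf.train.AdamOptimizer(learning_rate)
```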
**Dataset**: celebA, cropped to 128 and resized to 64; users should copy all celebA images to `./Datas/celebA` for training.
- [x] DCGAN
- [x] EBGAN
- [x] WGAN
- [x] BEGAN
And for comparison, I added a VAE here.
- [x] VAE
The generated results are shown at the end of this page.
The figure is from [LeCun, Yann, et al. "A tutorial on energy-based learning."](http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf)
In EBGAN, we want the Discriminator to distinguish the real images from the generated (fake) images. How? A simple idea is to set X as the real image and Y as the reconstructed image, and then minimize the energy of X and Y. So we need an auto-encoder to get Y from X, and a measure to calculate the energy (here MSE, very simple).
Finally, we get the structure of the Discriminator as shown below.
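
Besides the figure, here is a rough TensorFlow 1.x sketch of that structure; the layer sizes and names are illustrative and may differ from `nets.py`. The Discriminator is an encoder-decoder, and its output energy is the MSE between the input X and the reconstruction Y:

```python
import tensorflow as tf

def ebgan_discriminator(x, reuse=False):
    """Auto-encoder Discriminator: returns the energy MSE(X, Y) and the
    reconstruction Y. A sketch only; the real nets.py may differ."""
    with tf.variable_scope('discriminator', reuse=reuse):
        # Encoder: 64x64x3 image -> compact code
        h = tf.layers.conv2d(x, 64, 4, strides=2, padding='same', activation=tf.nn.relu)
        h = tf.layers.conv2d(h, 128, 4, strides=2, padding='same', activation=tf.nn.relu)
        code = tf.layers.dense(tf.reshape(h, [-1, 16 * 16 * 128]), 128, activation=tf.nn.relu)
        # Decoder: code -> 64x64x3 reconstruction (images assumed scaled to [-1, 1])
        h = tf.layers.dense(code, 16 * 16 * 128, activation=tf.nn.relu)
        h = tf.reshape(h, [-1, 16, 16, 128])
        h = tf.layers.conv2d_transpose(h, 64, 4, strides=2, padding='same', activation=tf.nn.relu)
        y = tf.layers.conv2d_transpose(h, 3, 4, strides=2, padding='same', activation=tf.nn.tanh)
    # Energy: per-sample mean squared error between X and its reconstruction Y
    energy = tf.reduce_mean(tf.square(x - y), axis=[1, 2, 3])
    return energy, y
```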
Use the EM distance, or Wasserstein-1 distance, so the GAN can solve the two problems above.
**Mathematical Analysis**
Why does the JS divergence have problems? Please see [Towards Principled Methods for Training Generative Adversarial Networks](https://arxiv.org/pdf/1701.04862.pdf)
Anyway, this highlights the fact that **the KL, JS, and TV distances are not sensible cost functions** when learning distributions supported by low dimensional manifolds.
However, it is difficult to directly calculate the original formula with the constraint ||f||_L <= 1.
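
For reference, a rough TensorFlow 1.x sketch of the workaround used in the WGAN paper: remove the sigmoid from the critic, use the difference of its mean outputs as the loss, and clip the critic's weights to a small range as a crude way to enforce the Lipschitz constraint. The network and hyperparameters here are illustrative, not this repo's exact code.

```python
import tensorflow as tf

def critic(x, reuse=False):
    """Illustrative critic f(x): no sigmoid, so the output is an unbounded score."""
    with tf.variable_scope('critic', reuse=reuse):
        h = tf.layers.conv2d(x, 64, 4, strides=2, padding='same', activation=tf.nn.relu)
        h = tf.layers.conv2d(h, 128, 4, strides=2, padding='same', activation=tf.nn.relu)
        return tf.layers.dense(tf.reshape(h, [-1, 16 * 16 * 128]), 1)

real = tf.placeholder(tf.float32, [None, 64, 64, 3])
fake = tf.placeholder(tf.float32, [None, 64, 64, 3])   # generator output in the full model

d_real, d_fake = critic(real), critic(fake, reuse=True)

# Approximate W(P_r, P_g): the critic maximizes E[f(real)] - E[f(fake)]
d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real)
g_loss = -tf.reduce_mean(d_fake)

d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='critic')
d_opt = tf.train.RMSPropOptimizer(5e-5).minimize(d_loss, var_list=d_vars)   # RMSProp as in the paper

# Weight clipping: after each critic update, force every weight into [-c, c]
# (c = 0.01 in the paper) to roughly keep ||f||_L <= 1.
clip_critic = [v.assign(tf.clip_by_value(v, -0.01, 0.01)) for v in d_vars]
```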
We have already introduced the structure of EBGAN, which is also used in BEGAN.
Then, instead of calculating the Wasserstein distance of the sample distributions as WGAN does, BEGAN calculates the Wasserstein distance of the loss distributions.
(I think the mathematical analysis in BEGAN is clearer and more intuitive than in WGAN.)
So, by simply replacing the expectation E with the loss L, we get the loss function:
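
A rough sketch of these objectives together with the k_t update rule that the points below describe; the names and the gamma/lambda_k values are illustrative (taken from the BEGAN paper), not necessarily what this repo uses.

```python
import tensorflow as tf

# L_real and L_fake are the auto-encoder reconstruction losses of the
# Discriminator on real images and on generated images; placeholders keep
# the sketch self-contained.
l_real = tf.placeholder(tf.float32, [], name='L_real')
l_fake = tf.placeholder(tf.float32, [], name='L_fake')

gamma = 0.5          # diversity ratio, gamma = E[L(G(z))] / E[L(x)]
lambda_k = 0.001     # proportional gain for k
k_t = tf.Variable(0.0, trainable=False, name='k')   # k_0 = 0

# D should reconstruct real data well but fake data badly; G wants its
# samples to be easy to reconstruct.
d_loss = l_real - k_t * l_fake
g_loss = l_fake

# k_t tracks the balance gamma*L_real - L_fake and is kept in [0, 1].
update_k = k_t.assign(tf.clip_by_value(k_t + lambda_k * (gamma * l_real - l_fake), 0.0, 1.0))
```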
The intuition behind the function is easy to understand:
(Here I describe my understanding roughly...)
(1). In the beginning, G and D are initialized randomly and k_0 = 0, so L_real is larger than L_fake, leading to a brief increase of k.
(2). After several iterations, D easily learns how to reconstruct the real data, so gamma x L_real - L_fake is negative and k decreases to 0; now D only reconstructs the real data, and G learns the real data distribution so as to minimize the reconstruction error in D.
(3). As G gets better at generating images that look like real data, L_fake becomes smaller and k becomes larger, so D focuses more on discriminating the real and fake data, and G is then trained further to keep up.
(4). In the end, k becomes a constant, which means gamma x L_real - L_fake = 0, so the optimization is done.
And the global loss is defined as the sum of L_real (how well D learns the distribution of the real data) and |gamma*L_real - L_fake| (how close the data generated by G is to the real data).
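
Reusing the illustrative names from the sketch above, this convergence measure is one line:

```python
# Global convergence measure: small when D models the real data well and the
# balance gamma*L_real - L_fake is close to zero.
m_global = l_real + tf.abs(gamma * l_real - l_fake)
```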