
Commit

Update README.md
AnirudhMaiya authored Jul 15, 2020
1 parent ca7f1c7 commit e8bb8d1
Showing 1 changed file with 1 addition and 1 deletion.
README.md: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Rethinking SWATS (Switches from Adam to SGD) Optimizer. Performing the switch Lo

The elementwise learning-rate scaling used by adaptive optimizers such as Adam and RMSProp often leads to poor generalization, owing to unstable and non-uniform learning rates at the end of training, even though these methods make fast progress during the initial part of training. Hence SGD remains the go-to choice for SOTA results, since it generalizes better than adaptive methods.

- SWATS is a method which switches from Adam to SGD when the difference between the bias-corrected projected learning rate and the projected learning rate is less than a threshold ϵ. The projected learning rate is found by projecting SGD's update onto Adam's update. The switch is global, i.e. if one of the layers of the network switches to SGD, all the layers are switched to SGD.
+ SWATS is a method from <a href = 'https://arxiv.org/pdf/1712.07628.pdf'>Keskar and Socher</a> which switches from Adam to SGD when the difference between the bias-corrected projected learning rate and the projected learning rate is less than a threshold ϵ. The projected learning rate is found by projecting SGD's update onto Adam's update. The switch is global, i.e. if one of the layers of the network switches to SGD, all the layers are switched to SGD.
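
To make the criterion above concrete, here is a minimal NumPy sketch of the switching test as just described, not the implementation in this repository; the function name `adam_until_switch`, the toy `grad_fn`, and the hyperparameter defaults are illustrative assumptions.

```python
# Illustrative sketch only -- not this repository's implementation.
# `grad_fn` returns the gradient at w; w is assumed to be a flat 1-D array.
import numpy as np

def adam_until_switch(grad_fn, w, lr=1e-3, b1=0.9, b2=0.999, eps_switch=1e-9, steps=10000):
    m = np.zeros_like(w)   # Adam first moment
    v = np.zeros_like(w)   # Adam second moment
    lam = 0.0              # running average of the projected SGD learning rate
    for k in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        # bias-corrected Adam step
        p = -lr * (m / (1 - b1 ** k)) / (np.sqrt(v / (1 - b2 ** k)) + 1e-8)
        w = w + p
        if np.dot(p, g) != 0.0:
            # project SGD's update onto Adam's update to estimate an SGD learning rate
            gamma = np.dot(p, p) / (-np.dot(p, g))
            lam = b2 * lam + (1 - b2) * gamma
            lam_hat = lam / (1 - b2 ** k)  # bias-corrected projected learning rate
            # switch (globally) once the two estimates agree to within the threshold
            if k > 1 and abs(lam_hat - gamma) < eps_switch:
                return w, lam_hat, k  # continue training with SGD at learning rate lam_hat
    return w, None, None
```

For example, `adam_until_switch(lambda w: 2 * (w - 3.0), np.array([0.0]))` runs plain Adam on a toy quadratic and reports whether, and at which step, the test would trigger the switch to SGD.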

## Switching Locally than Globally

