A basic RNN is unable to retain long-term memory, so it cannot use earlier context to decide whether the current picture is that of a wolf or a dog. This is where the LSTM comes into the picture. The LSTM cell allows a recurrent system to learn over many time steps without losing information to the vanishing gradient problem. It is fully differentiable, so the weights can be updated with ordinary backpropagation. Below is a sample mathematical model of an LSTM cell -
In an LSTM, we would expect the following behaviour -
An LSTM consists of 4 types of gates:
1. Forget Gate
2. Learn Gate
3. Remember Gate
4. Use Gate
Assume the following -
- LTM = Elephant
- STM = Fish
- Event = Wolf/Dog
The learn gate takes into account the short-term memory and the current event, retains the useful part of that information, and ignores the rest.
The STM and the event are combined through an activation function (tanh), and the result is then multiplied by an ignore factor as follows -
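In common LSTM notation (the symbols $W_n$, $W_i$, $b_n$, $b_i$, $N_t$, and $i_t$ are my labels here, matching the standard candidate/input-gate equations), the learn gate can be sketched as:

```latex
N_t = \tanh\left(W_n \, [STM_{t-1}, E_t] + b_n\right) \qquad
i_t = \sigma\left(W_i \, [STM_{t-1}, E_t] + b_i\right)
```

The gate's output is the element-wise product $N_t \odot i_t$: $N_t$ is the new information combined by tanh, and the sigmoid $i_t$ is the ignore factor that scales each component toward zero when it should be discarded.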
The forget gate takes into account the LTM and decides which part of it to keep and which part is useless and should be forgotten. The LTM is multiplied by a forget factor in order to drop the useless parts.
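With the same assumed notation (a forget factor $f_t$ produced from hypothetical weights $W_f$, $b_f$), the forget gate can be written as:

```latex
f_t = \sigma\left(W_f \, [STM_{t-1}, E_t] + b_f\right)
```

Its output is $LTM_{t-1} \odot f_t$: the sigmoid keeps each component of $f_t$ between 0 and 1, so multiplying erases the parts of the old LTM where $f_t$ is near 0 and preserves the parts where it is near 1.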
The remember gate takes the LTM coming from the forget gate and the new information coming from the learn gate and combines them. Mathematically, it simply adds the two outputs together.
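Reusing the factors sketched for the previous two gates (again, the symbols are assumed notation), the remember gate produces the new long-term memory by addition:

```latex
LTM_t = LTM_{t-1} \odot f_t + N_t \odot i_t
```

This is the standard LSTM cell-state update: the first term is what the forget gate kept, and the second is what the learn gate decided to add.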
The use gate takes what is useful from the LTM and what is useful from the STM and generates a new STM, which is also the output of the cell at this time step.
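The four gates above can be sketched as one numpy step function. This is a minimal illustration, not a production implementation: the weight names (`W_n`, `W_i`, `W_f`, `W_o`, and their biases) and the sizes in the demo are my own assumptions, and the use gate is written in the standard output-gate form (a sigmoid factor times the tanh of the new LTM).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(event, stm_prev, ltm_prev, params):
    """One LSTM time step expressed as the four gates.

    `params` is a dict of hypothetical weights: W_* of shape
    (hidden, hidden + input) and b_* of shape (hidden,).
    """
    x = np.concatenate([stm_prev, event])          # [STM_{t-1}, E_t]

    # Learn gate: combine STM and event (tanh), scale by an ignore factor.
    n = np.tanh(params["W_n"] @ x + params["b_n"])
    i = sigmoid(params["W_i"] @ x + params["b_i"])

    # Forget gate: multiply the old LTM by a forget factor.
    f = sigmoid(params["W_f"] @ x + params["b_f"])

    # Remember gate: add the kept LTM and the newly learned information.
    ltm = f * ltm_prev + i * n

    # Use gate: expose a filtered view of the new LTM as the new STM/output.
    o = sigmoid(params["W_o"] @ x + params["b_o"])
    stm = o * np.tanh(ltm)
    return stm, ltm

# Demo with illustrative sizes: hidden size 3, input size 2.
rng = np.random.default_rng(0)
h, d = 3, 2
params = {k: rng.standard_normal((h, h + d)) for k in ("W_n", "W_i", "W_f", "W_o")}
params.update({b: rng.standard_normal(h) for b in ("b_n", "b_i", "b_f", "b_o")})

stm, ltm = lstm_step(rng.standard_normal(d), np.zeros(h), np.zeros(h), params)
```

Because the sigmoid factors stay in (0, 1) and tanh stays in (-1, 1), every component of the new STM is bounded in magnitude by 1, which is part of what keeps gradients from exploding through the gates.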