Why LSTM RNN?
A plain recurrent neural network carries information forward one hidden state at a time — each step blends the current input with whatever the previous step rem…
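The previous post established why a vanilla RNN's single hidden state breaks down over long sequences: during backpropagation the gradient is multiplied by one factor per timestep, and when those factors are smaller than 1 it shrinks exponentially with sequence length. LST…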
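To make the shrinking concrete, here is a minimal sketch of a scalar vanilla RNN, h_t = tanh(w_h·h_{t−1} + w_x·x_t + b), together with the chain-rule product for ∂h_T/∂h_0. The weights (w_h = 0.9, w_x = 0.5), the constant input, and the function names are illustrative assumptions, not values from the post.

```python
import math

def rnn_forward(xs, w_h, w_x, b=0.0, h0=0.0):
    # Scalar vanilla RNN (hypothetical toy): h_t = tanh(w_h * h_{t-1} + w_x * x_t + b)
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w_h * hs[-1] + w_x * x + b))
    return hs

def grad_last_wrt_first(hs, w_h):
    # d h_T / d h_0 by the chain rule: product over t of w_h * tanh'(z_t),
    # where tanh'(z_t) = 1 - h_t**2.
    g = 1.0
    for h in hs[1:]:
        g *= w_h * (1.0 - h * h)
    return g

hs = rnn_forward([1.0] * 50, w_h=0.9, w_x=0.5)
for T in (1, 5, 10, 25, 50):
    print(T, grad_last_wrt_first(hs[:T + 1], w_h=0.9))
```

Each factor w_h·(1 − h_t²) is below 1 here, so once the hidden state settles, the printed gradient falls by roughly a constant ratio per step. That geometric decay is what the LSTM's cell state is meant to mitigate.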
The cell state carries information forward across timesteps, but not everything that was relevant a moment ago stays relevant. A language model tracking "she ha…
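A minimal sketch of the forget gate in the standard LSTM formulation: f_t = σ(W_f·[h_{t−1}, x_t] + b_f), then f_t ⊙ c_{t−1}. It uses plain Python lists so it runs without a tensor library. The sizes, the zeroed weights (so only the biases matter), and the bias values are hypothetical, chosen so that dimension 0 (think of it loosely as the "gender" slot from the running example) is wiped while dimension 1 is kept.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function, output in (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W, v):
    # Dense matrix-vector product with W stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forget_gate(h_prev, x_t, W_f, b_f):
    # f_t = sigmoid(W_f @ [h_prev; x_t] + b_f): one gate value per cell dimension.
    z = matvec(W_f, h_prev + x_t)  # list concatenation builds [h_prev; x_t]
    return [sigmoid(zk + bk) for zk, bk in zip(z, b_f)]

def apply_forget(c_prev, f_t):
    # Elementwise product: f near 0 erases a dimension, f near 1 keeps it.
    return [fk * ck for fk, ck in zip(f_t, c_prev)]

# Hypothetical sizes: 2 cell/hidden dimensions, 1 input feature.
W_f = [[0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
b_f = [-6.0, 6.0]
c_prev = [1.0, 1.0]
f_t = forget_gate([0.0, 0.0], [1.0], W_f, b_f)
c_kept = apply_forget(c_prev, f_t)
print(f_t, c_kept)  # dimension 0 shrinks toward 0, dimension 1 stays near 1
```

Note that the gate only scales what is already in c_{t−1}; nothing new enters the cell at this stage.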
The forget gate decides what survives from the cell state's past. It never adds anything new. Once "He" has erased the gender dimension, the cell state needs fr…
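Continuing in the same plain-Python style, here is a sketch of the standard cell-state update c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t. The input gate i_t = σ(W_i·[h_{t−1}, x_t] + b_i) decides how much of the candidate c̃_t = tanh(W_c·[h_{t−1}, x_t] + b_c) gets written. The parameter-dictionary layout, the key names, and every numeric value are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function, output in (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W, v):
    # Dense matrix-vector product with W stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cell_update(c_prev, h_prev, x_t, p):
    v = h_prev + x_t  # [h_{t-1}; x_t]
    f = [sigmoid(z + b) for z, b in zip(matvec(p["W_f"], v), p["b_f"])]    # forget gate
    i = [sigmoid(z + b) for z, b in zip(matvec(p["W_i"], v), p["b_i"])]    # input gate
    g = [math.tanh(z + b) for z, b in zip(matvec(p["W_c"], v), p["b_c"])]  # candidate c~_t
    # Additive update: a gated share of the past plus a gated share of the candidate.
    return [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]

# Hypothetical 1-dim cell, 1-dim hidden state, 1 input feature.
overwrite = {
    "W_f": [[0.0, 0.0]], "b_f": [-6.0],  # forget gate nearly closed: drop the old value
    "W_i": [[0.0, 0.0]], "b_i": [6.0],   # input gate nearly open: write the candidate
    "W_c": [[0.0, 2.0]], "b_c": [0.0],   # candidate = tanh(2 * x_t)
}
keep = dict(overwrite, b_f=[6.0], b_i=[-6.0])  # opposite setting: keep old, ignore new

c_prev, h_prev, x_t = [-1.0], [0.0], [1.0]
print(cell_update(c_prev, h_prev, x_t, overwrite))  # old -1.0 replaced by roughly tanh(2)
print(cell_update(c_prev, h_prev, x_t, keep))       # stays close to -1.0
```

The update is a sum rather than a repeated squashing, which is why information the gates choose to keep can travel across many timesteps.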
The cell state now holds everything the LSTM has decided is worth remembering: long-term signal accumulated step by step through the forget and input gates. But not all of…
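A sketch of the output gate in the same style: o_t = σ(W_o·[h_{t−1}, x_t] + b_o) and h_t = o_t ⊙ tanh(c_t). The cell state passes to the next step unchanged; only h_t is exposed. This assumes the standard formulation without peephole connections, and the sizes and values are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function, output in (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W, v):
    # Dense matrix-vector product with W stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def output_step(c_t, h_prev, x_t, W_o, b_o):
    o = [sigmoid(z + b) for z, b in zip(matvec(W_o, h_prev + x_t), b_o)]  # output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]  # exposed hidden state
    return h, o

# Hypothetical: both cell dimensions hold the same value, but only dimension 0 is exposed.
c_t = [2.0, 2.0]
W_o = [[0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
b_o = [6.0, -6.0]
h_t, o_t = output_step(c_t, [0.0, 0.0], [1.0], W_o, b_o)
print(h_t)  # dimension 0 near tanh(2), dimension 1 near 0; c_t itself is untouched
```

Because tanh squashes c_t into (−1, 1) and o_t lies in (0, 1), every entry of h_t stays in (−1, 1) even when the cell state grows larger.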