~/blog/tutorials/deep-learning

Regularization

Jul 1, 20267 min read

Exploding Gradient Problem

The vanishing gradient problem (section 3, post 01) described what happens when weights are too small: gradients shrink layer by layer until the first layers re…

Tutorial

Jul 1, 20268 min read

Weight Initialization Techniques

The first decision you make before training starts is how to initialize the weights. It determines whether gradients vanish or explode. It determines whether ne…

Tutorial

Jul 1, 20269 min read

Dropout Layers

A neural network that achieves 99% accuracy on training data and 70% on test data is not a good model — it has memorized the training set. Dropout is the most w…

Tutorial

Jul 3, 20268 min read

Batch Normalization

Training a deep network without normalization means that every weight update in layer 3 shifts the distribution of inputs to layer 4, which shifts what layer 4…

Tutorial

Jul 3, 20267 min read

Layer Normalization

Batch Normalization normalizes across samples in a batch — it averages over the N-dimension. That design breaks the moment you process a single sample: at infer…