
Activations

Jul 1, 2026 · 10 min read

Vanishing Gradient Problem

You train a 5-layer sigmoid network on a classification task and watch the loss barely move for the first hundred epochs. The model isn't broken — gradient desc…

Tutorial

Jul 1, 2026 · 10 min read

Sigmoid Activation Function

A neuron computes a weighted sum z = w·x + b. That number can be anything: −1000, 0, 47.3. But for a binary classification output — "will this loan default?" —…

Tutorial

Jul 1, 2026 · 8 min read

Tanh Activation Function

Sigmoid solves one problem — mapping z to a probability — but introduces another: every output is positive, which forces all upstream weight gradients to update…

Tutorial

Jul 1, 2026 · 10 min read

ReLU Activation Function

Sigmoid and tanh both saturate — for large |z|, their derivatives collapse toward zero and gradients die. ReLU sidesteps this entirely for positive values: the…

Tutorial

Jul 1, 2026 · 9 min read

Leaky ReLU and Parametric ReLU

ReLU kills neurons. When z ≤ 0 for every input a neuron encounters, its output is zero and its gradient is zero, so its incoming weights never move again. The fix is simpl…