The Alchemist.

Optimizers

Tutorial
Jul 1, 2026 · 8 min read

Gradient Descent

Training a neural network is an optimization problem. You have a cost function J(w, b) — a surface defined over all model parameters. You want to find the lowes…
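
A minimal sketch of full-batch gradient descent, assuming a hypothetical one-feature linear model y ≈ w·x + b with a mean-squared-error cost J(w, b). The toy data, the learning rate (0.05), and the step count are illustrative assumptions, not values from the tutorial. Every update uses the exact gradient computed over all samples.

```python
def cost(w, b, xs, ys):
    """Mean squared error J(w, b) over the full dataset."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradients(w, b, xs, ys):
    """Exact partial derivatives dJ/dw and dJ/db, using every sample."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dw = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    db = 2.0 / n * sum(errors)
    return dw, db


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0  # arbitrary starting point on the cost surface
    for _ in range(steps):
        dw, db = gradients(w, b, xs, ys)
        # Step downhill: move against the gradient, scaled by the learning rate.
        w -= lr * dw
        b -= lr * db
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # assumed noise-free toy data from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(f"w={w:.4f}, b={b:.4f}, J={cost(w, b, xs, ys):.2e}")
```

Because the toy data is noise-free, the parameters should approach w = 2 and b = 1. A learning rate that is too large for this cost surface would instead make the iterates diverge.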

Tutorial
Jul 1, 2026 · 9 min read

Stochastic Gradient Descent (SGD)

Batch gradient descent computes the exact gradient — but it requires processing all n training samples before taking a single weight update. With 1 million samp…
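
A minimal sketch of plain SGD, assuming a hypothetical one-feature linear model with squared-error loss: the weights change after every individual sample, so one epoch over n samples performs n updates. The learning rate, epoch count, seed, and toy data are illustrative assumptions.

```python
import random


def sgd(xs, ys, lr=0.01, epochs=1000, seed=0):
    """Plain SGD for y ≈ w*x + b: one weight update per training sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):
        rng.shuffle(order)  # fresh random sample order every epoch
        for i in order:
            # Gradient of this one sample's squared error (w*x_i + b - y_i)**2.
            err = w * xs[i] + b - ys[i]
            w -= lr * 2 * err * xs[i]
            b -= lr * 2 * err
            updates += 1
    return w, b, updates


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # assumed noise-free toy data from y = 2x + 1
w, b, updates = sgd(xs, ys)
print(updates, round(w, 3), round(b, 3))
```

With 5 samples and 1,000 epochs the loop performs 5,000 updates, where batch gradient descent would perform 1,000. Each individual step follows one sample's gradient, which is why the path is noisier.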

Tutorial
Jul 1, 2026 · 8 min read

Mini-Batch SGD

Batch GD: one update per epoch, exact gradient, slow. SGD: n updates per epoch, noisy gradient, fast but volatile. Mini-batch SGD: k updates per epoch, approxim…
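
A sketch of mini-batch SGD, assuming a hypothetical one-feature linear model with squared-error loss and an assumed batch size B (`batch_size` below). Each update averages the gradient over one mini-batch, so an epoch performs ceil(n / B) updates, which is one way to read the k in the excerpt. The data, learning rate, and epoch count are illustrative.

```python
import math
import random


def minibatch_sgd(xs, ys, batch_size=4, lr=0.02, epochs=1000, seed=0):
    """Mini-batch SGD for y ≈ w*x + b; returns (w, b, updates in the last epoch)."""
    rng = random.Random(seed)
    n = len(xs)
    w, b = 0.0, 0.0
    order = list(range(n))
    updates = 0
    for _ in range(epochs):
        rng.shuffle(order)
        updates = 0
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]  # the final batch may be smaller
            m = len(batch)
            errs = [w * xs[i] + b - ys[i] for i in batch]
            # Gradient of the mean squared error over this mini-batch only.
            dw = 2.0 / m * sum(e * xs[i] for e, i in zip(errs, batch))
            db = 2.0 / m * sum(errs)
            w -= lr * dw
            b -= lr * db
            updates += 1
    return w, b, updates


xs = [i / 2 for i in range(10)]  # 0.0, 0.5, ..., 4.5
ys = [2 * x + 1 for x in xs]  # assumed noise-free toy data from y = 2x + 1
w, b, k = minibatch_sgd(xs, ys)
print(k, math.ceil(len(xs) / 4), round(w, 3), round(b, 3))
```

With n = 10 and B = 4 the batches hold 4, 4, and 2 samples, giving 3 updates per epoch. Setting B = n recovers batch gradient descent, and B = 1 recovers plain SGD.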

Tutorial
Jul 1, 2026 · 8 min read

SGD with Momentum

Mini-batch SGD has a problem in narrow loss valleys. The gradient across the narrow dimension is large and oscillates in sign — the optimizer zigzags side-to-si…
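
A sketch of gradient descent with momentum on a hypothetical narrow valley f(x, y) = 0.5·(x² + 50·y²), where y is the steep, narrow direction. It assumes one common formulation, v = β·v + ∇f followed by θ = θ - η·v; some texts scale the gradient term by (1 - β) instead. Setting β = 0 reduces it to plain gradient descent. The learning rate 0.035 and β = 0.9 are illustrative choices.

```python
import math

# Hypothetical narrow valley f(x, y) = 0.5 * (1 * x**2 + 50 * y**2):
# x is the long, shallow direction; y is the steep, narrow one.
CURVATURE = (1.0, 50.0)


def grad(p):
    """Gradient of f at point p = [x, y]."""
    return [c * v for c, v in zip(CURVATURE, p)]


def run(lr, beta, steps, start=(10.0, 1.0)):
    """Gradient descent with momentum; beta = 0 gives plain gradient descent."""
    p = list(start)
    velocity = [0.0, 0.0]
    for _ in range(steps):
        g = grad(p)
        # The velocity is a decaying sum of past gradients (v <- beta * v + g).
        velocity = [beta * v + gi for v, gi in zip(velocity, g)]
        p = [pi - lr * v for pi, v in zip(p, velocity)]
    return p


plain = run(lr=0.035, beta=0.0, steps=200)
momentum = run(lr=0.035, beta=0.9, steps=200)
print("plain GD distance to minimum:", math.hypot(*plain))
print("momentum distance to minimum:", math.hypot(*momentum))
```

With this learning rate, plain gradient descent multiplies y by -0.75 every step, so it jumps to the opposite side of the valley on each iteration, while progress along the shallow x direction shrinks by only a factor of 0.965 per step. With β = 0.9 both coordinates contract by about sqrt(0.9) ≈ 0.95 per step, so after 200 steps the momentum run ends much closer to the minimum.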

Tutorial
Jul 1, 2026 · 8 min read

Adagrad

All previous optimizers — batch GD, SGD, mini-batch SGD, momentum — use the same learning rate η for every parameter. This is a bad assumption when different pa…
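
A minimal sketch of the Adagrad update: each parameter keeps a running sum G_i of its squared gradients and divides its step by sqrt(G_i), so its effective learning rate is η / sqrt(G_i). The badly scaled objective, η = 0.5, and ε = 1e-8 are illustrative assumptions, and implementations differ slightly in where ε is placed.

```python
import math


def adagrad(grad_fn, params, lr=0.5, eps=1e-8, steps=100):
    """Adagrad: per-parameter step lr * g / (sqrt(G) + eps), where G is the
    running sum of that parameter's squared gradients."""
    params = list(params)
    sq_sum = [0.0] * len(params)  # G_i, one accumulator per parameter
    for _ in range(steps):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            sq_sum[i] += g * g
            # Large past gradients -> large G_i -> smaller effective learning rate.
            params[i] -= lr * g / (math.sqrt(sq_sum[i]) + eps)
    return params


def grad_fn(p):
    """Gradient of a hypothetical badly scaled objective f(a, b) = 0.5 * (100 * a**2 + 0.01 * b**2)."""
    a, b = p
    return [100.0 * a, 0.01 * b]


one_step = adagrad(grad_fn, [1.0, 1.0], steps=1)
final = adagrad(grad_fn, [1.0, 1.0], steps=100)
print("after 1 step:", one_step)
print("after 100 steps:", final)
```

Although the two gradients differ by a factor of 10,000, the first Adagrad step moves both parameters by almost exactly η = 0.5, because g / sqrt(g²) is just the sign of g. A single global learning rate cannot do this: any η small enough to keep the steep a direction stable (below 0.02 here) would barely move b.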

© 2026 Mohammed Vasim. Built with curiosity.