NLP Engineering

A hands-on guide to Natural Language Processing — from tokenisation to Word2Vec, with full worked examples and real classification projects

NLP Engineering Series

Welcome to the NLP Engineering Series! This series teaches Natural Language Processing from first principles, with every concept worked through on a concrete text example before touching code.

Sections

Introduction — Roadmap and practical use cases for NLP
Text Preprocessing — Tokenisation, stemming, lemmatisation, stopwords, POS tagging, NER
Text Representation — One-Hot Encoding, Bag of Words, N-Grams, TF-IDF
Word Embeddings — Word2Vec intuition, CBOW, Skip-gram, AvgWord2Vec, Gensim in practice
Projects — Spam classification, best practices, and a full sentiment analysis capstone

Prerequisites

Working knowledge of Python (lists, dicts, loops, string methods)
Basic ML concepts: features, labels, train/test split, accuracy
No deep learning required until the very end of the series

How to Use This Series

Follow the sections in order — each one builds directly on the last
Every post traces its calculations by hand before showing the code
Run the code examples yourself on the anchor text used in each post

Start your NLP journey!