Target Guided Ordinal Encoding

Jun 1, 2026 • 3 min read • By Mohammed Vasim

Machine Learning, AI, Data Science

Target Guided Ordinal Encoding for High-Cardinality Features

If you have a column with 50 unique ZIP codes, one-hot encoding creates 50 columns, most of them sparse. Label encoding creates a single column but implies an arbitrary ordering among the codes. Neither is ideal.
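To make the trade-off concrete, here is a small sketch that builds a hypothetical column of 50 made-up ZIP codes and applies both encodings. The `zip_code` column name and the generated values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: 50 made-up ZIP codes, each appearing 4 times
zips = [str(10000 + i) for i in range(50)] * 4
df_zip = pd.DataFrame({"zip_code": zips})

# One-hot encoding: one indicator column per unique ZIP code
one_hot = pd.get_dummies(df_zip["zip_code"])
print(one_hot.shape)  # (200, 50)

# Label encoding: a single column of integer codes whose order is arbitrary
label_codes = df_zip["zip_code"].astype("category").cat.codes
print(label_codes.nunique())  # 50
```

Each row of the one-hot frame has exactly one non-zero entry out of 50, which is where the sparsity comes from.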

Target guided ordinal encoding, in the mean-encoding form used here (also called target encoding), takes a different approach: replace each category with the mean of the target variable for that category. This creates a single numeric column that captures the relationship between the category and the target you are trying to predict.

How It Works

The idea is straightforward: for each category, calculate the average target value, then use that average as the encoded value.

python

import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo',
             'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

# Average price per city, stored as a {city: mean} dictionary
mean_price = df.groupby('city')['price'].mean().to_dict()

This gives you a mapping: London → 150, New York → 190, Paris → 310, Tokyo → 250.

python

df['city_encoded'] = df['city'].map(mean_price)

New York (average 190) and Paris (average 310) get different encoded values because their average prices differ. The encoding captures the relationship between city and price in a single column.
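The name "target guided ordinal encoding" is also commonly used for a rank-based variant: categories are sorted by their target mean and replaced with integer ranks instead of the means themselves. A minimal sketch of that variant, rebuilding the same toy data so it runs on its own; the names `cities`, `rank_map`, and `city_rank` are assumptions for illustration.

```python
import pandas as pd

cities = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

# Sort categories by mean target, lowest first
ordered = cities.groupby('city')['price'].mean().sort_values().index

# Assign ranks 1..k in that order
rank_map = {city: rank for rank, city in enumerate(ordered, start=1)}

cities['city_rank'] = cities['city'].map(rank_map)
print(rank_map)
# {'London': 1, 'New York': 2, 'Tokyo': 3, 'Paris': 4}
```

Ranks keep the ordering of the categories but discard the size of the gaps between their means, so this variant carries the same leakage risk as the mean-based form.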

The Target Leakage Problem

Here's where target encoding gets subtle: if you compute the mean over the entire dataset before splitting into training and test sets, you've leaked information. The mean for a category incorporates target values from the test set, which your model shouldn't see.

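This artificially inflates performance metrics. The same problem appears inside the training data: a row's own target value contributes to the mean used to encode it. The fix is to compute the encoding out of fold, so that each row is encoded with means learned only from the other cross-validation folds.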

python

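import numpy as np
from sklearn.model_selection import KFold

# Start from NaN (float) so fold means are stored without integer dtype coercion
df['city_encoded'] = np.nan
kf = KFold(n_splits=5, shuffle=True, random_state=42)
col = df.columns.get_loc('city_encoded')

for train_idx, val_idx in kf.split(df):
    train_fold = df.iloc[train_idx]
    means = train_fold.groupby('city')['price'].mean()
    # Cities missing from this training fold fall back to the fold's overall mean
    encoded = df['city'].iloc[val_idx].map(means).fillna(train_fold['price'].mean())
    df.iloc[val_idx, col] = encoded.to_numpy()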

Each validation fold is encoded using only the rows outside it, so no row's own target value contributes to its encoded feature. This keeps your evaluation honest.
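The same rule applies to a held-out test set: the mapping should be learned from training rows only and then applied to the test rows, with a fallback for categories the training data never saw. A minimal sketch with made-up data; the helper names `fit_target_mean` and `apply_target_mean` are hypothetical, and falling back to the training global mean is one common choice rather than the only one.

```python
import pandas as pd

def fit_target_mean(train, col, target):
    """Learn per-category target means and the global mean from training rows only."""
    return train.groupby(col)[target].mean(), train[target].mean()

def apply_target_mean(frame, col, means, fallback):
    """Map categories to the learned means; unseen categories get the fallback."""
    return frame[col].map(means).fillna(fallback)

train = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'New York', 'Paris'],
    'price': [200, 150, 300, 180, 320],
})
test = pd.DataFrame({'city': ['Paris', 'Tokyo']})  # Tokyo never appears in train

means, global_mean = fit_target_mean(train, 'city', 'price')
test['city_encoded'] = apply_target_mean(test, 'city', means, global_mean)
print(test['city_encoded'].tolist())  # [310.0, 230.0]
```

Paris receives its training mean, while Tokyo, which the training data never contained, receives the training global mean of 230.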

Handling Rare Categories

When a category appears only a few times, its mean is unreliable. A city with one $1,000,000 sale gets an encoded value of 1,000,000 — extreme and noisy.

The standard solution is smoothing: blend the category mean with the global mean, weighted by sample size.

python

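def smoothed_mean_encoding(series, target, min_samples=5, global_weight=None):
    """Encode `series` with per-category target means shrunk toward the global mean."""
    global_mean = target.mean()
    # Per-category sample count and mean of the target
    agg = target.groupby(series).agg(['count', 'mean'])

    # By default, weight the global mean as if it were `min_samples` extra observations
    if global_weight is None:
        global_weight = min_samples

    # Weighted average of the category mean and the global mean
    smoothed = (agg['count'] * agg['mean'] + global_weight * global_mean) / \
               (agg['count'] + global_weight)

    return series.map(smoothed)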

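Categories with few samples get pulled toward the global mean. Categories with many samples stay close to their own mean. The global_weight parameter controls how much smoothing to apply: a category whose sample count equals global_weight lands exactly halfway between its own mean and the global mean. Like the plain mean, the smoothed mapping should be computed on training data only.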
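A quick worked example of the blend, using the same weighted-average formula as the function above. The global mean of 300,000 and the weight of 5 are assumed values chosen only to make the arithmetic easy to follow.

```python
def smooth(count, category_mean, global_mean, weight):
    """Weighted blend of the category mean and the global mean."""
    return (count * category_mean + weight * global_mean) / (count + weight)

GLOBAL_MEAN = 300_000  # assumed global average price
WEIGHT = 5             # assumed smoothing weight

# Same category mean of 1,000,000, but very different sample sizes
rare = smooth(1, 1_000_000, GLOBAL_MEAN, WEIGHT)
common = smooth(200, 1_000_000, GLOBAL_MEAN, WEIGHT)

print(round(rare))    # 416667
print(round(common))  # 982927
```

The single-sale city is pulled most of the way back to the global mean, while the city with 200 sales barely moves.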

When Target Encoding Works Best

Target encoding is most useful for high-cardinality features where there's a stable relationship between the category and the target. Common use cases:

Geographic data — ZIP codes, cities, regions often correlate with income or purchasing behavior
Product codes — certain products are associated with higher conversion rates
User IDs — user-level past behavior predicts future actions

It's less suitable when:

Categories have very few samples (even with smoothing)
The category-target relationship is noisy or temporary
You have a small dataset where cross-validation splitting reduces training size significantly

Comparison With Other High-Cardinality Methods

Method	Information Used	Dimensionality	Risk
One-hot	Category identity	Very high	Sparse, overfitting
Label encoding	None (arbitrary)	1 column	Implies false order
Frequency encoding	Count only	1 column	Ignores target relationship
Target encoding	Target mean	1 column	Target leakage without CV
Binary encoding	Category bits	About log2(n) columns	Less interpretable
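To illustrate the difference between the frequency and target rows of the table, the sketch below applies both to the toy city data. The names `toy`, `city_freq`, and `city_target` are assumptions, and the target means are computed on the full toy frame purely for display, not as a leakage-safe procedure.

```python
import pandas as pd

toy = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

# Frequency encoding: share of rows per category, no target involved
freq = toy['city'].value_counts(normalize=True)
toy['city_freq'] = toy['city'].map(freq)

# Target (mean) encoding: average price per category (display only, not CV-safe)
toy['city_target'] = toy['city'].map(toy.groupby('city')['price'].mean())

print(toy[['city', 'city_freq', 'city_target']].drop_duplicates())
```

New York and Paris each make up a third of the rows, so frequency encoding cannot tell them apart, while their target-encoded values (190 and 310) differ.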

Target encoding compresses a high-cardinality feature into a single column that still carries information about the target, but only if you handle the leakage correctly. The cross-validation approach above is non-negotiable: skipping it means your validation metrics will be optimistic and your production performance will disappoint.
