Back to blog
← View series: machine learning

~/blog

Target Guided Ordinal Encoding

Jun 1, 20263 min readBy Mohammed Vasim
Machine LearningAIData Science

Target Guided Ordinal Encoding for High-Cardinality Features

If you have a column with 50 unique ZIP codes, one-hot encoding creates 50 columns — most of them sparse. Label encoding creates 1 column but implies arbitrary ordering. Neither is ideal.

Target guided ordinal encoding (also called mean encoding) takes a different approach: replace each category with the mean of the target variable for that category. This creates a single numeric column that captures the predictive relationship between the category and what you're trying to predict.

How It Works

The idea is straightforward: for each category, calculate the average target value, then use that average as the encoded value.

python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo',
             'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

mean_price = df.groupby('city')['price'].mean().to_dict()

This gives you a mapping: London → 150, New York → 190, Paris → 310, Tokyo → 250.

python
df['city_encoded'] = df['city'].map(mean_price)

New York (average 190) and Paris (average 310) get different encoded values because their price distributions differ. The encoding captures the relationship between city and price in a single column.

The Target Leakage Problem

Here's where target encoding gets subtle: if you compute the mean over the entire dataset before splitting into training and test sets, you've leaked information. The mean for a category incorporates target values from the test set, which your model shouldn't see.

This artificially inflates performance metrics. The fix is to compute the encoding within each cross-validation fold.

python
from sklearn.model_selection import KFold

df['city_encoded'] = 0
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for train_idx, val_idx in kf.split(df):
    train_fold = df.iloc[train_idx]
    means = train_fold.groupby('city')['price'].mean()
    df.loc[val_idx, 'city_encoded'] = df.loc[val_idx, 'city'].map(means)

Each fold's encoding uses only data available during training on that fold. This keeps your evaluation honest.

Handling Rare Categories

When a category appears only a few times, its mean is unreliable. A city with one $1,000,000 sale gets an encoded value of 1,000,000 — extreme and noisy.

The standard solution is smoothing: blend the category mean with the global mean, weighted by sample size.

python
def smoothed_mean_encoding(series, target, min_samples=5, global_weight=None):
    global_mean = target.mean()
    grouped = target.groupby(series)
    agg = grouped.agg(['count', 'mean'])
    
    if global_weight is None:
        global_weight = min_samples
    
    smoothed = (agg['count'] * agg['mean'] + global_weight * global_mean) / \
               (agg['count'] + global_weight)
    
    return series.map(smoothed.to_dict())

Categories with few samples get pulled toward the global mean. Categories with many samples keep their computed mean. The global_weight parameter controls how much smoothing to apply.

When Target Encoding Works Best

Target encoding is most useful for high-cardinality features where there's a stable relationship between the category and the target. Common use cases:

  • Geographic data — ZIP codes, cities, regions often correlate with income or purchasing behavior
  • Product codes — certain products are associated with higher conversion rates
  • User IDs — user-level past behavior predicts future actions

It's less suitable when:

  • Categories have very few samples (even with smoothing)
  • The category-target relationship is noisy or temporary
  • You have a small dataset where cross-validation splitting reduces training size significantly

Comparison With Other High-Cardinality Methods

MethodInformation UsedDimensionalityRisk
One-hotCategory identityVery highSparse, overfitting
Label encodingNone (arbitrary)1 columnImplies false order
Frequency encodingCount only1 columnIgnores target relationship
Target encodingTarget mean1 columnTarget leakage without CV
Binary encodingCategory bitslog2(n) columnsLess interpretable

Target encoding gives you the best information-to-dimensionality ratio for high-cardinality features, but only if you handle the leakage correctly. The cross-validation approach above is non-negotiable — skipping it means your validation metrics will be optimistic and your production performance will disappoint.

Comments (0)

No comments yet. Be the first to comment!

Leave a comment