What a Vision Transformer Sees First: A 16×16 Patch Walkthrough
A neural network for images is usually a CNN, and a CNN is usually a stack of small filters sliding over the picture. That is comfortable — each filter sees a small neighborhood, the layers compose into bigger receptive fields, and the model gets a sense of what is in the image without ever having to look at all of it at once.
A Vision Transformer throws that idea away. The image is cut into a fixed grid of non-overlapping squares, each square is flattened into a vector, and the whole bag of vectors is handed to a regular Transformer — the same architecture used for text. No sliding filter, no growing receptive field, no locality baked into the first layer. Just a sequence of tokens, exactly like a sentence.
That choice has a strange consequence: the very first thing a ViT does is not "look at the image." It is the surgery that turns pixels into tokens. Before any attention, before any MLP, before the [CLS] token, there is a patch embedder whose only job is to chop the picture into 16×16 squares and project each one into the model's hidden dimension. Get that step wrong and nothing downstream works.
So the walkthrough below is built around that step. First, the patch embedder is simulated by hand: a 224×224 image becomes a 14×14 grid of 16×16 patches using torch.Tensor.unfold. Then the real google/vit-base-patch16-224 model is run on the same image to confirm the production patch embedder produces the same shape. The point is to demystify the part most blog posts skip — what the tensor actually looks like after the patch embedder has done its work — and to make the rest of the architecture feel less like magic.
Load the image
PIL is the lightest way to read a JPEG into a Python image object. A PIL.Image exposes pixels in H × W × C order as 0–255 integers, a raw layout that has to be converted before a PyTorch model can consume it.
from PIL import Image
image_path = "data/nyc_coffee_cup.jpg"
Image.open returns a lazy handle to the JPEG. The pixel data is not decoded until something asks for it, so the call itself is cheap. We bind it to image so the rest of the notebook has a stable name to refer to.
image = Image.open(image_path)
We also need a couple of tools for the next stage: torchvision.transforms to turn the PIL image into a float tensor, and matplotlib so the patch grid can be inspected visually. %matplotlib inline is a Jupyter-only directive; outside notebooks, plt.show() does the same job.
%matplotlib inline
from torchvision import transforms
import matplotlib.pyplot as plt
Confirm the load worked: the cell displays the PIL image inline. A wrong path fails earlier, at Image.open; a truncated or corrupt file can surface here, when the pixels are first decoded.
image
Out[5]:
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=224x224>
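The H × W × C layout mentioned above is easy to check on a small synthetic image. The sketch below assumes NumPy is installed alongside Pillow; the 32×16 orange image and the names demo_image and demo_array are made up for illustration and stand in for the JPEG. Note that PIL reports size as (width, height), while the array shape is (height, width, channels).

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for the JPEG: 32 px wide, 16 px tall, solid orange.
demo_image = Image.new("RGB", (32, 16), color=(255, 128, 0))

# Decode the pixels into a NumPy array.
demo_array = np.asarray(demo_image)

print(demo_image.size)    # PIL order: (width, height)
print(demo_array.shape)   # NumPy order: (height, width, channels)
print(demo_array.dtype)   # 8-bit unsigned integers, 0 to 255
print(demo_array[0, 0])   # RGB triple of the top-left pixel
```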
The torchvision transform pipeline is a list of operations applied in order. We chain a resize and a tensor conversion and deliberately stop short of normalizing here; the Hugging Face image processor used later applies its own mean and standard deviation.
# Define transformation
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])
Two transformations are bundled into a single Compose:
- Resize((224, 224)): the original image is whatever size the photographer chose; this ViT checkpoint expects exactly 224×224, which a 16-pixel patch divides cleanly into a 14×14 grid.
- ToTensor(): converts H × W × C integers in [0, 255] to C × H × W floats in [0.0, 1.0]. PyTorch vision models conventionally take channels-first input, and pretrained weights usually expect a further normalization step on top of this [0, 1] scaling (for this checkpoint, roughly [-1, 1]).
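For an 8-bit RGB image, the conversion ToTensor() performs amounts to moving the channel axis to the front and dividing by 255. The sketch below reproduces that arithmetic with plain torch on a hypothetical 1×2 image; it illustrates the idea and is not torchvision's actual implementation.

```python
import torch

# Hypothetical 1x2 RGB image in H x W x C order, 8-bit integers.
hwc = torch.tensor([[[0, 128, 255], [255, 0, 64]]], dtype=torch.uint8)

# Move channels first, then rescale 0..255 to 0.0..1.0.
chw = hwc.permute(2, 0, 1).to(torch.float32) / 255.0

print(hwc.shape)  # torch.Size([1, 2, 3])
print(chw.shape)  # torch.Size([3, 1, 2])
print(chw[0])     # red channel of both pixels
```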
pixel_values = transform(image)
The shape of pixel_values is [3, 224, 224]: three channels, then height, then width. That is the canonical PyTorch image tensor layout, and it is the layout ViTImageProcessor will produce for us in a few minutes (with a batch axis in front), so we are matching its conventions by hand first.
pixel_values
Out[8]:
tensor([[[0.0235, 0.0039, 0.0078, ..., 0.5333, 0.4745, 0.4824],
[0.0157, 0.0039, 0.0039, ..., 0.5255, 0.5137, 0.5137],
[0.0039, 0.0000, 0.0039, ..., 0.5608, 0.5569, 0.5412],
...,
[0.0706, 0.0549, 0.0235, ..., 0.5059, 0.5294, 0.5059],
[0.0941, 0.0863, 0.0549, ..., 0.5098, 0.5176, 0.5059],
[0.0000, 0.0196, 0.0353, ..., 0.4980, 0.4902, 0.5176]],
[[0.0314, 0.0235, 0.0235, ..., 0.4784, 0.4353, 0.4471],
[0.0353, 0.0235, 0.0196, ..., 0.4824, 0.4824, 0.4902],
[0.0392, 0.0235, 0.0196, ..., 0.5333, 0.5451, 0.5294],
...,
[0.1569, 0.1529, 0.1373, ..., 0.5059, 0.5255, 0.5020],
[0.2157, 0.2196, 0.1961, ..., 0.5098, 0.5137, 0.5020],
[0.1412, 0.1725, 0.1922, ..., 0.4980, 0.4863, 0.5137]],
[[0.0118, 0.0078, 0.0196, ..., 0.4784, 0.4392, 0.4588],
[0.0196, 0.0078, 0.0235, ..., 0.4745, 0.4745, 0.4902],
[0.0353, 0.0235, 0.0235, ..., 0.5020, 0.5176, 0.5020],
...,
[0.2392, 0.2314, 0.2078, ..., 0.5059, 0.5176, 0.4941],
[0.3255, 0.3255, 0.2902, ..., 0.5098, 0.5059, 0.4941],
[0.2667, 0.2863, 0.2941, ..., 0.4980, 0.4784, 0.5059]]])
The model, however, expects a batch — a fourth leading dimension that says "this is one of N images, here is image index 0." Even with a single image, that axis has to be there.
# Add an extra dimension
pixel_values = pixel_values.unsqueeze(0)
Adding the batch axis takes the shape from [3, 224, 224] to [1, 3, 224, 224]. The leading 1 means "a batch of one image." The model will not accept the unbatched version, even though semantically there is only one image to process.
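The effect of unsqueeze(0) on the shape can be seen in isolation. The sketch below uses a tensor of zeros as a stand-in for the image; the names single, batched and same are illustrative only.

```python
import torch

single = torch.zeros(3, 224, 224)   # one image, no batch axis
batched = single.unsqueeze(0)       # insert a new axis at position 0

print(single.shape)    # torch.Size([3, 224, 224])
print(batched.shape)   # torch.Size([1, 3, 224, 224])

# Indexing with None is an equivalent spelling.
same = single[None]
print(same.shape)      # torch.Size([1, 3, 224, 224])

# squeeze(0) removes the batch axis again.
print(batched.squeeze(0).shape)  # torch.Size([3, 224, 224])
```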
pixel_values
Out[10]:
tensor([[[[0.0235, 0.0039, 0.0078, ..., 0.5333, 0.4745, 0.4824],
[0.0157, 0.0039, 0.0039, ..., 0.5255, 0.5137, 0.5137],
[0.0039, 0.0000, 0.0039, ..., 0.5608, 0.5569, 0.5412],
...,
[0.0706, 0.0549, 0.0235, ..., 0.5059, 0.5294, 0.5059],
[0.0941, 0.0863, 0.0549, ..., 0.5098, 0.5176, 0.5059],
[0.0000, 0.0196, 0.0353, ..., 0.4980, 0.4902, 0.5176]],
[[0.0314, 0.0235, 0.0235, ..., 0.4784, 0.4353, 0.4471],
[0.0353, 0.0235, 0.0196, ..., 0.4824, 0.4824, 0.4902],
[0.0392, 0.0235, 0.0196, ..., 0.5333, 0.5451, 0.5294],
...,
[0.1569, 0.1529, 0.1373, ..., 0.5059, 0.5255, 0.5020],
[0.2157, 0.2196, 0.1961, ..., 0.5098, 0.5137, 0.5020],
[0.1412, 0.1725, 0.1922, ..., 0.4980, 0.4863, 0.5137]],
[[0.0118, 0.0078, 0.0196, ..., 0.4784, 0.4392, 0.4588],
[0.0196, 0.0078, 0.0235, ..., 0.4745, 0.4745, 0.4902],
[0.0353, 0.0235, 0.0235, ..., 0.5020, 0.5176, 0.5020],
...,
[0.2392, 0.2314, 0.2078, ..., 0.5059, 0.5176, 0.4941],
[0.3255, 0.3255, 0.2902, ..., 0.5098, 0.5059, 0.4941],
[0.2667, 0.2863, 0.2941, ..., 0.4980, 0.4784, 0.5059]]]])
Before reaching for unfold, declare the patch size as a named constant. 16 is the patch size for the canonical ViT-Base configuration on 224×224 input — the dimensions divide evenly: 224 / 16 = 14, giving a 14×14 grid of 196 patches.
patch_size = 16
Cutting the image into patches by hand
Before reaching for Hugging Face, the ViT paper's first equation is worth simulating from scratch. Each 16×16 patch becomes a token; the model then treats those tokens the way a text Transformer treats word embeddings. The shape [1, 3, 224, 224] is a natural fit for a sliding-window view, and PyTorch has one for exactly this situation: Tensor.unfold.
What is torch.Tensor.unfold
unfold is a tensor method that extracts sliding local blocks from a specified dimension without copying all data into a new tensor. It is useful for creating windowed views of input data, such as image patches or overlapping segments in sequences.
Parameters
- dimension: the axis along which to extract windows.
- size: the length of each slice or window.
- step: the stride between the starts of consecutive windows.
For example, tensor.unfold(2, 16, 16) means:
- take windows along dimension 2,
- each window has size 16,
- move 16 steps for the next window.
Output shape
If the original size along the selected dimension is L, and you unfold with size=S and step=K, the number of windows is:
n = floor((L - S) / K) + 1
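The formula is simple enough to check by hand. The helper below is a hypothetical name, not a PyTorch function; it only evaluates the expression above for a few cases, including an overlapping step and a length that does not divide evenly.

```python
def num_windows(length: int, size: int, step: int) -> int:
    """Number of complete windows along one dimension: floor((L - S) / K) + 1."""
    return (length - size) // step + 1

print(num_windows(224, 16, 16))  # 14: the ViT patch grid, no overlap
print(num_windows(224, 16, 8))   # 27: overlapping windows, step < size
print(num_windows(230, 16, 16))  # 14: the trailing 6 pixels do not fill a window
```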
The result tensor has an extra dimension appended at the end for the window contents. So a tensor of shape [B, C, H, W] unfolded on height and width becomes:
[B, C, H_windows, W_windows, S, S]
This means you now have a grid of patches, where each patch is an S × S block: one window of length S along the height and one along the width.
How it works
unfold does not reshape the tensor in the usual flattening sense. Instead, it creates a view where one dimension indexes the window positions and another dimension contains the window values. That is why it is efficient for patch extraction and local operations.
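A one-dimensional tensor makes the window and view behaviour easy to see. The sketch below uses a toy tensor of eight integers with overlapping windows (size 4, step 2); the values are made up for illustration.

```python
import torch

x = torch.arange(8)
windows = x.unfold(0, 4, 2)   # dimension 0, size 4, step 2 (overlapping)

print(windows)
# tensor([[0, 1, 2, 3],
#         [2, 3, 4, 5],
#         [4, 5, 6, 7]])

# Writing to the original tensor shows up in the windows: they share storage.
x[2] = 99
print(windows[0, 2].item(), windows[1, 0].item())  # 99 99
```

Because the windows are a view, element 2 of x appears in two windows at once without being stored twice.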
Example usage in patch extraction
For image patching, apply unfold twice:
- once on the height dimension,
- once on the width dimension.
This converts a tensor of shape [batch, channels, height, width] into a tensor of patches:
[batch, channels, num_patches_h, num_patches_w, patch_size, patch_size]
Then you can reshape or permute to get a list of flattened patches if needed.
Common use cases
- extracting image patches for Vision Transformers
- creating sliding windows over time series data
- building local receptive fields without explicit loops
- reducing operations to patch-based processing in CNN-like pipelines
# pixel_values[0][0][0]
Apply unfold twice, once on the height axis (dim 2) and once on the width axis (dim 3), and the 224×224 image is sliced into non-overlapping 16×16 windows. The result is a 6D view: [batch, channels, patch_rows, patch_cols, patch_h, patch_w].
# Assuming pixel_values is the input image tensor of shape
# [batch_size, num_channels, height, width]
patches = pixel_values.unfold(
    2, patch_size, patch_size   # windows along the height axis
).unfold(
    3, patch_size, patch_size   # windows along the width axis
)
The next cell prints patches[0][0][0]: the red channel of the 14 patches in the top row of the grid, shape [14, 16, 16]. The first line of the first block is the top pixel row of the top-left patch, with intensities scaled into [0, 1] by ToTensor(). The table that follows the output walks through all five index levels.
patches[0][0][0]
Out[15]:
tensor([[[0.0235, 0.0039, 0.0078, ..., 0.2471, 0.3608, 0.3765],
[0.0157, 0.0039, 0.0039, ..., 0.2392, 0.3373, 0.3765],
[0.0039, 0.0000, 0.0039, ..., 0.2353, 0.3176, 0.3412],
...,
[0.0431, 0.0157, 0.0039, ..., 0.3020, 0.5569, 0.8353],
[0.0353, 0.0157, 0.0039, ..., 0.2941, 0.5608, 0.8588],
[0.0314, 0.0078, 0.0039, ..., 0.3020, 0.5647, 0.8588]],
[[0.3412, 0.3176, 0.3137, ..., 0.0627, 0.0941, 0.1020],
[0.3255, 0.3216, 0.3490, ..., 0.0902, 0.1098, 0.1137],
[0.2980, 0.2902, 0.3451, ..., 0.1216, 0.1294, 0.1333],
...,
[0.9255, 0.9647, 0.9647, ..., 0.2745, 0.2745, 0.2667],
[0.9451, 0.9804, 0.9804, ..., 0.2980, 0.2745, 0.2784],
[0.9804, 1.0000, 1.0000, ..., 0.3176, 0.2745, 0.2902]],
[[0.1020, 0.0667, 0.0275, ..., 0.1843, 0.1569, 0.0706],
[0.1137, 0.0745, 0.0314, ..., 0.1647, 0.1412, 0.0706],
[0.1333, 0.0902, 0.0392, ..., 0.1333, 0.1216, 0.0667],
...,
[0.2667, 0.2078, 0.1216, ..., 0.2824, 0.2314, 0.1373],
[0.2863, 0.2275, 0.1333, ..., 0.2863, 0.2471, 0.1529],
[0.2941, 0.2392, 0.1451, ..., 0.2745, 0.2392, 0.1451]],
...,
[[0.3608, 0.3529, 0.3961, ..., 0.5373, 0.5569, 0.5647],
[0.3333, 0.4000, 0.4353, ..., 0.4745, 0.5137, 0.5529],
[0.4196, 0.4588, 0.4745, ..., 0.4275, 0.4902, 0.5569],
...,
[0.2471, 0.2431, 0.2314, ..., 0.4980, 0.4471, 0.3137],
[0.3059, 0.2706, 0.2431, ..., 0.3961, 0.3686, 0.2745],
[0.4078, 0.3059, 0.2392, ..., 0.3176, 0.2784, 0.1804]],
[[0.5922, 0.6157, 0.5686, ..., 0.4118, 0.4392, 0.4824],
[0.6000, 0.6275, 0.5804, ..., 0.3843, 0.4039, 0.4824],
[0.6118, 0.6392, 0.5725, ..., 0.4000, 0.3961, 0.4353],
...,
[0.1569, 0.0471, 0.0706, ..., 0.5294, 0.5255, 0.5608],
[0.1020, 0.0431, 0.0902, ..., 0.5137, 0.5216, 0.6078],
[0.0745, 0.0157, 0.0588, ..., 0.4980, 0.5176, 0.6314]],
[[0.5490, 0.5804, 0.6196, ..., 0.5333, 0.4745, 0.4824],
[0.5490, 0.5725, 0.5961, ..., 0.5255, 0.5137, 0.5137],
[0.4902, 0.5216, 0.5216, ..., 0.5608, 0.5569, 0.5412],
...,
[0.6235, 0.6549, 0.7255, ..., 0.2667, 0.2863, 0.2941],
[0.6588, 0.7137, 0.7451, ..., 0.4431, 0.4784, 0.4980],
[0.7020, 0.7137, 0.7020, ..., 0.5804, 0.6196, 0.6353]]])
Deciphering patches[0][0][0][0][0]
The patches tensor has shape [1, 3, 14, 14, 16, 16], so each index
drills one level deeper into the data:
| Index | Axis | Meaning | Shape after |
|---|---|---|---|
| [0] | batch | The first (and only) image in the batch | [3, 14, 14, 16, 16] |
| [0] | channel | The Red channel (0=R, 1=G, 2=B) | [14, 14, 16, 16] |
| [0] | patch row | The first row of the 14×14 patch grid (topmost patches) | [14, 16, 16] |
| [0] | patch column | The first patch in that row → the top-left 16×16 patch | [16, 16] |
| [0] | pixel row | The first row of pixels inside that 16×16 patch | [16] |
So patches[0][0][0][0][0] is 16 pixel values: the top row of the top-left 16×16 patch
from the Red channel. Each value is a pixel intensity in the 0–1 range
produced by ToTensor().
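The same drill-down can be checked on a random stand-in tensor (demo and demo_patches are illustrative names, not part of the notebook). Chained indexing and tuple indexing reach the same 16 values, and those values are simply the first 16 red pixels of the image's top row.

```python
import torch

torch.manual_seed(0)
demo = torch.rand(1, 3, 224, 224)                      # stand-in for pixel_values
demo_patches = demo.unfold(2, 16, 16).unfold(3, 16, 16)

chained = demo_patches[0][0][0][0][0]                  # five successive [0] lookups
tupled = demo_patches[0, 0, 0, 0, 0]                   # the same element, one lookup
from_image = demo[0, 0, 0, :16]                        # first 16 red pixels of image row 0

print(chained.shape)                                   # torch.Size([16])
print(torch.equal(chained, tupled))                    # True
print(torch.equal(chained, from_image))                # True
```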
# Reshaping the patches tensor for easy viewing
patches_reshaped = patches.permute(
    0, 2, 3, 1, 4, 5
).contiguous().view(-1, 3, patch_size, patch_size)
The goal of that line is to convert the 6D patches tensor into a flat list of standalone image patches. Here's the step-by-step breakdown:
Starting shape: [1, 3, 14, 14, 16, 16]
| Dim | Meaning |
|---|---|
| 1 | batch |
| 3 | channels (RGB) |
| 14 | patch rows |
| 14 | patch columns |
| 16 | patch height (px) |
| 16 | patch width (px) |
permute(0, 2, 3, 1, 4, 5) — reorders to [1, 14, 14, 3, 16, 16]
This moves the channel dimension (1) next to the pixel dimensions, so each patch's RGB data is grouped together:
Before: [B, C, grid_h, grid_w, px_h, px_w]
After: [B, grid_h, grid_w, C, px_h, px_w]
.contiguous() — makes the tensor contiguous in memory so .view() can work.
.view(-1, 3, 16, 16) — flattens the batch and grid dimensions into one:
[1, 14, 14, 3, 16, 16] → [196, 3, 16, 16]
The -1 computes 1 × 14 × 14 = 196 (the total number of patches).
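Why contiguous() is needed shows up even on a tiny tensor. In this sketch (a toy 2×3 tensor with illustrative names), view refuses the permuted tensor, while contiguous().view and reshape both work.

```python
import torch

x = torch.arange(6).reshape(2, 3)
permuted = x.permute(1, 0)             # shape [3, 2], same storage, new strides

print(permuted.is_contiguous())        # False

try:
    permuted.view(-1)                  # strides no longer match a flat layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = permuted.contiguous().view(-1)  # copy into row-major order, then view
print(flat)                            # tensor([0, 3, 1, 4, 2, 5])

# reshape copies only when it has to, so it covers both cases.
print(torch.equal(flat, permuted.reshape(-1)))  # True
```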
Final result: a tensor of 196 individual RGB patches, each 3×16×16:
patches_reshaped[0] → top-left patch
patches_reshaped[1] → next patch to the right
...
patches_reshaped[195] → bottom-right patch
This format is perfect for:
- Visualizing patches individually with imshow
- Iterating over patches in a loop
- Feeding each patch through a linear projection (the next step in a ViT)
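The last item, the linear projection, can be sketched in a few lines. The 768 hidden size is that of ViT-Base, and 3 × 16 × 16 happens to be 768 as well; the nn.Linear layer here is randomly initialised and only illustrates the shapes, it is not the pretrained embedding. demo_patches is a random stand-in for patches_reshaped.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
demo_patches = torch.rand(196, 3, 16, 16)            # stand-in for patches_reshaped

# Flatten each patch: 3 * 16 * 16 = 768 numbers per patch.
flat_patches = demo_patches.flatten(start_dim=1)     # [196, 768]

# Hypothetical projection layer with random weights (hidden size 768, as in ViT-Base).
projection = nn.Linear(in_features=768, out_features=768)

with torch.no_grad():
    tokens = projection(flat_patches)                # one token per patch

print(flat_patches.shape)  # torch.Size([196, 768])
print(tokens.shape)        # torch.Size([196, 768])
```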
patches_reshaped
Out[17]:
tensor([[[[0.0235, 0.0039, 0.0078, ..., 0.2471, 0.3608, 0.3765],
[0.0157, 0.0039, 0.0039, ..., 0.2392, 0.3373, 0.3765],
[0.0039, 0.0000, 0.0039, ..., 0.2353, 0.3176, 0.3412],
...,
[0.0431, 0.0157, 0.0039, ..., 0.3020, 0.5569, 0.8353],
[0.0353, 0.0157, 0.0039, ..., 0.2941, 0.5608, 0.8588],
[0.0314, 0.0078, 0.0039, ..., 0.3020, 0.5647, 0.8588]],
[[0.0314, 0.0235, 0.0235, ..., 0.2471, 0.3333, 0.3529],
[0.0353, 0.0235, 0.0196, ..., 0.2392, 0.3255, 0.3608],
[0.0392, 0.0235, 0.0196, ..., 0.2392, 0.3294, 0.3490],
...,
[0.0353, 0.0157, 0.0157, ..., 0.1882, 0.2235, 0.3765],
[0.0431, 0.0235, 0.0196, ..., 0.1765, 0.2275, 0.4000],
[0.0510, 0.0235, 0.0196, ..., 0.1843, 0.2431, 0.4078]],
[[0.0118, 0.0078, 0.0196, ..., 0.2471, 0.3020, 0.2980],
[0.0196, 0.0078, 0.0235, ..., 0.2392, 0.2980, 0.3137],
[0.0353, 0.0235, 0.0235, ..., 0.2471, 0.3020, 0.3059],
...,
[0.0471, 0.0235, 0.0353, ..., 0.1647, 0.1961, 0.3098],
[0.0392, 0.0196, 0.0157, ..., 0.1686, 0.2000, 0.3294],
[0.0353, 0.0196, 0.0157, ..., 0.1843, 0.2000, 0.3255]]],
[[[0.3412, 0.3176, 0.3137, ..., 0.0627, 0.0941, 0.1020],
[0.3255, 0.3216, 0.3490, ..., 0.0902, 0.1098, 0.1137],
[0.2980, 0.2902, 0.3451, ..., 0.1216, 0.1294, 0.1333],
...,
[0.9255, 0.9647, 0.9647, ..., 0.2745, 0.2745, 0.2667],
[0.9451, 0.9804, 0.9804, ..., 0.2980, 0.2745, 0.2784],
[0.9804, 1.0000, 1.0000, ..., 0.3176, 0.2745, 0.2902]],
[[0.3176, 0.3137, 0.3333, ..., 0.0745, 0.0706, 0.0667],
[0.3137, 0.3176, 0.3608, ..., 0.0824, 0.0863, 0.0784],
[0.2980, 0.2941, 0.3490, ..., 0.0980, 0.0980, 0.1020],
...,
[0.4275, 0.4471, 0.4392, ..., 0.1725, 0.1686, 0.1647],
[0.4471, 0.4588, 0.4549, ..., 0.1804, 0.1765, 0.2000],
[0.4824, 0.5020, 0.4902, ..., 0.1961, 0.1922, 0.2157]],
[[0.2549, 0.2353, 0.2471, ..., 0.0078, 0.0235, 0.0314],
[0.2549, 0.2471, 0.2784, ..., 0.0235, 0.0392, 0.0431],
[0.2510, 0.2314, 0.2784, ..., 0.0353, 0.0471, 0.0588],
...,
[0.2980, 0.2941, 0.3020, ..., 0.0667, 0.0627, 0.0667],
[0.3098, 0.2902, 0.3059, ..., 0.0863, 0.0549, 0.0706],
[0.3451, 0.3294, 0.3294, ..., 0.1137, 0.0706, 0.0902]]],
[[[0.1020, 0.0667, 0.0275, ..., 0.1843, 0.1569, 0.0706],
[0.1137, 0.0745, 0.0314, ..., 0.1647, 0.1412, 0.0706],
[0.1333, 0.0902, 0.0392, ..., 0.1333, 0.1216, 0.0667],
...,
[0.2667, 0.2078, 0.1216, ..., 0.2824, 0.2314, 0.1373],
[0.2863, 0.2275, 0.1333, ..., 0.2863, 0.2471, 0.1529],
[0.2941, 0.2392, 0.1451, ..., 0.2745, 0.2392, 0.1451]],
[[0.0667, 0.0471, 0.0235, ..., 0.1804, 0.1412, 0.0667],
[0.0784, 0.0549, 0.0275, ..., 0.1608, 0.1255, 0.0667],
[0.0980, 0.0706, 0.0353, ..., 0.1294, 0.1059, 0.0627],
...,
[0.1725, 0.1373, 0.0824, ..., 0.3333, 0.2863, 0.1843],
[0.1922, 0.1569, 0.0980, ..., 0.3373, 0.3020, 0.2078],
[0.2157, 0.1725, 0.1098, ..., 0.3333, 0.2941, 0.2000]],
[[0.0392, 0.0235, 0.0078, ..., 0.1608, 0.1294, 0.0588],
[0.0510, 0.0314, 0.0118, ..., 0.1412, 0.1137, 0.0588],
[0.0627, 0.0471, 0.0196, ..., 0.1098, 0.0941, 0.0549],
...,
[0.0784, 0.0588, 0.0353, ..., 0.2549, 0.2196, 0.1294],
[0.0902, 0.0706, 0.0392, ..., 0.2588, 0.2353, 0.1490],
[0.1176, 0.0941, 0.0510, ..., 0.2588, 0.2275, 0.1490]]],
...,
[[[0.1020, 0.0863, 0.1059, ..., 0.2078, 0.1686, 0.1843],
[0.0667, 0.0745, 0.0980, ..., 0.2275, 0.2157, 0.2000],
[0.0863, 0.1020, 0.1098, ..., 0.1843, 0.2118, 0.2078],
...,
[0.4745, 0.5686, 0.5725, ..., 0.5765, 0.5373, 0.5451],
[0.4980, 0.5882, 0.5882, ..., 0.5686, 0.5529, 0.5490],
[0.5216, 0.6196, 0.6235, ..., 0.5961, 0.5765, 0.5647]],
[[0.1098, 0.0941, 0.1137, ..., 0.1843, 0.1412, 0.1569],
[0.0863, 0.0941, 0.1176, ..., 0.2000, 0.1804, 0.1725],
[0.0863, 0.1059, 0.1137, ..., 0.1608, 0.1804, 0.1843],
...,
[0.4902, 0.5843, 0.5882, ..., 0.5804, 0.5412, 0.5490],
[0.5020, 0.5922, 0.5922, ..., 0.5725, 0.5569, 0.5529],
[0.5255, 0.6235, 0.6275, ..., 0.6000, 0.5804, 0.5686]],
[[0.0902, 0.0745, 0.0941, ..., 0.1216, 0.0784, 0.0941],
[0.0706, 0.0784, 0.1020, ..., 0.1373, 0.1137, 0.1098],
[0.0784, 0.0863, 0.0941, ..., 0.1137, 0.1294, 0.1294],
...,
[0.5020, 0.5961, 0.6000, ..., 0.5882, 0.5569, 0.5569],
[0.5176, 0.6118, 0.6118, ..., 0.5882, 0.5765, 0.5686],
[0.5451, 0.6431, 0.6471, ..., 0.6196, 0.6000, 0.5843]]],
[[[0.1608, 0.1608, 0.1451, ..., 0.2078, 0.1765, 0.1529],
[0.1529, 0.1569, 0.1569, ..., 0.2314, 0.2000, 0.1647],
[0.1765, 0.1725, 0.1529, ..., 0.2157, 0.2078, 0.2000],
...,
[0.5412, 0.4784, 0.4980, ..., 0.5176, 0.5490, 0.5373],
[0.5569, 0.5216, 0.5412, ..., 0.4745, 0.5059, 0.5333],
[0.5608, 0.5608, 0.5647, ..., 0.4902, 0.4941, 0.5412]],
[[0.1373, 0.1373, 0.1294, ..., 0.1843, 0.1451, 0.1294],
[0.1294, 0.1412, 0.1412, ..., 0.1922, 0.1647, 0.1333],
[0.1608, 0.1647, 0.1451, ..., 0.1843, 0.1765, 0.1686],
...,
[0.5412, 0.4784, 0.4980, ..., 0.5176, 0.5490, 0.5373],
[0.5569, 0.5216, 0.5412, ..., 0.4745, 0.5059, 0.5333],
[0.5608, 0.5608, 0.5647, ..., 0.4902, 0.4941, 0.5412]],
[[0.0902, 0.0902, 0.0863, ..., 0.1373, 0.1020, 0.0745],
[0.0745, 0.0941, 0.0980, ..., 0.1451, 0.1059, 0.0824],
[0.1137, 0.1137, 0.0980, ..., 0.1333, 0.1255, 0.1176],
...,
[0.5490, 0.4863, 0.5059, ..., 0.5255, 0.5569, 0.5373],
[0.5647, 0.5216, 0.5490, ..., 0.4745, 0.5059, 0.5333],
[0.5686, 0.5608, 0.5725, ..., 0.4902, 0.4941, 0.5333]]],
[[[0.1451, 0.1608, 0.1451, ..., 0.2118, 0.2000, 0.1804],
[0.1608, 0.1569, 0.1490, ..., 0.2314, 0.2235, 0.2471],
[0.1686, 0.1451, 0.1333, ..., 0.2549, 0.2941, 0.3412],
...,
[0.5333, 0.5098, 0.4941, ..., 0.5059, 0.5294, 0.5059],
[0.5647, 0.5647, 0.5725, ..., 0.5098, 0.5176, 0.5059],
[0.5686, 0.5843, 0.6078, ..., 0.4980, 0.4902, 0.5176]],
[[0.1294, 0.1451, 0.1255, ..., 0.1608, 0.1529, 0.1451],
[0.1373, 0.1412, 0.1333, ..., 0.1804, 0.1765, 0.2118],
[0.1451, 0.1294, 0.1176, ..., 0.2196, 0.2627, 0.3098],
...,
[0.5412, 0.5176, 0.4941, ..., 0.5059, 0.5255, 0.5020],
[0.5647, 0.5686, 0.5686, ..., 0.5098, 0.5137, 0.5020],
[0.5686, 0.5882, 0.6039, ..., 0.4980, 0.4863, 0.5137]],
[[0.0824, 0.1020, 0.1020, ..., 0.0980, 0.0980, 0.0863],
[0.0902, 0.0980, 0.0980, ..., 0.1137, 0.1137, 0.1529],
[0.0980, 0.0863, 0.0745, ..., 0.1608, 0.2118, 0.2588],
...,
[0.5373, 0.5059, 0.4941, ..., 0.5059, 0.5176, 0.4941],
[0.5569, 0.5490, 0.5608, ..., 0.5098, 0.5059, 0.4941],
[0.5608, 0.5686, 0.5882, ..., 0.4980, 0.4784, 0.5059]]]])
The grid dimensions and the pixel dimensions operate at different scales:
| Dimensions | What they represent | Values |
|---|---|---|
| Grid (dim 2 & 3) | Where the patch is in the 14×14 layout of patches across the image | 0–13 (patch position) |
| Pixel (dim 4 & 5) | Where the pixel is inside a single 16×16 patch | 0–15 (pixel position) |
Think of it like a map vs. a tile:
patches shape: [1, 3, 14, 14, 16, 16]
                │  │  ▲   ▲   │   │
                │  │  │   │   │   └── pixel column inside the patch
                │  │  │   │   └────── pixel row inside the patch
                │  │  │   └────────── patch column in the grid (0–13)
                │  │  └────────────── patch row in the grid (0–13)
                │  └───────────────── RGB channel
                └──────────────────── batch
Visual analogy:
        224 px = 14 patches × 16 px
┌──────────┬──────────┬─────┬──────────┐
│  [0][0]  │  [0][1]  │ ... │ [0][13]  │ ← grid row 0
├──────────┼──────────┼─────┼──────────┤
│  [1][0]  │  [1][1]  │ ... │ [1][13]  │ ← grid row 1
├──────────┼──────────┼─────┼──────────┤
│   ...    │   ...    │ ... │   ...    │
├──────────┼──────────┼─────┼──────────┤
│ [13][0]  │ [13][1]  │ ... │ [13][13] │ ← grid row 13
└──────────┴──────────┴─────┴──────────┘
Each cell is one 16×16-pixel patch; [r][c] is its (row, column) position in the grid.
- patches[0, 0, 0, 0, :, :]: the top-left patch (grid position 0,0), all 16×16 pixels
- patches[0, 0, 0, 1, :, :]: the patch immediately to the right (grid position 0,1)
- patches[0, 0, 1, 0, :, :]: the patch one row down (grid position 1,0)
- patches[0, 0, 0, 0, 0, 0]: the top-left pixel of the top-left patch
- patches[0, 0, 0, 0, 0, 1]: the next pixel to the right within that same patch
So the permute(0, 2, 3, 1, 4, 5) is saying: "group grid positions together, then channels, then pixel positions" — which lets you flatten the 14×14 grid into a single list of 196 patches, each with its own [3, 16, 16] structure.
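One way to convince yourself that the flattened order is raster order: pick any grid position and compare three routes to the same patch. The sketch uses a random stand-in for pixel_values, and the grid position (5, 9) is arbitrary.

```python
import torch

torch.manual_seed(0)
demo = torch.rand(1, 3, 224, 224)                        # stand-in for pixel_values
grid = demo.unfold(2, 16, 16).unfold(3, 16, 16)          # [1, 3, 14, 14, 16, 16]
flat = grid.permute(0, 2, 3, 1, 4, 5).contiguous().view(-1, 3, 16, 16)

row, col = 5, 9                                          # arbitrary grid position
index = row * 14 + col                                   # raster-order index

from_flat = flat[index]                                  # [3, 16, 16]
from_grid = grid[0, :, row, col]                         # [3, 16, 16]
from_image = demo[0, :, row * 16:(row + 1) * 16, col * 16:(col + 1) * 16]

print(index)                                             # 79
print(torch.equal(from_flat, from_grid))                 # True
print(torch.equal(from_flat, from_image))                # True
```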
to_pil = transforms.ToPILImage()
Each entry along the first axis of patches_reshaped is one patch in raster order: top-left first, then left-to-right, top-to-bottom, ending with the bottom-right patch. The permute step is the only one that needs care: the contiguous() call afterwards is required because view cannot operate on the non-contiguous tensor that permute returns, and without it you would get a RuntimeError.
patches_reshaped.size()
Out[19]:
torch.Size([196, 3, 16, 16])
patches_reshaped.shape is the same answer as patches_reshaped.size() — both return torch.Size([196, 3, 16, 16]). .size() is the method form; .shape is the property form. Either works.
patches_reshaped.shape
Out[20]:
torch.Size([196, 3, 16, 16])
Sanity-check the surgery by plotting the first few patches. If the permute was wrong, colors would look scrambled; if the indexing was off, the patches would not align with the image. The loop below shows the first four patches, just enough to confirm the layout matches what we expect.
for i in range(4):  # the first four patches are enough for a sanity check
    print(f"Displaying patch {i+1}/{patches_reshaped.size(0)}")
    # Use a new name so the patch_size constant (16) is not overwritten
    patch_shape = patches_reshaped[i].shape
    plt.title(f"Patch {i+1}, size: {patch_shape}")
    plt.imshow(to_pil(patches_reshaped[i]))
    plt.axis("off")
    plt.show()
Displaying patch 1/196
Displaying patch 2/196
Displaying patch 3/196
Displaying patch 4/196
Let the real ViT do the same job
Hand-cutting patches is instructive, but the Hugging Face stack does it for you: the model's patch embedding is a Conv2d(kernel_size=16, stride=16), and the image processor handles resizing and normalization. The next cells use google/vit-base-patch16-224 to confirm our hand-rolled pipeline matches the production one, then read off the prediction.
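That a Conv2d with kernel size and stride both equal to the patch size is the same operation as "cut into patches, flatten, multiply by a matrix" can be checked numerically. The sketch below uses a freshly initialised nn.Conv2d with random weights and a random input, not the pretrained model; 768 is the ViT-Base hidden size, and all variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
patch, hidden = 16, 768
x = torch.rand(1, 3, 224, 224)                       # random stand-in image batch

# Randomly initialised stand-in for a patch embedding layer.
conv = nn.Conv2d(3, hidden, kernel_size=patch, stride=patch)

with torch.no_grad():
    # Route 1: convolution, then flatten the 14x14 grid into 196 tokens.
    conv_tokens = conv(x).flatten(2).transpose(1, 2)             # [1, 196, 768]

    # Route 2: unfold into patches, flatten each, apply the same weights as a matrix.
    patches = x.unfold(2, patch, patch).unfold(3, patch, patch)  # [1, 3, 14, 14, 16, 16]
    flat = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch * patch)
    weight = conv.weight.reshape(hidden, -1)                     # [768, 768]
    linear_tokens = flat @ weight.T + conv.bias                  # [1, 196, 768]

print(conv_tokens.shape)                                         # torch.Size([1, 196, 768])
print(torch.allclose(conv_tokens, linear_tokens, atol=1e-4))     # True
```

The two routes agree up to floating-point rounding, which is why the hand-cut patches and the convolutional embedder describe the same first step.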
Installing Feature Extractor
!uv add transformers -q
Hugging Face ships a thin wrapper around the ViT architecture. ViTImageProcessor does the resize/normalize pipeline; ViTForImageClassification is the pretrained ViT together with its classification head. The import requests line is unused here because the image is local; Hugging Face examples typically use it to download a sample image from a URL.
from transformers import (
    ViTImageProcessor,
    ViTForImageClassification
)
import requests
Two artifacts come from the Hub: a feature processor (handles all the preprocessing the model expects) and the classifier (the pretrained ViT). We use the google/vit-base-patch16-224 checkpoint, which is the canonical ViT-Base with 16×16 patches on 224×224 input, the same patch geometry we simulated by hand in the previous section.
model_name = "google/vit-base-patch16-224"
feature_processor = ViTImageProcessor.from_pretrained(
    model_name
)
classifier = ViTForImageClassification.from_pretrained(
    model_name
)
Loading weights: 100%|██████████| 200/200 [00:00<00:00, 11575.92it/s]
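The feature processor handles everything the pretrained model expects: resize to 224×224, rescale pixels to [0, 1], and normalize each channel. For this checkpoint the printed values are consistent with a mean and standard deviation of 0.5 rather than the ImageNet statistics: 0.0235 in the hand-made tensor becomes (0.0235 - 0.5) / 0.5 ≈ -0.953 below. return_tensors="pt" returns PyTorch tensors instead of NumPy arrays, which is what the classifier needs.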
inputs = feature_processor(images=image, return_tensors="pt")
inputs
Out[25]:
{'pixel_values': tensor([[[[-0.9529, -0.9922, -0.9843, ..., 0.0667, -0.0510, -0.0353],
[-0.9686, -0.9922, -0.9922, ..., 0.0510, 0.0275, 0.0275],
[-0.9922, -1.0000, -0.9922, ..., 0.1216, 0.1137, 0.0824],
...,
[-0.8588, -0.8902, -0.9529, ..., 0.0118, 0.0588, 0.0118],
[-0.8118, -0.8275, -0.8902, ..., 0.0196, 0.0353, 0.0118],
[-1.0000, -0.9608, -0.9294, ..., -0.0039, -0.0196, 0.0353]],
[[-0.9373, -0.9529, -0.9529, ..., -0.0431, -0.1294, -0.1059],
[-0.9294, -0.9529, -0.9608, ..., -0.0353, -0.0353, -0.0196],
[-0.9216, -0.9529, -0.9608, ..., 0.0667, 0.0902, 0.0588],
...,
[-0.6863, -0.6941, -0.7255, ..., 0.0118, 0.0510, 0.0039],
[-0.5686, -0.5608, -0.6078, ..., 0.0196, 0.0275, 0.0039],
[-0.7176, -0.6549, -0.6157, ..., -0.0039, -0.0275, 0.0275]],
[[-0.9765, -0.9843, -0.9608, ..., -0.0431, -0.1216, -0.0824],
[-0.9608, -0.9843, -0.9529, ..., -0.0510, -0.0510, -0.0196],
[-0.9294, -0.9529, -0.9529, ..., 0.0039, 0.0353, 0.0039],
...,
[-0.5216, -0.5373, -0.5843, ..., 0.0118, 0.0353, -0.0118],
[-0.3490, -0.3490, -0.4196, ..., 0.0196, 0.0118, -0.0118],
[-0.4667, -0.4275, -0.4118, ..., -0.0039, -0.0431, 0.0118]]]])}
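Lining up the first red-channel values of the two tensors (0.0235, 0.0039, 0.0078 by hand; -0.9529, -0.9922, -0.9843 from the processor) is consistent with a per-channel mean of 0.5 and standard deviation of 0.5, which maps [0, 1] onto [-1, 1]. The sketch below checks that arithmetic in plain Python. The 0.5 values and the 8-bit pixel values 6, 1 and 2 are inferred from the printed numbers, and normalize is a hypothetical helper; feature_processor.image_mean and feature_processor.image_std should hold the values actually used.

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel normalization: subtract the mean, divide by the std."""
    return (value - mean) / std

# First three red-channel pixels of the hand-made tensor, as 8-bit values / 255.
hand_made = [6 / 255, 1 / 255, 2 / 255]      # print as 0.0235, 0.0039, 0.0078

for value in hand_made:
    print(f"{value:.4f} -> {normalize(value):.4f}")
# 0.0235 -> -0.9529
# 0.0039 -> -0.9922
# 0.0078 -> -0.9843
```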
Calling a Hugging Face model with **inputs unpacks the processor's output (a dict containing pixel_values) into keyword arguments. The classifier runs a forward pass and returns a ModelOutput with the logits and a few other diagnostic fields.
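The ** unpacking is ordinary Python and can be seen without loading a model. In the sketch below, forward is a hypothetical stand-in for a model's call signature and the nested list stands in for the pixel tensor; neither is the real Hugging Face API.

```python
def forward(pixel_values=None, labels=None):
    """Hypothetical stand-in for a model's forward signature."""
    return {
        "has_pixel_values": pixel_values is not None,
        "has_labels": labels is not None,
    }

# A dict shaped like the processor output: one key, named after the argument.
inputs = {"pixel_values": [[0.1, 0.2], [0.3, 0.4]]}

unpacked = forward(**inputs)                              # keys become keyword names
explicit = forward(pixel_values=inputs["pixel_values"])   # the long-hand spelling

print(unpacked)               # {'has_pixel_values': True, 'has_labels': False}
print(unpacked == explicit)   # True
```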
outputs = classifier(**inputs)
outputs
Out[26]:
ImageClassifierOutput(loss=None, logits=tensor([[-1.1288e+00, -5.5756e-01, -1.5567e+00, -1.3101e+00, -5.8843e-01,
-1.5344e+00, -1.7410e+00, -1.8962e+00, -1.8121e+00, -2.0072e+00,
-1.5723e+00, -1.7178e+00, -1.8666e+00, -1.7056e+00, -1.6707e+00,
-9.8119e-01, -6.7786e-01, -1.7026e+00, -1.6719e+00, -1.2583e+00,
-1.3638e+00, -1.1015e+00, -1.6210e+00, -1.5375e+00, -1.3533e+00,
-1.2028e+00, -8.9297e-02, -3.4615e-01, -6.2297e-01, -7.7859e-01,
-1.7597e+00, -1.0814e+00, -1.7800e+00, -2.0764e+00, -8.3730e-01,
-1.7238e+00, -2.6275e-01, -1.8172e+00, -9.1032e-01, -1.1190e+00,
-1.0625e+00, -9.2219e-01, -1.3375e+00, -1.3699e+00, -2.3913e-01,
-1.5224e+00, -1.1827e+00, -1.1833e+00, -1.0774e+00, -1.7358e+00,
-1.0469e+00, 8.9774e-02, -4.2469e-01, -1.2435e+00, -1.1599e+00,
-1.4301e+00, -1.0391e+00, -1.3216e+00, -1.4284e+00, -1.0975e+00,
-1.3484e+00, -1.0535e+00, -1.6283e+00, -1.3372e+00, -1.6257e+00,
-1.4261e+00, -1.8167e+00, -1.8975e+00, -8.3239e-01, -3.9849e-02,
-1.7841e+00, -7.3182e-01, -7.5883e-01, -5.2622e-01, -1.0596e+00,
-8.3641e-01, -1.6568e+00, -1.5369e+00, -1.3250e+00, -5.1549e-01,
-1.3927e+00, -2.1316e+00, -2.8735e+00, -2.4314e+00, -1.4067e+00,
-2.6360e+00, -2.8996e+00, 9.7903e-01, -9.8650e-01, -9.2010e-01,
-1.6485e+00, -2.1602e+00, -1.8970e+00, -2.3107e+00, -8.5293e-01,
-1.1307e+00, -2.0294e+00, -1.3849e+00, -8.5414e-01, -1.1081e+00,
-7.7380e-01, -9.0096e-01, -5.7366e-01, -1.7009e-01, -8.7846e-01,
-1.0487e+00, -1.5573e-01, -2.0009e-01, -1.0339e+00, -9.1640e-01,
-1.5770e+00, -1.2183e+00, -4.6172e-01, -1.1566e+00, 6.1680e-02,
-8.3394e-01, -1.7371e+00, 9.0266e-02, 6.6352e-01, -9.2439e-01,
-2.0220e+00, -8.9493e-01, -1.5144e-01, -1.8428e+00, -6.9276e-01,
-1.5316e+00, -3.5799e-01, -1.8749e+00, -1.4881e+00, -1.6213e+00,
-1.3232e+00, -1.7723e+00, -7.9047e-01, -1.9146e+00, -1.1945e+00,
-6.3094e-01, -1.7093e+00, -1.6797e-01, -2.1087e+00, -2.3179e+00,
-1.9785e+00, -1.4969e+00, -1.8115e+00, -2.1166e+00, -1.6588e+00,
-1.0613e+00, -5.0492e-01, -2.1767e+00, -1.6234e+00, -1.1036e+00,
-1.4370e+00, 8.1056e-01, -1.0474e+00, -4.1610e-01, -8.2913e-01,
-9.5349e-01, -9.5689e-01, -2.5514e-01, -2.5555e-01, -1.0438e+00,
-8.7707e-01, -1.4423e+00, -1.6613e+00, -1.3774e+00, -4.5691e-01,
-1.8336e+00, -2.2308e+00, -1.8402e+00, -6.0270e-01, -1.7237e+00,
-1.6384e+00, -4.4052e-01, -1.8071e+00, -1.5352e+00, -1.3887e+00,
-2.2053e+00, -1.1253e+00, -1.3287e+00, -3.6203e-01, -6.7451e-01,
-2.0837e+00, -1.0458e+00, -1.5430e+00, -5.7658e-01, -1.1472e+00,
2.0212e-01, -2.6233e-01, -7.8850e-01, -1.4344e+00, -6.7024e-01,
-9.5009e-01, -2.2901e+00, -1.6378e+00, -5.6170e-01, -1.5876e+00,
5.0793e-02, -1.3269e+00, -1.8811e+00, -1.1867e+00, -8.0472e-01,
-1.2382e+00, -5.5780e-02, -1.4564e+00, -1.3538e+00, -1.5029e-01,
-7.2923e-01, -8.1718e-01, -1.2432e+00, -1.9306e+00, -8.0074e-01,
-7.8810e-01, -5.5854e-01, -1.3028e+00, -1.3805e+00, -1.0876e+00,
-1.2772e+00, -1.7482e+00, -1.0466e+00, -1.7972e+00, -1.3454e+00,
-2.7756e-01, -1.0651e+00, -1.3511e+00, -1.0626e+00, -1.3594e+00,
-1.8190e+00, -1.3949e+00, -2.1168e+00, -1.5986e-01, -1.9322e+00,
-1.9654e+00, -2.2672e+00, -1.2830e+00, -2.0853e+00, -2.9499e+00,
-1.3882e+00, -6.3462e-01, -7.2214e-01, -9.9601e-01, -1.3956e+00,
-1.2745e+00, -1.3796e+00, -1.3912e+00, -9.7312e-01, -1.0223e+00,
-5.4940e-01, -8.0601e-01, -1.0212e+00, -8.6111e-01, -1.1427e+00,
-1.3147e+00, -5.0085e-01, -6.9131e-01, -8.1273e-01, 9.6982e-02,
-4.9907e-01, -6.5999e-01, -1.3624e+00, -5.2018e-01, -1.5573e+00,
-1.3135e+00, -2.3948e+00, -1.3010e+00, -5.0329e-01, -9.7687e-01,
6.8204e-02, -8.9449e-02, -9.4618e-01, -8.8762e-01, -1.6033e+00,
-1.8124e+00, -2.1654e+00, -2.1438e+00, -9.1164e-01, -1.8081e+00,
-2.6824e+00, -2.2461e+00, -2.8404e+00, -2.2389e+00, -2.3717e+00,
-2.7880e+00, -1.7697e+00, -1.7985e+00, -1.3684e+00, 1.0784e-01,
3.8752e-01, -2.0528e+00, -1.5288e+00, -1.5552e+00, -1.8982e+00,
-1.4148e+00, -1.9727e+00, -1.4251e+00, -1.4619e+00, -1.1224e+00,
-1.6169e+00, -1.6196e+00, -1.4527e+00, -1.3702e+00, -1.1338e+00,
-9.2544e-01, -1.4009e+00, -6.4955e-01, -1.5494e+00, -1.9337e+00,
-1.2910e+00, 4.8402e-01, -1.0485e+00, -1.8810e+00, -2.0764e+00,
-9.0217e-01, -1.5040e+00, -1.5439e+00, -1.9934e+00, -1.9208e-01,
-2.2644e+00, -2.1138e+00, -1.7672e+00, -1.0737e+00, -1.4061e+00,
-1.5477e+00, -1.6711e+00, -1.0581e+00, -1.5613e+00, -7.4515e-01,
-8.8589e-01, -7.3351e-01, -6.6812e-02, -7.4479e-02, -4.7825e-01,
-2.5402e+00, -2.9542e+00, -2.0981e+00, -1.1647e+00, -1.1550e+00,
-1.2923e+00, -2.2555e+00, -1.3204e+00, -1.9082e+00, -1.1163e+00,
-1.1351e+00, -2.0877e+00, -1.9385e+00, -2.0916e+00, -1.7080e+00,
-1.8356e+00, -1.5045e+00, -1.5071e+00, -2.1718e+00, -1.5874e+00,
-1.7591e-01, -1.1006e+00, -1.3475e+00, -2.5205e-01, -1.1786e+00,
-1.8891e+00, -1.0950e+00, -1.5001e+00, -5.7034e-01, -6.3320e-01,
-1.0726e+00, -1.5441e+00, -1.1395e+00, 1.4060e-01, -6.0951e-01,
-9.7084e-01, -1.5965e+00, -1.6291e+00, -2.2506e+00, -1.8121e+00,
-1.2935e+00, -1.9496e+00, -2.2894e+00, -1.7002e+00, -2.0452e+00,
-1.3958e+00, -7.1474e-01, -1.8174e+00, -1.7262e+00, -1.7804e+00,
-7.1823e-01, -1.5532e+00, -1.5278e+00, -1.6071e+00, -1.4682e+00,
-1.8455e+00, -2.0240e+00, -9.4857e-01, -1.2190e+00, -9.1378e-01,
-1.1949e+00, -6.0205e-01, -1.6749e+00, -1.7124e+00, -4.3164e-01,
-9.6829e-01, -1.0952e+00, -7.9104e-01, 2.3104e+00, 1.0714e+00,
1.0450e+00, 1.4213e+00, 1.0278e+00, -6.5385e-01, -3.5740e-01,
-6.7773e-01, 3.5781e-01, -1.0136e+00, -1.7345e+00, 3.0628e+00,
-1.7585e+00, -2.2044e-01, 3.5266e+00, -2.1260e-01, 1.4028e+00,
4.5782e+00, -7.8668e-01, 5.2189e-01, 2.8467e+00, 2.9685e+00,
3.9798e-01, 1.7613e-01, -8.7818e-01, 7.2730e-01, 1.4216e+00,
-1.3559e+00, 9.1536e-01, 1.9479e+00, 4.9609e-02, 7.9663e-01,
-1.0046e+00, 4.4092e-01, 9.0080e-01, 1.2989e+00, -1.0287e+00,
6.5570e-01, -1.0619e+00, -6.2818e-01, 1.9653e+00, 1.4845e+00,
1.1694e+00, 3.6230e+00, 4.2247e-01, 2.3146e-01, -9.8158e-01,
-1.4154e+00, 1.5514e+00, 1.1212e+00, -7.9226e-01, -5.8781e-01,
2.0116e-01, 2.8734e-01, 2.8798e-01, 5.3292e-01, 3.6066e+00,
1.2094e+00, -5.1846e-01, 1.5131e+00, 8.4088e-01, -9.2419e-02,
-5.0970e-01, 1.3946e+00, -1.4535e-02, 3.0267e+00, 1.9495e+00,
7.7145e-01, -2.9291e-01, 9.9016e-02, 4.7988e-01, 1.3111e+00,
3.1897e+00, -5.4317e-01, -1.3991e+00, 2.8718e+00, -1.9368e-01,
4.3121e-01, -5.2684e-02, 6.9063e-01, 1.6148e+00, -4.4517e-01,
2.7687e+00, 2.8848e+00, 2.1599e+00, -5.1245e-01, -1.1066e+00,
2.8624e+00, -3.7812e-01, 6.3851e+00, 2.7377e-01, -8.1250e-01,
3.1924e-01, 1.4829e-01, 3.7285e-01, -1.2799e-01, -1.0288e+00,
7.5215e-01, 4.5020e-01, -2.4231e-01, 2.2576e+00, 3.6101e+00,
-2.3716e+00, 9.9774e-01, 2.5722e+00, 1.9629e+00, 1.0951e+01,
5.7500e+00, -2.2427e-01, 1.7355e+00, 2.3874e+00, 1.1408e+00,
-1.8416e+00, -5.5941e-01, 1.7484e+00, -7.2259e-01, 1.7906e+00,
1.8903e+00, -1.9706e-01, -1.6067e-03, 1.5667e+00, 1.3664e+00,
7.5336e-01, 2.6782e+00, 5.7073e-02, 1.4663e+00, 2.1284e-01,
-1.1474e+00, 3.8432e+00, 2.1757e+00, 2.8087e+00, 1.1324e+00,
3.5591e+00, 1.9089e+00, 2.4432e+00, -4.5529e-01, 1.0408e+00,
-5.0486e-01, -3.3511e-01, -1.2184e+00, -1.1848e+00, 1.5616e+00,
-1.2250e+00, 3.7889e-01, 1.7945e+00, -9.9117e-02, -7.6152e-03,
1.9710e-01, 8.2536e-01, -1.1791e+00, 5.1105e-01, 1.8944e+00,
4.7684e+00, 2.4500e+00, -3.4019e-01, 3.6311e-01, -6.6713e-01,
-3.0069e-01, 4.5596e-01, -2.8358e-01, 2.6556e-01, 1.8438e+00,
4.0064e-01, -1.5674e+00, 5.7787e-03, 1.2242e+00, 1.2593e-01,
-5.2277e-01, -9.8615e-01, -1.0612e+00, 1.7508e+00, -8.5462e-01,
1.6774e+00, 1.3741e-01, 1.6946e+00, -1.1201e+00, 6.7398e-01,
-1.1908e+00, -1.0301e+00, -3.7273e-01, -7.4300e-01, 8.1450e-02,
8.1920e-01, -4.9004e-01, 1.9640e+00, 2.2390e+00, 6.1085e-01,
2.5948e+00, -8.5221e-01, 1.7476e+00, 1.5593e+00, 6.3670e-01,
4.7895e+00, 3.2253e+00, 3.3895e+00, 3.1997e+00, 2.6604e-01,
-1.9937e+00, 1.5900e+00, 2.7293e-01, 1.4374e+00, -1.1999e-01,
7.7782e-01, 1.5181e-02, 1.1509e-01, -7.6229e-01, 2.3637e+00,
4.1866e+00, 5.0602e-01, 3.6513e-01, 2.6414e+00, -1.0411e+00,
1.9441e+00, 2.2214e+00, 1.2579e+00, 1.5240e+00, -1.1258e-01,
1.5084e-01, -1.4185e-01, 1.2559e+00, 1.1839e+00, 1.3838e+00,
4.6340e+00, -9.0865e-01, 4.0835e+00, 2.4878e+00, 2.5173e+00,
-1.9493e+00, 3.4721e+00, 4.0724e-01, -1.4598e+00, 2.7559e+00,
1.6428e+00, 2.1481e+00, 1.3823e+00, 1.8370e+00, -4.5120e-01,
1.8123e+00, 2.4906e+00, 1.5409e+00, -1.0159e+00, -1.8618e+00,
5.8097e-01, -2.8717e-01, 5.1455e-01, 8.0545e-01, 3.6588e+00,
1.3412e-01, 9.9009e-01, 2.8569e+00, 1.9259e+00, -7.2916e-01,
1.0044e+00, 1.9262e+00, 1.7313e-02, 2.1786e+00, -3.9642e-01,
2.7504e-01, -1.1991e-01, -6.7182e-01, 1.2963e+00, 1.9883e-01,
-1.0035e+00, -6.5229e-01, 4.1433e+00, -9.4811e-01, 1.8351e+00,
-9.4409e-01, 1.7877e+00, 1.5015e+00, -6.3148e-01, -4.0982e-01,
-2.3635e-02, -6.8984e-01, -1.3123e+00, 4.1120e+00, 2.7246e+00,
1.6324e-02, -5.0407e-01, 5.7172e-01, 2.0147e+00, -6.3177e-01,
6.3341e-01, 5.6590e+00, 6.1334e-02, 2.1875e-01, 1.4905e+00,
4.1078e-01, 3.2662e+00, -4.3793e-01, 1.7184e+00, -9.2182e-01,
-8.5871e-01, -4.3197e-01, 5.0464e+00, -2.4518e-01, -1.4605e+00,
9.4362e-01, 2.9023e+00, 6.4944e-01, -1.4676e-02, 3.1429e-01,
4.4461e+00, -7.1109e-01, -4.0138e-01, 2.0017e+00, 1.5661e+00,
-7.6096e-01, 1.8566e+00, 3.0987e+00, 5.9708e-01, 2.0955e+00,
2.4421e+00, 1.8260e+00, 7.0799e-01, 1.0659e+00, 1.4124e+00,
3.7122e-01, -6.4199e-01, -1.3770e-01, 6.3672e-01, 2.3187e+00,
2.9253e+00, 1.2384e+00, 2.1925e+00, 1.2610e+00, -1.1169e+00,
2.5535e+00, 9.9501e-01, 1.8191e-01, 2.1627e+00, 1.7116e+00,
-2.3068e+00, 2.1134e+00, 2.8885e+00, 1.1332e+00, 1.2188e-02,
-8.5331e-01, 7.1743e-01, 3.8555e-01, 2.5166e+00, 2.2196e+00,
2.9442e-02, -4.7403e-02, 1.8368e+00, 6.8140e-01, -1.4280e-01,
2.1131e+00, 8.8801e-01, 1.1169e+00, 1.8357e+00, 1.4696e+00,
1.2638e+00, -6.8029e-01, -3.2958e-01, 5.8167e-01, 3.2046e+00,
-4.1551e-01, 5.4914e-01, -1.3841e+00, 2.4618e-01, 1.8262e+00,
2.0605e+00, 4.1062e+00, 5.1896e+00, 1.3421e+00, 5.5525e-01,
1.3706e+00, 9.9714e-01, 3.2196e+00, -7.3169e-01, 1.5970e+00,
1.9575e+00, 6.7052e-01, 1.4960e+00, 3.5359e+00, 6.5257e-01,
-1.9448e+00, -1.6082e-01, 4.1609e-01, 2.1405e+00, -7.4468e-01,
-1.6838e+00, 4.4351e-02, 2.3610e+00, 6.3643e-01, 1.3752e+00,
9.2887e-01, 1.4374e+00, -2.5057e-01, 1.6678e+00, 1.2721e+00,
1.0991e+00, 3.6516e-01, 6.2042e-01, 1.2471e+00, 1.0038e+00,
-2.4707e-01, 2.3911e+00, 1.2827e+00, 2.3981e+00, 8.2473e-01,
8.7118e-01, -3.0461e-01, -7.4650e-01, -8.7416e-01, 9.8581e-01,
-1.3365e+00, 5.1204e-01, 1.1578e-01, 7.5739e-01, 2.7037e+00,
3.8529e+00, 1.9886e+00, 4.9845e-02, 1.6810e+00, -1.6452e+00,
-9.5338e-01, 8.3930e-02, -1.4824e-01, 7.9138e-01, -1.3337e-01,
-1.6955e+00, -3.7787e-01, 8.2839e-01, 2.1416e+00, 1.1960e+00,
-1.0223e+00, 2.1449e+00, 2.3191e+00, 2.2857e+00, 9.6810e-01,
-2.0137e-01, 2.7947e+00, -1.5959e+00, -1.7917e+00, 1.2072e+00,
-1.5412e-01, 3.0145e+00, 2.2277e+00, 2.0193e+00, -3.5791e-01,
1.4194e+00, 1.3186e+00, 1.4767e-01, -5.0973e-02, 1.5013e+00,
1.3330e+00, 1.3218e+00, -9.6431e-01, 3.3077e+00, 4.4780e+00,
3.0723e+00, 3.0282e+00, 3.6395e-01, -1.1868e-01, 1.9100e+00,
1.7480e+00, -1.5975e+00, 1.4965e-01, -5.2344e-02, 3.8128e+00,
6.2292e+00, 7.9314e-01, 2.2287e-01, 7.8432e-02, -8.6820e-01,
1.5580e+00, -2.0533e+00, 1.9541e-01, 4.1310e+00, 1.1829e+00,
7.2605e-02, -1.3189e+00, 1.6491e+00, -3.5762e-01, -4.5466e-01,
-8.1512e-01, 7.1880e-01, 1.0953e+00, 1.3817e+00, 5.3443e-01,
-4.9825e-01, 1.0563e+00, 7.2192e-01, 4.3681e-01, 8.1760e-02,
1.2844e+00, 1.5841e+00, -1.2133e+00, -5.1414e-01, 7.7775e-01,
-9.7600e-01, 2.0780e+00, 1.5780e+00, 2.6456e+00, 1.6863e-01,
-1.1976e+00, 1.3533e+00, 3.3104e-01, 3.1936e+00, 3.7815e+00,
-4.4944e-01, 2.2744e+00, 8.3756e-01, 8.1147e-01, 1.2852e+00,
1.0977e+00, 4.1722e-01, 1.2004e+00, -6.0595e-01, -3.4656e-01,
2.1500e+00, 1.1494e+00, -8.6472e-01, -2.1229e+00, -1.3426e+00,
-1.4216e+00, 3.6926e+00, 3.5461e+00, 6.3547e+00, 3.6496e+00,
1.2884e+00, 4.3123e+00, 3.8509e+00, 2.2235e+00, 7.3406e-01,
2.4890e+00, 1.2146e+00, 6.8909e-01, 1.3616e+00, 1.4953e+00,
1.1625e+00, 3.2800e+00, 3.0783e+00, 4.8876e-01, 2.7620e+00,
-9.4409e-01, 4.5483e-02, -4.6922e-01, -6.8508e-01, -7.1216e-01,
-9.2057e-01, -2.9971e-01, -1.2979e-02, 1.8758e-01, -1.0300e+00,
5.9650e-01, -1.7380e+00, -5.4800e-01, -1.4288e-02, 9.5569e-01,
1.6889e-01, 1.6327e-01, 3.0140e-01, 2.7652e-01, 9.7939e-01,
-3.0470e-01, 8.9789e-01, 3.2968e-01, -2.0730e+00, -1.0088e+00,
2.3379e+00, 7.1313e-01, -1.2382e+00, 7.7513e-01, 4.1961e-01,
1.3884e+00, 3.1139e+00, 7.7496e+00, 9.0270e+00, 2.8993e+00,
-8.0959e-01, 5.8377e-01, -2.2734e+00, -1.0872e+00, -8.1821e-01,
6.3957e-01, -1.7169e+00, -7.7390e-01, -1.2564e+00, -6.0100e-01,
-4.2064e-01, -4.4467e-01, 1.9411e-01, -1.6461e+00, -1.5729e+00,
-4.3694e-01, -3.2641e-01, -8.0312e-02, -7.7821e-01, -3.3205e-01,
3.9647e-01, -8.4661e-01, -3.9440e-01, -7.8064e-01, -5.5400e-01,
-9.1702e-01, 4.3964e-02, 1.2689e-01, 5.3765e-01, 3.3942e+00]],
grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
The classifier returns a ModelOutput object. The useful field is .logits — a tensor of shape [1, 1000] (one score per ImageNet class). Pulling it out into a top-level variable makes the rest of the post easier to read.
logits = outputs.logits
logits
Out[27]:
tensor([[-1.1288e+00, -5.5756e-01, -1.5567e+00, -1.3101e+00, -5.8843e-01,
-1.5344e+00, -1.7410e+00, -1.8962e+00, -1.8121e+00, -2.0072e+00,
-1.5723e+00, -1.7178e+00, -1.8666e+00, -1.7056e+00, -1.6707e+00,
-9.8119e-01, -6.7786e-01, -1.7026e+00, -1.6719e+00, -1.2583e+00,
-1.3638e+00, -1.1015e+00, -1.6210e+00, -1.5375e+00, -1.3533e+00,
-1.2028e+00, -8.9297e-02, -3.4615e-01, -6.2297e-01, -7.7859e-01,
-1.7597e+00, -1.0814e+00, -1.7800e+00, -2.0764e+00, -8.3730e-01,
-1.7238e+00, -2.6275e-01, -1.8172e+00, -9.1032e-01, -1.1190e+00,
-1.0625e+00, -9.2219e-01, -1.3375e+00, -1.3699e+00, -2.3913e-01,
-1.5224e+00, -1.1827e+00, -1.1833e+00, -1.0774e+00, -1.7358e+00,
-1.0469e+00, 8.9774e-02, -4.2469e-01, -1.2435e+00, -1.1599e+00,
-1.4301e+00, -1.0391e+00, -1.3216e+00, -1.4284e+00, -1.0975e+00,
-1.3484e+00, -1.0535e+00, -1.6283e+00, -1.3372e+00, -1.6257e+00,
-1.4261e+00, -1.8167e+00, -1.8975e+00, -8.3239e-01, -3.9849e-02,
-1.7841e+00, -7.3182e-01, -7.5883e-01, -5.2622e-01, -1.0596e+00,
-8.3641e-01, -1.6568e+00, -1.5369e+00, -1.3250e+00, -5.1549e-01,
-1.3927e+00, -2.1316e+00, -2.8735e+00, -2.4314e+00, -1.4067e+00,
-2.6360e+00, -2.8996e+00, 9.7903e-01, -9.8650e-01, -9.2010e-01,
-1.6485e+00, -2.1602e+00, -1.8970e+00, -2.3107e+00, -8.5293e-01,
-1.1307e+00, -2.0294e+00, -1.3849e+00, -8.5414e-01, -1.1081e+00,
-7.7380e-01, -9.0096e-01, -5.7366e-01, -1.7009e-01, -8.7846e-01,
-1.0487e+00, -1.5573e-01, -2.0009e-01, -1.0339e+00, -9.1640e-01,
-1.5770e+00, -1.2183e+00, -4.6172e-01, -1.1566e+00, 6.1680e-02,
-8.3394e-01, -1.7371e+00, 9.0266e-02, 6.6352e-01, -9.2439e-01,
-2.0220e+00, -8.9493e-01, -1.5144e-01, -1.8428e+00, -6.9276e-01,
-1.5316e+00, -3.5799e-01, -1.8749e+00, -1.4881e+00, -1.6213e+00,
-1.3232e+00, -1.7723e+00, -7.9047e-01, -1.9146e+00, -1.1945e+00,
-6.3094e-01, -1.7093e+00, -1.6797e-01, -2.1087e+00, -2.3179e+00,
-1.9785e+00, -1.4969e+00, -1.8115e+00, -2.1166e+00, -1.6588e+00,
-1.0613e+00, -5.0492e-01, -2.1767e+00, -1.6234e+00, -1.1036e+00,
-1.4370e+00, 8.1056e-01, -1.0474e+00, -4.1610e-01, -8.2913e-01,
-9.5349e-01, -9.5689e-01, -2.5514e-01, -2.5555e-01, -1.0438e+00,
-8.7707e-01, -1.4423e+00, -1.6613e+00, -1.3774e+00, -4.5691e-01,
-1.8336e+00, -2.2308e+00, -1.8402e+00, -6.0270e-01, -1.7237e+00,
-1.6384e+00, -4.4052e-01, -1.8071e+00, -1.5352e+00, -1.3887e+00,
-2.2053e+00, -1.1253e+00, -1.3287e+00, -3.6203e-01, -6.7451e-01,
-2.0837e+00, -1.0458e+00, -1.5430e+00, -5.7658e-01, -1.1472e+00,
2.0212e-01, -2.6233e-01, -7.8850e-01, -1.4344e+00, -6.7024e-01,
-9.5009e-01, -2.2901e+00, -1.6378e+00, -5.6170e-01, -1.5876e+00,
5.0793e-02, -1.3269e+00, -1.8811e+00, -1.1867e+00, -8.0472e-01,
-1.2382e+00, -5.5780e-02, -1.4564e+00, -1.3538e+00, -1.5029e-01,
-7.2923e-01, -8.1718e-01, -1.2432e+00, -1.9306e+00, -8.0074e-01,
-7.8810e-01, -5.5854e-01, -1.3028e+00, -1.3805e+00, -1.0876e+00,
-1.2772e+00, -1.7482e+00, -1.0466e+00, -1.7972e+00, -1.3454e+00,
-2.7756e-01, -1.0651e+00, -1.3511e+00, -1.0626e+00, -1.3594e+00,
-1.8190e+00, -1.3949e+00, -2.1168e+00, -1.5986e-01, -1.9322e+00,
-1.9654e+00, -2.2672e+00, -1.2830e+00, -2.0853e+00, -2.9499e+00,
-1.3882e+00, -6.3462e-01, -7.2214e-01, -9.9601e-01, -1.3956e+00,
-1.2745e+00, -1.3796e+00, -1.3912e+00, -9.7312e-01, -1.0223e+00,
-5.4940e-01, -8.0601e-01, -1.0212e+00, -8.6111e-01, -1.1427e+00,
-1.3147e+00, -5.0085e-01, -6.9131e-01, -8.1273e-01, 9.6982e-02,
-4.9907e-01, -6.5999e-01, -1.3624e+00, -5.2018e-01, -1.5573e+00,
-1.3135e+00, -2.3948e+00, -1.3010e+00, -5.0329e-01, -9.7687e-01,
6.8204e-02, -8.9449e-02, -9.4618e-01, -8.8762e-01, -1.6033e+00,
-1.8124e+00, -2.1654e+00, -2.1438e+00, -9.1164e-01, -1.8081e+00,
-2.6824e+00, -2.2461e+00, -2.8404e+00, -2.2389e+00, -2.3717e+00,
-2.7880e+00, -1.7697e+00, -1.7985e+00, -1.3684e+00, 1.0784e-01,
3.8752e-01, -2.0528e+00, -1.5288e+00, -1.5552e+00, -1.8982e+00,
-1.4148e+00, -1.9727e+00, -1.4251e+00, -1.4619e+00, -1.1224e+00,
-1.6169e+00, -1.6196e+00, -1.4527e+00, -1.3702e+00, -1.1338e+00,
-9.2544e-01, -1.4009e+00, -6.4955e-01, -1.5494e+00, -1.9337e+00,
-1.2910e+00, 4.8402e-01, -1.0485e+00, -1.8810e+00, -2.0764e+00,
-9.0217e-01, -1.5040e+00, -1.5439e+00, -1.9934e+00, -1.9208e-01,
-2.2644e+00, -2.1138e+00, -1.7672e+00, -1.0737e+00, -1.4061e+00,
-1.5477e+00, -1.6711e+00, -1.0581e+00, -1.5613e+00, -7.4515e-01,
-8.8589e-01, -7.3351e-01, -6.6812e-02, -7.4479e-02, -4.7825e-01,
-2.5402e+00, -2.9542e+00, -2.0981e+00, -1.1647e+00, -1.1550e+00,
-1.2923e+00, -2.2555e+00, -1.3204e+00, -1.9082e+00, -1.1163e+00,
-1.1351e+00, -2.0877e+00, -1.9385e+00, -2.0916e+00, -1.7080e+00,
-1.8356e+00, -1.5045e+00, -1.5071e+00, -2.1718e+00, -1.5874e+00,
-1.7591e-01, -1.1006e+00, -1.3475e+00, -2.5205e-01, -1.1786e+00,
-1.8891e+00, -1.0950e+00, -1.5001e+00, -5.7034e-01, -6.3320e-01,
-1.0726e+00, -1.5441e+00, -1.1395e+00, 1.4060e-01, -6.0951e-01,
-9.7084e-01, -1.5965e+00, -1.6291e+00, -2.2506e+00, -1.8121e+00,
-1.2935e+00, -1.9496e+00, -2.2894e+00, -1.7002e+00, -2.0452e+00,
-1.3958e+00, -7.1474e-01, -1.8174e+00, -1.7262e+00, -1.7804e+00,
-7.1823e-01, -1.5532e+00, -1.5278e+00, -1.6071e+00, -1.4682e+00,
-1.8455e+00, -2.0240e+00, -9.4857e-01, -1.2190e+00, -9.1378e-01,
-1.1949e+00, -6.0205e-01, -1.6749e+00, -1.7124e+00, -4.3164e-01,
-9.6829e-01, -1.0952e+00, -7.9104e-01, 2.3104e+00, 1.0714e+00,
1.0450e+00, 1.4213e+00, 1.0278e+00, -6.5385e-01, -3.5740e-01,
-6.7773e-01, 3.5781e-01, -1.0136e+00, -1.7345e+00, 3.0628e+00,
-1.7585e+00, -2.2044e-01, 3.5266e+00, -2.1260e-01, 1.4028e+00,
4.5782e+00, -7.8668e-01, 5.2189e-01, 2.8467e+00, 2.9685e+00,
3.9798e-01, 1.7613e-01, -8.7818e-01, 7.2730e-01, 1.4216e+00,
-1.3559e+00, 9.1536e-01, 1.9479e+00, 4.9609e-02, 7.9663e-01,
-1.0046e+00, 4.4092e-01, 9.0080e-01, 1.2989e+00, -1.0287e+00,
6.5570e-01, -1.0619e+00, -6.2818e-01, 1.9653e+00, 1.4845e+00,
1.1694e+00, 3.6230e+00, 4.2247e-01, 2.3146e-01, -9.8158e-01,
-1.4154e+00, 1.5514e+00, 1.1212e+00, -7.9226e-01, -5.8781e-01,
2.0116e-01, 2.8734e-01, 2.8798e-01, 5.3292e-01, 3.6066e+00,
1.2094e+00, -5.1846e-01, 1.5131e+00, 8.4088e-01, -9.2419e-02,
-5.0970e-01, 1.3946e+00, -1.4535e-02, 3.0267e+00, 1.9495e+00,
7.7145e-01, -2.9291e-01, 9.9016e-02, 4.7988e-01, 1.3111e+00,
3.1897e+00, -5.4317e-01, -1.3991e+00, 2.8718e+00, -1.9368e-01,
4.3121e-01, -5.2684e-02, 6.9063e-01, 1.6148e+00, -4.4517e-01,
2.7687e+00, 2.8848e+00, 2.1599e+00, -5.1245e-01, -1.1066e+00,
2.8624e+00, -3.7812e-01, 6.3851e+00, 2.7377e-01, -8.1250e-01,
3.1924e-01, 1.4829e-01, 3.7285e-01, -1.2799e-01, -1.0288e+00,
7.5215e-01, 4.5020e-01, -2.4231e-01, 2.2576e+00, 3.6101e+00,
-2.3716e+00, 9.9774e-01, 2.5722e+00, 1.9629e+00, 1.0951e+01,
5.7500e+00, -2.2427e-01, 1.7355e+00, 2.3874e+00, 1.1408e+00,
-1.8416e+00, -5.5941e-01, 1.7484e+00, -7.2259e-01, 1.7906e+00,
1.8903e+00, -1.9706e-01, -1.6067e-03, 1.5667e+00, 1.3664e+00,
7.5336e-01, 2.6782e+00, 5.7073e-02, 1.4663e+00, 2.1284e-01,
-1.1474e+00, 3.8432e+00, 2.1757e+00, 2.8087e+00, 1.1324e+00,
3.5591e+00, 1.9089e+00, 2.4432e+00, -4.5529e-01, 1.0408e+00,
-5.0486e-01, -3.3511e-01, -1.2184e+00, -1.1848e+00, 1.5616e+00,
-1.2250e+00, 3.7889e-01, 1.7945e+00, -9.9117e-02, -7.6152e-03,
1.9710e-01, 8.2536e-01, -1.1791e+00, 5.1105e-01, 1.8944e+00,
4.7684e+00, 2.4500e+00, -3.4019e-01, 3.6311e-01, -6.6713e-01,
-3.0069e-01, 4.5596e-01, -2.8358e-01, 2.6556e-01, 1.8438e+00,
4.0064e-01, -1.5674e+00, 5.7787e-03, 1.2242e+00, 1.2593e-01,
-5.2277e-01, -9.8615e-01, -1.0612e+00, 1.7508e+00, -8.5462e-01,
1.6774e+00, 1.3741e-01, 1.6946e+00, -1.1201e+00, 6.7398e-01,
-1.1908e+00, -1.0301e+00, -3.7273e-01, -7.4300e-01, 8.1450e-02,
8.1920e-01, -4.9004e-01, 1.9640e+00, 2.2390e+00, 6.1085e-01,
2.5948e+00, -8.5221e-01, 1.7476e+00, 1.5593e+00, 6.3670e-01,
4.7895e+00, 3.2253e+00, 3.3895e+00, 3.1997e+00, 2.6604e-01,
-1.9937e+00, 1.5900e+00, 2.7293e-01, 1.4374e+00, -1.1999e-01,
7.7782e-01, 1.5181e-02, 1.1509e-01, -7.6229e-01, 2.3637e+00,
4.1866e+00, 5.0602e-01, 3.6513e-01, 2.6414e+00, -1.0411e+00,
1.9441e+00, 2.2214e+00, 1.2579e+00, 1.5240e+00, -1.1258e-01,
1.5084e-01, -1.4185e-01, 1.2559e+00, 1.1839e+00, 1.3838e+00,
4.6340e+00, -9.0865e-01, 4.0835e+00, 2.4878e+00, 2.5173e+00,
-1.9493e+00, 3.4721e+00, 4.0724e-01, -1.4598e+00, 2.7559e+00,
1.6428e+00, 2.1481e+00, 1.3823e+00, 1.8370e+00, -4.5120e-01,
1.8123e+00, 2.4906e+00, 1.5409e+00, -1.0159e+00, -1.8618e+00,
5.8097e-01, -2.8717e-01, 5.1455e-01, 8.0545e-01, 3.6588e+00,
1.3412e-01, 9.9009e-01, 2.8569e+00, 1.9259e+00, -7.2916e-01,
1.0044e+00, 1.9262e+00, 1.7313e-02, 2.1786e+00, -3.9642e-01,
2.7504e-01, -1.1991e-01, -6.7182e-01, 1.2963e+00, 1.9883e-01,
-1.0035e+00, -6.5229e-01, 4.1433e+00, -9.4811e-01, 1.8351e+00,
-9.4409e-01, 1.7877e+00, 1.5015e+00, -6.3148e-01, -4.0982e-01,
-2.3635e-02, -6.8984e-01, -1.3123e+00, 4.1120e+00, 2.7246e+00,
1.6324e-02, -5.0407e-01, 5.7172e-01, 2.0147e+00, -6.3177e-01,
6.3341e-01, 5.6590e+00, 6.1334e-02, 2.1875e-01, 1.4905e+00,
4.1078e-01, 3.2662e+00, -4.3793e-01, 1.7184e+00, -9.2182e-01,
-8.5871e-01, -4.3197e-01, 5.0464e+00, -2.4518e-01, -1.4605e+00,
9.4362e-01, 2.9023e+00, 6.4944e-01, -1.4676e-02, 3.1429e-01,
4.4461e+00, -7.1109e-01, -4.0138e-01, 2.0017e+00, 1.5661e+00,
-7.6096e-01, 1.8566e+00, 3.0987e+00, 5.9708e-01, 2.0955e+00,
2.4421e+00, 1.8260e+00, 7.0799e-01, 1.0659e+00, 1.4124e+00,
3.7122e-01, -6.4199e-01, -1.3770e-01, 6.3672e-01, 2.3187e+00,
2.9253e+00, 1.2384e+00, 2.1925e+00, 1.2610e+00, -1.1169e+00,
2.5535e+00, 9.9501e-01, 1.8191e-01, 2.1627e+00, 1.7116e+00,
-2.3068e+00, 2.1134e+00, 2.8885e+00, 1.1332e+00, 1.2188e-02,
-8.5331e-01, 7.1743e-01, 3.8555e-01, 2.5166e+00, 2.2196e+00,
2.9442e-02, -4.7403e-02, 1.8368e+00, 6.8140e-01, -1.4280e-01,
2.1131e+00, 8.8801e-01, 1.1169e+00, 1.8357e+00, 1.4696e+00,
1.2638e+00, -6.8029e-01, -3.2958e-01, 5.8167e-01, 3.2046e+00,
-4.1551e-01, 5.4914e-01, -1.3841e+00, 2.4618e-01, 1.8262e+00,
2.0605e+00, 4.1062e+00, 5.1896e+00, 1.3421e+00, 5.5525e-01,
1.3706e+00, 9.9714e-01, 3.2196e+00, -7.3169e-01, 1.5970e+00,
1.9575e+00, 6.7052e-01, 1.4960e+00, 3.5359e+00, 6.5257e-01,
-1.9448e+00, -1.6082e-01, 4.1609e-01, 2.1405e+00, -7.4468e-01,
-1.6838e+00, 4.4351e-02, 2.3610e+00, 6.3643e-01, 1.3752e+00,
9.2887e-01, 1.4374e+00, -2.5057e-01, 1.6678e+00, 1.2721e+00,
1.0991e+00, 3.6516e-01, 6.2042e-01, 1.2471e+00, 1.0038e+00,
-2.4707e-01, 2.3911e+00, 1.2827e+00, 2.3981e+00, 8.2473e-01,
8.7118e-01, -3.0461e-01, -7.4650e-01, -8.7416e-01, 9.8581e-01,
-1.3365e+00, 5.1204e-01, 1.1578e-01, 7.5739e-01, 2.7037e+00,
3.8529e+00, 1.9886e+00, 4.9845e-02, 1.6810e+00, -1.6452e+00,
-9.5338e-01, 8.3930e-02, -1.4824e-01, 7.9138e-01, -1.3337e-01,
-1.6955e+00, -3.7787e-01, 8.2839e-01, 2.1416e+00, 1.1960e+00,
-1.0223e+00, 2.1449e+00, 2.3191e+00, 2.2857e+00, 9.6810e-01,
-2.0137e-01, 2.7947e+00, -1.5959e+00, -1.7917e+00, 1.2072e+00,
-1.5412e-01, 3.0145e+00, 2.2277e+00, 2.0193e+00, -3.5791e-01,
1.4194e+00, 1.3186e+00, 1.4767e-01, -5.0973e-02, 1.5013e+00,
1.3330e+00, 1.3218e+00, -9.6431e-01, 3.3077e+00, 4.4780e+00,
3.0723e+00, 3.0282e+00, 3.6395e-01, -1.1868e-01, 1.9100e+00,
1.7480e+00, -1.5975e+00, 1.4965e-01, -5.2344e-02, 3.8128e+00,
6.2292e+00, 7.9314e-01, 2.2287e-01, 7.8432e-02, -8.6820e-01,
1.5580e+00, -2.0533e+00, 1.9541e-01, 4.1310e+00, 1.1829e+00,
7.2605e-02, -1.3189e+00, 1.6491e+00, -3.5762e-01, -4.5466e-01,
-8.1512e-01, 7.1880e-01, 1.0953e+00, 1.3817e+00, 5.3443e-01,
-4.9825e-01, 1.0563e+00, 7.2192e-01, 4.3681e-01, 8.1760e-02,
1.2844e+00, 1.5841e+00, -1.2133e+00, -5.1414e-01, 7.7775e-01,
-9.7600e-01, 2.0780e+00, 1.5780e+00, 2.6456e+00, 1.6863e-01,
-1.1976e+00, 1.3533e+00, 3.3104e-01, 3.1936e+00, 3.7815e+00,
-4.4944e-01, 2.2744e+00, 8.3756e-01, 8.1147e-01, 1.2852e+00,
1.0977e+00, 4.1722e-01, 1.2004e+00, -6.0595e-01, -3.4656e-01,
2.1500e+00, 1.1494e+00, -8.6472e-01, -2.1229e+00, -1.3426e+00,
-1.4216e+00, 3.6926e+00, 3.5461e+00, 6.3547e+00, 3.6496e+00,
1.2884e+00, 4.3123e+00, 3.8509e+00, 2.2235e+00, 7.3406e-01,
2.4890e+00, 1.2146e+00, 6.8909e-01, 1.3616e+00, 1.4953e+00,
1.1625e+00, 3.2800e+00, 3.0783e+00, 4.8876e-01, 2.7620e+00,
-9.4409e-01, 4.5483e-02, -4.6922e-01, -6.8508e-01, -7.1216e-01,
-9.2057e-01, -2.9971e-01, -1.2979e-02, 1.8758e-01, -1.0300e+00,
5.9650e-01, -1.7380e+00, -5.4800e-01, -1.4288e-02, 9.5569e-01,
1.6889e-01, 1.6327e-01, 3.0140e-01, 2.7652e-01, 9.7939e-01,
-3.0470e-01, 8.9789e-01, 3.2968e-01, -2.0730e+00, -1.0088e+00,
2.3379e+00, 7.1313e-01, -1.2382e+00, 7.7513e-01, 4.1961e-01,
1.3884e+00, 3.1139e+00, 7.7496e+00, 9.0270e+00, 2.8993e+00,
-8.0959e-01, 5.8377e-01, -2.2734e+00, -1.0872e+00, -8.1821e-01,
6.3957e-01, -1.7169e+00, -7.7390e-01, -1.2564e+00, -6.0100e-01,
-4.2064e-01, -4.4467e-01, 1.9411e-01, -1.6461e+00, -1.5729e+00,
-4.3694e-01, -3.2641e-01, -8.0312e-02, -7.7821e-01, -3.3205e-01,
3.9647e-01, -8.4661e-01, -3.9440e-01, -7.8064e-01, -5.5400e-01,
-9.1702e-01, 4.3964e-02, 1.2689e-01, 5.3765e-01, 3.3942e+00]],
grad_fn=<AddmmBackward0>)
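The grad_fn=<AddmmBackward0> tag at the end of that printout means autograd recorded the forward pass, which inference does not need. A minimal sketch of the usual remedy, torch.no_grad(), follows. It uses a randomly initialised nn.Linear as a stand-in for the classification head (an assumption for illustration, not the real classifier):

```python
import torch
from torch import nn

# Stand-in for the classification head: 768 features in, 1000 logits out.
# Randomly initialised for illustration; this is not the real ViT.
head = nn.Linear(768, 1000)
features = torch.randn(1, 768)

# Default behaviour: autograd tracks the op, so the result carries a grad_fn.
tracked = head(features)
print(tracked.requires_grad, tracked.grad_fn is not None)  # True True

# Inference only: nothing is recorded, so the tensor has no grad_fn.
with torch.no_grad():
    untracked = head(features)
print(untracked.requires_grad, untracked.grad_fn is None)  # False True
```

Wrapping the real forward pass the same way should leave the logit values unchanged and only drop the bookkeeping.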
From raw logits to a top-1 prediction
The classifier returns raw logits — one float per ImageNet class. The model's "answer" is the class with the highest logit. The next two cells grab that index and translate it into a label.
Model Prediction
predicted_class_idx = logits.argmax(-1).item()
predicted_class_idx
Out[28]:
504
The argmax returned an integer class index; the id2label dict translates that integer to a human-readable ImageNet class name.
classifier.config.id2label[predicted_class_idx]
Out[29]:
'coffee mug'
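A raw logit says which class won but not by how much. One common way to read confidence is to pass the logits through a softmax. The sketch below uses a made-up four-class logits tensor and a made-up label dict (both hypothetical, chosen only to keep the numbers readable) rather than the real 1000-way output:

```python
import torch

# Toy logits for a hypothetical 4-class problem (not the real 1000-way output).
toy_logits = torch.tensor([[-1.2, 0.3, 4.0, 1.1]])
toy_id2label = {0: "tench", 1: "goldfish", 2: "coffee mug", 3: "cup"}

# Softmax turns logits into probabilities that sum to 1 along the class axis.
probs = torch.softmax(toy_logits, dim=-1)

# Softmax is monotonic, so the argmax of the probabilities matches the logits'.
idx = probs.argmax(-1).item()
print(toy_id2label[idx], round(probs[0, idx].item(), 3))
```

The same two lines should work on the real logits tensor, with classifier.config.id2label in place of the toy dict.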
Before reading the top-5 prediction, it helps to know what the model is actually made of. The cells below poke at the feature_processor and the classifier to confirm we are running the architecture we think we are.
The handful of inspection cells that follow are checkpoints, not building blocks — feel free to skim past them if the rest of the post is already making sense.
Configuration and Shapes
feature_processor
Out[30]:
ViTImageProcessor {
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "ViTImageProcessor",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "resample": 2,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "height": 224,
    "width": 224
  }
}
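Those config fields describe three steps: resize, rescale, normalize. The sketch below reproduces them by hand on a synthetic image, assuming the steps run in that order and that resample code 2 is PIL's bilinear filter. The real processor may differ in small details, so treat this as an illustration of the arithmetic rather than a drop-in replacement:

```python
import numpy as np
import torch
from PIL import Image

# Synthetic RGB image standing in for the JPEG; any input size works.
rng = np.random.default_rng(0)
fake = Image.fromarray(rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8))

# do_resize: 224x224 with resample=2, which is PIL's code for bilinear.
resized = fake.resize((224, 224), resample=2)

# do_rescale: uint8 0..255 -> float 0..1 (rescale_factor is 1/255).
pixels = np.asarray(resized, dtype=np.float32) * 0.00392156862745098

# do_normalize: (x - mean) / std with mean = std = 0.5 maps 0..1 to -1..1.
pixels = (pixels - 0.5) / 0.5

# PIL gives H x W x C; the model wants a batched C x H x W tensor.
manual_pixel_values = torch.from_numpy(pixels).permute(2, 0, 1).unsqueeze(0)
print(manual_pixel_values.shape)
```

With a mean and standard deviation of 0.5 on every channel, the normalized values land in roughly the range -1 to 1.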
Second checkpoint: confirm the tensor feeding the classifier has the [1, 3, 224, 224] shape the model expects. If the shape is off, the forward pass fails at the patch embedder, the model's only convolution, with a dimension mismatch.
inputs["pixel_values"].shape
Out[31]:
torch.Size([1, 3, 224, 224])
Third checkpoint: the classifier is a 12-layer ViT with a Conv2d(3, 768, kernel_size=16, stride=16) patch embedder, a hidden size of 768, and a 1000-class classification head. This is the model that produced the logits above.
classifier
Out[32]:
ViTForImageClassification(
  (vit): ViTModel(
    (embeddings): ViTEmbeddings(
      (patch_embeddings): ViTPatchEmbeddings(
        (projection): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
      )
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (layers): ModuleList(
      (0-11): 12 x ViTLayer(
        (attention): ViTAttention(
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (o_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (layernorm_before): LayerNorm((768,), eps=1e-12, elementwise_affine=True, bias=True)
        (layernorm_after): LayerNorm((768,), eps=1e-12, elementwise_affine=True, bias=True)
        (mlp): ViTMLP(
          (activation_fn): GELUActivation()
          (fc1): Linear(in_features=768, out_features=3072, bias=True)
          (fc2): Linear(in_features=3072, out_features=768, bias=True)
        )
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (layernorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True, bias=True)
  )
  (classifier): Linear(in_features=768, out_features=1000, bias=True)
)
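The Conv2d projection in that printout has a kernel size equal to its stride, so it never overlaps patches. That makes it equivalent to cutting the image into 16×16 squares, flattening each one, and multiplying by a weight matrix. The sketch below checks the equivalence on random data with a randomly initialised layer of the same signature (a stand-in, not the pretrained weights):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(1, 3, 224, 224)  # random stand-in for pixel_values

# Same layer signature as the printed projection, but randomly initialised.
proj = nn.Conv2d(3, 768, kernel_size=16, stride=16)

with torch.no_grad():
    # Conv path: [1, 768, 14, 14] -> [1, 196, 768] (one row per patch).
    conv_tokens = proj(x).flatten(2).transpose(1, 2)

    # Manual path: cut 16x16 patches, flatten each to 3*16*16 = 768 values.
    patches = x.unfold(2, 16, 16).unfold(3, 16, 16)  # [1, 3, 14, 14, 16, 16]
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 196, 768)

    # The conv kernel, flattened the same way, is an ordinary weight matrix.
    weight = proj.weight.reshape(768, 3 * 16 * 16)
    manual_tokens = patches @ weight.T + proj.bias

print(conv_tokens.shape)
print(torch.allclose(conv_tokens, manual_tokens, atol=1e-4))
```

That a flattened patch has 3 × 16 × 16 = 768 values, the same as the hidden size, is a coincidence of this configuration. The projection would work the same way with any output width.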
Fourth checkpoint: the output of a forward pass on a single image is [1, 1000] — one logit per ImageNet class. The leading 1 is the batch dimension.
outputs.logits.shape
Out[33]:
torch.Size([1, 1000])
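To see where the [1, 1000] shape comes from, here is a rough sketch of the last step. It assumes the head reads only the [CLS] token and that this token sits at position 0 of the sequence. The hidden states and the linear layer are random stand-ins, and the batch holds two images to show that the leading dimension simply follows the batch size:

```python
import torch
from torch import nn

# Random stand-in for the encoder output: 2 images, 197 tokens, 768 features.
# 197 = 196 patch tokens + 1 [CLS] token (assumed to sit at position 0).
hidden = torch.randn(2, 197, 768)

# Same shape as the printed (classifier) layer, randomly initialised.
head = nn.Linear(768, 1000)

with torch.no_grad():
    cls_state = hidden[:, 0]      # [2, 768]: one vector per image
    toy_logits = head(cls_state)  # [2, 1000]: one logit per class per image

print(cls_state.shape, toy_logits.shape)
```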
Fifth checkpoint: id2label is the dict we use to translate class indices into human-readable names. There are 1,000 entries covering the ImageNet-1k taxonomy.
classifier.config.id2label
Out[34]:
{0: 'tench, Tinca tinca',
1: 'goldfish, Carassius auratus',
2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
3: 'tiger shark, Galeocerdo cuvieri',
4: 'hammerhead, hammerhead shark',
5: 'electric ray, crampfish, numbfish, torpedo',
6: 'stingray',
7: 'cock',
8: 'hen',
9: 'ostrich, Struthio camelus',
10: 'brambling, Fringilla montifringilla',
11: 'goldfinch, Carduelis carduelis',
12: 'house finch, linnet, Carpodacus mexicanus',
13: 'junco, snowbird',
14: 'indigo bunting, indigo finch, indigo bird, Passerina cyanea',
15: 'robin, American robin, Turdus migratorius',
16: 'bulbul',
17: 'jay',
18: 'magpie',
19: 'chickadee',
20: 'water ouzel, dipper',
21: 'kite',
22: 'bald eagle, American eagle, Haliaeetus leucocephalus',
23: 'vulture',
24: 'great grey owl, great gray owl, Strix nebulosa',
25: 'European fire salamander, Salamandra salamandra',
26: 'common newt, Triturus vulgaris',
27: 'eft',
28: 'spotted salamander, Ambystoma maculatum',
29: 'axolotl, mud puppy, Ambystoma mexicanum',
30: 'bullfrog, Rana catesbeiana',
31: 'tree frog, tree-frog',
32: 'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui',
33: 'loggerhead, loggerhead turtle, Caretta caretta',
34: 'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea',
35: 'mud turtle',
36: 'terrapin',
37: 'box turtle, box tortoise',
38: 'banded gecko',
39: 'common iguana, iguana, Iguana iguana',
40: 'American chameleon, anole, Anolis carolinensis',
41: 'whiptail, whiptail lizard',
42: 'agama',
43: 'frilled lizard, Chlamydosaurus kingi',
44: 'alligator lizard',
45: 'Gila monster, Heloderma suspectum',
46: 'green lizard, Lacerta viridis',
47: 'African chameleon, Chamaeleo chamaeleon',
48: 'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis',
49: 'African crocodile, Nile crocodile, Crocodylus niloticus',
50: 'American alligator, Alligator mississipiensis',
51: 'triceratops',
52: 'thunder snake, worm snake, Carphophis amoenus',
53: 'ringneck snake, ring-necked snake, ring snake',
54: 'hognose snake, puff adder, sand viper',
55: 'green snake, grass snake',
56: 'king snake, kingsnake',
57: 'garter snake, grass snake',
58: 'water snake',
59: 'vine snake',
60: 'night snake, Hypsiglena torquata',
61: 'boa constrictor, Constrictor constrictor',
62: 'rock python, rock snake, Python sebae',
63: 'Indian cobra, Naja naja',
64: 'green mamba',
65: 'sea snake',
66: 'horned viper, cerastes, sand viper, horned asp, Cerastes cornutus',
67: 'diamondback, diamondback rattlesnake, Crotalus adamanteus',
68: 'sidewinder, horned rattlesnake, Crotalus cerastes',
69: 'trilobite',
70: 'harvestman, daddy longlegs, Phalangium opilio',
71: 'scorpion',
72: 'black and gold garden spider, Argiope aurantia',
73: 'barn spider, Araneus cavaticus',
74: 'garden spider, Aranea diademata',
75: 'black widow, Latrodectus mactans',
76: 'tarantula',
77: 'wolf spider, hunting spider',
78: 'tick',
79: 'centipede',
80: 'black grouse',
81: 'ptarmigan',
82: 'ruffed grouse, partridge, Bonasa umbellus',
83: 'prairie chicken, prairie grouse, prairie fowl',
84: 'peacock',
85: 'quail',
86: 'partridge',
87: 'African grey, African gray, Psittacus erithacus',
88: 'macaw',
89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
90: 'lorikeet',
91: 'coucal',
92: 'bee eater',
93: 'hornbill',
94: 'hummingbird',
95: 'jacamar',
96: 'toucan',
97: 'drake',
98: 'red-breasted merganser, Mergus serrator',
99: 'goose',
100: 'black swan, Cygnus atratus',
101: 'tusker',
102: 'echidna, spiny anteater, anteater',
103: 'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus',
104: 'wallaby, brush kangaroo',
105: 'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus',
106: 'wombat',
107: 'jellyfish',
108: 'sea anemone, anemone',
109: 'brain coral',
110: 'flatworm, platyhelminth',
111: 'nematode, nematode worm, roundworm',
112: 'conch',
113: 'snail',
114: 'slug',
115: 'sea slug, nudibranch',
116: 'chiton, coat-of-mail shell, sea cradle, polyplacophore',
117: 'chambered nautilus, pearly nautilus, nautilus',
118: 'Dungeness crab, Cancer magister',
119: 'rock crab, Cancer irroratus',
120: 'fiddler crab',
121: 'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica',
122: 'American lobster, Northern lobster, Maine lobster, Homarus americanus',
123: 'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish',
124: 'crayfish, crawfish, crawdad, crawdaddy',
125: 'hermit crab',
126: 'isopod',
127: 'white stork, Ciconia ciconia',
128: 'black stork, Ciconia nigra',
129: 'spoonbill',
130: 'flamingo',
131: 'little blue heron, Egretta caerulea',
132: 'American egret, great white heron, Egretta albus',
133: 'bittern',
134: 'crane',
135: 'limpkin, Aramus pictus',
136: 'European gallinule, Porphyrio porphyrio',
137: 'American coot, marsh hen, mud hen, water hen, Fulica americana',
138: 'bustard',
139: 'ruddy turnstone, Arenaria interpres',
140: 'red-backed sandpiper, dunlin, Erolia alpina',
141: 'redshank, Tringa totanus',
142: 'dowitcher',
143: 'oystercatcher, oyster catcher',
144: 'pelican',
145: 'king penguin, Aptenodytes patagonica',
146: 'albatross, mollymawk',
147: 'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus',
148: 'killer whale, killer, orca, grampus, sea wolf, Orcinus orca',
149: 'dugong, Dugong dugon',
150: 'sea lion',
151: 'Chihuahua',
152: 'Japanese spaniel',
153: 'Maltese dog, Maltese terrier, Maltese',
154: 'Pekinese, Pekingese, Peke',
155: 'Shih-Tzu',
156: 'Blenheim spaniel',
157: 'papillon',
158: 'toy terrier',
159: 'Rhodesian ridgeback',
160: 'Afghan hound, Afghan',
161: 'basset, basset hound',
162: 'beagle',
163: 'bloodhound, sleuthhound',
164: 'bluetick',
165: 'black-and-tan coonhound',
166: 'Walker hound, Walker foxhound',
167: 'English foxhound',
168: 'redbone',
169: 'borzoi, Russian wolfhound',
170: 'Irish wolfhound',
171: 'Italian greyhound',
172: 'whippet',
173: 'Ibizan hound, Ibizan Podenco',
174: 'Norwegian elkhound, elkhound',
175: 'otterhound, otter hound',
176: 'Saluki, gazelle hound',
177: 'Scottish deerhound, deerhound',
178: 'Weimaraner',
179: 'Staffordshire bullterrier, Staffordshire bull terrier',
180: 'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier',
181: 'Bedlington terrier',
182: 'Border terrier',
183: 'Kerry blue terrier',
184: 'Irish terrier',
185: 'Norfolk terrier',
186: 'Norwich terrier',
187: 'Yorkshire terrier',
188: 'wire-haired fox terrier',
189: 'Lakeland terrier',
190: 'Sealyham terrier, Sealyham',
191: 'Airedale, Airedale terrier',
192: 'cairn, cairn terrier',
193: 'Australian terrier',
194: 'Dandie Dinmont, Dandie Dinmont terrier',
195: 'Boston bull, Boston terrier',
196: 'miniature schnauzer',
197: 'giant schnauzer',
198: 'standard schnauzer',
199: 'Scotch terrier, Scottish terrier, Scottie',
200: 'Tibetan terrier, chrysanthemum dog',
201: 'silky terrier, Sydney silky',
202: 'soft-coated wheaten terrier',
203: 'West Highland white terrier',
204: 'Lhasa, Lhasa apso',
205: 'flat-coated retriever',
206: 'curly-coated retriever',
207: 'golden retriever',
208: 'Labrador retriever',
209: 'Chesapeake Bay retriever',
210: 'German short-haired pointer',
211: 'vizsla, Hungarian pointer',
212: 'English setter',
213: 'Irish setter, red setter',
214: 'Gordon setter',
215: 'Brittany spaniel',
216: 'clumber, clumber spaniel',
217: 'English springer, English springer spaniel',
218: 'Welsh springer spaniel',
219: 'cocker spaniel, English cocker spaniel, cocker',
220: 'Sussex spaniel',
221: 'Irish water spaniel',
222: 'kuvasz',
223: 'schipperke',
224: 'groenendael',
225: 'malinois',
226: 'briard',
227: 'kelpie',
228: 'komondor',
229: 'Old English sheepdog, bobtail',
230: 'Shetland sheepdog, Shetland sheep dog, Shetland',
231: 'collie',
232: 'Border collie',
233: 'Bouvier des Flandres, Bouviers des Flandres',
234: 'Rottweiler',
235: 'German shepherd, German shepherd dog, German police dog, alsatian',
236: 'Doberman, Doberman pinscher',
237: 'miniature pinscher',
238: 'Greater Swiss Mountain dog',
239: 'Bernese mountain dog',
240: 'Appenzeller',
241: 'EntleBucher',
242: 'boxer',
243: 'bull mastiff',
244: 'Tibetan mastiff',
245: 'French bulldog',
246: 'Great Dane',
247: 'Saint Bernard, St Bernard',
248: 'Eskimo dog, husky',
249: 'malamute, malemute, Alaskan malamute',
250: 'Siberian husky',
251: 'dalmatian, coach dog, carriage dog',
252: 'affenpinscher, monkey pinscher, monkey dog',
253: 'basenji',
254: 'pug, pug-dog',
255: 'Leonberg',
256: 'Newfoundland, Newfoundland dog',
257: 'Great Pyrenees',
258: 'Samoyed, Samoyede',
259: 'Pomeranian',
260: 'chow, chow chow',
261: 'keeshond',
262: 'Brabancon griffon',
263: 'Pembroke, Pembroke Welsh corgi',
264: 'Cardigan, Cardigan Welsh corgi',
265: 'toy poodle',
266: 'miniature poodle',
267: 'standard poodle',
268: 'Mexican hairless',
269: 'timber wolf, grey wolf, gray wolf, Canis lupus',
270: 'white wolf, Arctic wolf, Canis lupus tundrarum',
271: 'red wolf, maned wolf, Canis rufus, Canis niger',
272: 'coyote, prairie wolf, brush wolf, Canis latrans',
273: 'dingo, warrigal, warragal, Canis dingo',
274: 'dhole, Cuon alpinus',
275: 'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus',
276: 'hyena, hyaena',
277: 'red fox, Vulpes vulpes',
278: 'kit fox, Vulpes macrotis',
279: 'Arctic fox, white fox, Alopex lagopus',
280: 'grey fox, gray fox, Urocyon cinereoargenteus',
281: 'tabby, tabby cat',
282: 'tiger cat',
283: 'Persian cat',
284: 'Siamese cat, Siamese',
285: 'Egyptian cat',
286: 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
287: 'lynx, catamount',
288: 'leopard, Panthera pardus',
289: 'snow leopard, ounce, Panthera uncia',
290: 'jaguar, panther, Panthera onca, Felis onca',
291: 'lion, king of beasts, Panthera leo',
292: 'tiger, Panthera tigris',
293: 'cheetah, chetah, Acinonyx jubatus',
294: 'brown bear, bruin, Ursus arctos',
295: 'American black bear, black bear, Ursus americanus, Euarctos americanus',
296: 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus',
297: 'sloth bear, Melursus ursinus, Ursus ursinus',
298: 'mongoose',
299: 'meerkat, mierkat',
300: 'tiger beetle',
301: 'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle',
302: 'ground beetle, carabid beetle',
303: 'long-horned beetle, longicorn, longicorn beetle',
304: 'leaf beetle, chrysomelid',
305: 'dung beetle',
306: 'rhinoceros beetle',
307: 'weevil',
308: 'fly',
309: 'bee',
310: 'ant, emmet, pismire',
311: 'grasshopper, hopper',
312: 'cricket',
313: 'walking stick, walkingstick, stick insect',
314: 'cockroach, roach',
315: 'mantis, mantid',
316: 'cicada, cicala',
317: 'leafhopper',
318: 'lacewing, lacewing fly',
319: "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
320: 'damselfly',
321: 'admiral',
322: 'ringlet, ringlet butterfly',
323: 'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus',
324: 'cabbage butterfly',
325: 'sulphur butterfly, sulfur butterfly',
326: 'lycaenid, lycaenid butterfly',
327: 'starfish, sea star',
328: 'sea urchin',
329: 'sea cucumber, holothurian',
330: 'wood rabbit, cottontail, cottontail rabbit',
331: 'hare',
332: 'Angora, Angora rabbit',
333: 'hamster',
334: 'porcupine, hedgehog',
335: 'fox squirrel, eastern fox squirrel, Sciurus niger',
336: 'marmot',
337: 'beaver',
338: 'guinea pig, Cavia cobaya',
339: 'sorrel',
340: 'zebra',
341: 'hog, pig, grunter, squealer, Sus scrofa',
342: 'wild boar, boar, Sus scrofa',
343: 'warthog',
344: 'hippopotamus, hippo, river horse, Hippopotamus amphibius',
345: 'ox',
346: 'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis',
347: 'bison',
348: 'ram, tup',
349: 'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis',
350: 'ibex, Capra ibex',
351: 'hartebeest',
352: 'impala, Aepyceros melampus',
353: 'gazelle',
354: 'Arabian camel, dromedary, Camelus dromedarius',
355: 'llama',
356: 'weasel',
357: 'mink',
358: 'polecat, fitch, foulmart, foumart, Mustela putorius',
359: 'black-footed ferret, ferret, Mustela nigripes',
360: 'otter',
361: 'skunk, polecat, wood pussy',
362: 'badger',
363: 'armadillo',
364: 'three-toed sloth, ai, Bradypus tridactylus',
365: 'orangutan, orang, orangutang, Pongo pygmaeus',
366: 'gorilla, Gorilla gorilla',
367: 'chimpanzee, chimp, Pan troglodytes',
368: 'gibbon, Hylobates lar',
369: 'siamang, Hylobates syndactylus, Symphalangus syndactylus',
370: 'guenon, guenon monkey',
371: 'patas, hussar monkey, Erythrocebus patas',
372: 'baboon',
373: 'macaque',
374: 'langur',
375: 'colobus, colobus monkey',
376: 'proboscis monkey, Nasalis larvatus',
377: 'marmoset',
378: 'capuchin, ringtail, Cebus capucinus',
379: 'howler monkey, howler',
380: 'titi, titi monkey',
381: 'spider monkey, Ateles geoffroyi',
382: 'squirrel monkey, Saimiri sciureus',
383: 'Madagascar cat, ring-tailed lemur, Lemur catta',
384: 'indri, indris, Indri indri, Indri brevicaudatus',
385: 'Indian elephant, Elephas maximus',
386: 'African elephant, Loxodonta africana',
387: 'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens',
388: 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca',
389: 'barracouta, snoek',
390: 'eel',
391: 'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch',
392: 'rock beauty, Holocanthus tricolor',
393: 'anemone fish',
394: 'sturgeon',
395: 'gar, garfish, garpike, billfish, Lepisosteus osseus',
396: 'lionfish',
397: 'puffer, pufferfish, blowfish, globefish',
398: 'abacus',
399: 'abaya',
400: "academic gown, academic robe, judge's robe",
401: 'accordion, piano accordion, squeeze box',
402: 'acoustic guitar',
403: 'aircraft carrier, carrier, flattop, attack aircraft carrier',
404: 'airliner',
405: 'airship, dirigible',
406: 'altar',
407: 'ambulance',
408: 'amphibian, amphibious vehicle',
409: 'analog clock',
410: 'apiary, bee house',
411: 'apron',
412: 'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin',
413: 'assault rifle, assault gun',
414: 'backpack, back pack, knapsack, packsack, rucksack, haversack',
415: 'bakery, bakeshop, bakehouse',
416: 'balance beam, beam',
417: 'balloon',
418: 'ballpoint, ballpoint pen, ballpen, Biro',
419: 'Band Aid',
420: 'banjo',
421: 'bannister, banister, balustrade, balusters, handrail',
422: 'barbell',
423: 'barber chair',
424: 'barbershop',
425: 'barn',
426: 'barometer',
427: 'barrel, cask',
428: 'barrow, garden cart, lawn cart, wheelbarrow',
429: 'baseball',
430: 'basketball',
431: 'bassinet',
432: 'bassoon',
433: 'bathing cap, swimming cap',
434: 'bath towel',
435: 'bathtub, bathing tub, bath, tub',
436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon',
437: 'beacon, lighthouse, beacon light, pharos',
438: 'beaker',
439: 'bearskin, busby, shako',
440: 'beer bottle',
441: 'beer glass',
442: 'bell cote, bell cot',
443: 'bib',
444: 'bicycle-built-for-two, tandem bicycle, tandem',
445: 'bikini, two-piece',
446: 'binder, ring-binder',
447: 'binoculars, field glasses, opera glasses',
448: 'birdhouse',
449: 'boathouse',
450: 'bobsled, bobsleigh, bob',
451: 'bolo tie, bolo, bola tie, bola',
452: 'bonnet, poke bonnet',
453: 'bookcase',
454: 'bookshop, bookstore, bookstall',
455: 'bottlecap',
456: 'bow',
457: 'bow tie, bow-tie, bowtie',
458: 'brass, memorial tablet, plaque',
459: 'brassiere, bra, bandeau',
460: 'breakwater, groin, groyne, mole, bulwark, seawall, jetty',
461: 'breastplate, aegis, egis',
462: 'broom',
463: 'bucket, pail',
464: 'buckle',
465: 'bulletproof vest',
466: 'bullet train, bullet',
467: 'butcher shop, meat market',
468: 'cab, hack, taxi, taxicab',
469: 'caldron, cauldron',
470: 'candle, taper, wax light',
471: 'cannon',
472: 'canoe',
473: 'can opener, tin opener',
474: 'cardigan',
475: 'car mirror',
476: 'carousel, carrousel, merry-go-round, roundabout, whirligig',
477: "carpenter's kit, tool kit",
478: 'carton',
479: 'car wheel',
480: 'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM',
481: 'cassette',
482: 'cassette player',
483: 'castle',
484: 'catamaran',
485: 'CD player',
486: 'cello, violoncello',
487: 'cellular telephone, cellular phone, cellphone, cell, mobile phone',
488: 'chain',
489: 'chainlink fence',
490: 'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour',
491: 'chain saw, chainsaw',
492: 'chest',
493: 'chiffonier, commode',
494: 'chime, bell, gong',
495: 'china cabinet, china closet',
496: 'Christmas stocking',
497: 'church, church building',
498: 'cinema, movie theater, movie theatre, movie house, picture palace',
499: 'cleaver, meat cleaver, chopper',
500: 'cliff dwelling',
501: 'cloak',
502: 'clog, geta, patten, sabot',
503: 'cocktail shaker',
504: 'coffee mug',
505: 'coffeepot',
506: 'coil, spiral, volute, whorl, helix',
507: 'combination lock',
508: 'computer keyboard, keypad',
509: 'confectionery, confectionary, candy store',
510: 'container ship, containership, container vessel',
511: 'convertible',
512: 'corkscrew, bottle screw',
513: 'cornet, horn, trumpet, trump',
514: 'cowboy boot',
515: 'cowboy hat, ten-gallon hat',
516: 'cradle',
517: 'crane',
518: 'crash helmet',
519: 'crate',
520: 'crib, cot',
521: 'Crock Pot',
522: 'croquet ball',
523: 'crutch',
524: 'cuirass',
525: 'dam, dike, dyke',
526: 'desk',
527: 'desktop computer',
528: 'dial telephone, dial phone',
529: 'diaper, nappy, napkin',
530: 'digital clock',
531: 'digital watch',
532: 'dining table, board',
533: 'dishrag, dishcloth',
534: 'dishwasher, dish washer, dishwashing machine',
535: 'disk brake, disc brake',
536: 'dock, dockage, docking facility',
537: 'dogsled, dog sled, dog sleigh',
538: 'dome',
539: 'doormat, welcome mat',
540: 'drilling platform, offshore rig',
541: 'drum, membranophone, tympan',
542: 'drumstick',
543: 'dumbbell',
544: 'Dutch oven',
545: 'electric fan, blower',
546: 'electric guitar',
547: 'electric locomotive',
548: 'entertainment center',
549: 'envelope',
550: 'espresso maker',
551: 'face powder',
552: 'feather boa, boa',
553: 'file, file cabinet, filing cabinet',
554: 'fireboat',
555: 'fire engine, fire truck',
556: 'fire screen, fireguard',
557: 'flagpole, flagstaff',
558: 'flute, transverse flute',
559: 'folding chair',
560: 'football helmet',
561: 'forklift',
562: 'fountain',
563: 'fountain pen',
564: 'four-poster',
565: 'freight car',
566: 'French horn, horn',
567: 'frying pan, frypan, skillet',
568: 'fur coat',
569: 'garbage truck, dustcart',
570: 'gasmask, respirator, gas helmet',
571: 'gas pump, gasoline pump, petrol pump, island dispenser',
572: 'goblet',
573: 'go-kart',
574: 'golf ball',
575: 'golfcart, golf cart',
576: 'gondola',
577: 'gong, tam-tam',
578: 'gown',
579: 'grand piano, grand',
580: 'greenhouse, nursery, glasshouse',
581: 'grille, radiator grille',
582: 'grocery store, grocery, food market, market',
583: 'guillotine',
584: 'hair slide',
585: 'hair spray',
586: 'half track',
587: 'hammer',
588: 'hamper',
589: 'hand blower, blow dryer, blow drier, hair dryer, hair drier',
590: 'hand-held computer, hand-held microcomputer',
591: 'handkerchief, hankie, hanky, hankey',
592: 'hard disc, hard disk, fixed disk',
593: 'harmonica, mouth organ, harp, mouth harp',
594: 'harp',
595: 'harvester, reaper',
596: 'hatchet',
597: 'holster',
598: 'home theater, home theatre',
599: 'honeycomb',
600: 'hook, claw',
601: 'hoopskirt, crinoline',
602: 'horizontal bar, high bar',
603: 'horse cart, horse-cart',
604: 'hourglass',
605: 'iPod',
606: 'iron, smoothing iron',
607: "jack-o'-lantern",
608: 'jean, blue jean, denim',
609: 'jeep, landrover',
610: 'jersey, T-shirt, tee shirt',
611: 'jigsaw puzzle',
612: 'jinrikisha, ricksha, rickshaw',
613: 'joystick',
614: 'kimono',
615: 'knee pad',
616: 'knot',
617: 'lab coat, laboratory coat',
618: 'ladle',
619: 'lampshade, lamp shade',
620: 'laptop, laptop computer',
621: 'lawn mower, mower',
622: 'lens cap, lens cover',
623: 'letter opener, paper knife, paperknife',
624: 'library',
625: 'lifeboat',
626: 'lighter, light, igniter, ignitor',
627: 'limousine, limo',
628: 'liner, ocean liner',
629: 'lipstick, lip rouge',
630: 'Loafer',
631: 'lotion',
632: 'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system',
633: "loupe, jeweler's loupe",
634: 'lumbermill, sawmill',
635: 'magnetic compass',
636: 'mailbag, postbag',
637: 'mailbox, letter box',
638: 'maillot',
639: 'maillot, tank suit',
640: 'manhole cover',
641: 'maraca',
642: 'marimba, xylophone',
643: 'mask',
644: 'matchstick',
645: 'maypole',
646: 'maze, labyrinth',
647: 'measuring cup',
648: 'medicine chest, medicine cabinet',
649: 'megalith, megalithic structure',
650: 'microphone, mike',
651: 'microwave, microwave oven',
652: 'military uniform',
653: 'milk can',
654: 'minibus',
655: 'miniskirt, mini',
656: 'minivan',
657: 'missile',
658: 'mitten',
659: 'mixing bowl',
660: 'mobile home, manufactured home',
661: 'Model T',
662: 'modem',
663: 'monastery',
664: 'monitor',
665: 'moped',
666: 'mortar',
667: 'mortarboard',
668: 'mosque',
669: 'mosquito net',
670: 'motor scooter, scooter',
671: 'mountain bike, all-terrain bike, off-roader',
672: 'mountain tent',
673: 'mouse, computer mouse',
674: 'mousetrap',
675: 'moving van',
676: 'muzzle',
677: 'nail',
678: 'neck brace',
679: 'necklace',
680: 'nipple',
681: 'notebook, notebook computer',
682: 'obelisk',
683: 'oboe, hautboy, hautbois',
684: 'ocarina, sweet potato',
685: 'odometer, hodometer, mileometer, milometer',
686: 'oil filter',
687: 'organ, pipe organ',
688: 'oscilloscope, scope, cathode-ray oscilloscope, CRO',
689: 'overskirt',
690: 'oxcart',
691: 'oxygen mask',
692: 'packet',
693: 'paddle, boat paddle',
694: 'paddlewheel, paddle wheel',
695: 'padlock',
696: 'paintbrush',
697: "pajama, pyjama, pj's, jammies",
698: 'palace',
699: 'panpipe, pandean pipe, syrinx',
700: 'paper towel',
701: 'parachute, chute',
702: 'parallel bars, bars',
703: 'park bench',
704: 'parking meter',
705: 'passenger car, coach, carriage',
706: 'patio, terrace',
707: 'pay-phone, pay-station',
708: 'pedestal, plinth, footstall',
709: 'pencil box, pencil case',
710: 'pencil sharpener',
711: 'perfume, essence',
712: 'Petri dish',
713: 'photocopier',
714: 'pick, plectrum, plectron',
715: 'pickelhaube',
716: 'picket fence, paling',
717: 'pickup, pickup truck',
718: 'pier',
719: 'piggy bank, penny bank',
720: 'pill bottle',
721: 'pillow',
722: 'ping-pong ball',
723: 'pinwheel',
724: 'pirate, pirate ship',
725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane",
727: 'planetarium',
728: 'plastic bag',
729: 'plate rack',
730: 'plow, plough',
731: "plunger, plumber's helper",
732: 'Polaroid camera, Polaroid Land camera',
733: 'pole',
734: 'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria',
735: 'poncho',
736: 'pool table, billiard table, snooker table',
737: 'pop bottle, soda bottle',
738: 'pot, flowerpot',
739: "potter's wheel",
740: 'power drill',
741: 'prayer rug, prayer mat',
742: 'printer',
743: 'prison, prison house',
744: 'projectile, missile',
745: 'projector',
746: 'puck, hockey puck',
747: 'punching bag, punch bag, punching ball, punchball',
748: 'purse',
749: 'quill, quill pen',
750: 'quilt, comforter, comfort, puff',
751: 'racer, race car, racing car',
752: 'racket, racquet',
753: 'radiator',
754: 'radio, wireless',
755: 'radio telescope, radio reflector',
756: 'rain barrel',
757: 'recreational vehicle, RV, R.V.',
758: 'reel',
759: 'reflex camera',
760: 'refrigerator, icebox',
761: 'remote control, remote',
762: 'restaurant, eating house, eating place, eatery',
763: 'revolver, six-gun, six-shooter',
764: 'rifle',
765: 'rocking chair, rocker',
766: 'rotisserie',
767: 'rubber eraser, rubber, pencil eraser',
768: 'rugby ball',
769: 'rule, ruler',
770: 'running shoe',
771: 'safe',
772: 'safety pin',
773: 'saltshaker, salt shaker',
774: 'sandal',
775: 'sarong',
776: 'sax, saxophone',
777: 'scabbard',
778: 'scale, weighing machine',
779: 'school bus',
780: 'schooner',
781: 'scoreboard',
782: 'screen, CRT screen',
783: 'screw',
784: 'screwdriver',
785: 'seat belt, seatbelt',
786: 'sewing machine',
787: 'shield, buckler',
788: 'shoe shop, shoe-shop, shoe store',
789: 'shoji',
790: 'shopping basket',
791: 'shopping cart',
792: 'shovel',
793: 'shower cap',
794: 'shower curtain',
795: 'ski',
796: 'ski mask',
797: 'sleeping bag',
798: 'slide rule, slipstick',
799: 'sliding door',
800: 'slot, one-armed bandit',
801: 'snorkel',
802: 'snowmobile',
803: 'snowplow, snowplough',
804: 'soap dispenser',
805: 'soccer ball',
806: 'sock',
807: 'solar dish, solar collector, solar furnace',
808: 'sombrero',
809: 'soup bowl',
810: 'space bar',
811: 'space heater',
812: 'space shuttle',
813: 'spatula',
814: 'speedboat',
815: "spider web, spider's web",
816: 'spindle',
817: 'sports car, sport car',
818: 'spotlight, spot',
819: 'stage',
820: 'steam locomotive',
821: 'steel arch bridge',
822: 'steel drum',
823: 'stethoscope',
824: 'stole',
825: 'stone wall',
826: 'stopwatch, stop watch',
827: 'stove',
828: 'strainer',
829: 'streetcar, tram, tramcar, trolley, trolley car',
830: 'stretcher',
831: 'studio couch, day bed',
832: 'stupa, tope',
833: 'submarine, pigboat, sub, U-boat',
834: 'suit, suit of clothes',
835: 'sundial',
836: 'sunglass',
837: 'sunglasses, dark glasses, shades',
838: 'sunscreen, sunblock, sun blocker',
839: 'suspension bridge',
840: 'swab, swob, mop',
841: 'sweatshirt',
842: 'swimming trunks, bathing trunks',
843: 'swing',
844: 'switch, electric switch, electrical switch',
845: 'syringe',
846: 'table lamp',
847: 'tank, army tank, armored combat vehicle, armoured combat vehicle',
848: 'tape player',
849: 'teapot',
850: 'teddy, teddy bear',
851: 'television, television system',
852: 'tennis ball',
853: 'thatch, thatched roof',
854: 'theater curtain, theatre curtain',
855: 'thimble',
856: 'thresher, thrasher, threshing machine',
857: 'throne',
858: 'tile roof',
859: 'toaster',
860: 'tobacco shop, tobacconist shop, tobacconist',
861: 'toilet seat',
862: 'torch',
863: 'totem pole',
864: 'tow truck, tow car, wrecker',
865: 'toyshop',
866: 'tractor',
867: 'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi',
868: 'tray',
869: 'trench coat',
870: 'tricycle, trike, velocipede',
871: 'trimaran',
872: 'tripod',
873: 'triumphal arch',
874: 'trolleybus, trolley coach, trackless trolley',
875: 'trombone',
876: 'tub, vat',
877: 'turnstile',
878: 'typewriter keyboard',
879: 'umbrella',
880: 'unicycle, monocycle',
881: 'upright, upright piano',
882: 'vacuum, vacuum cleaner',
883: 'vase',
884: 'vault',
885: 'velvet',
886: 'vending machine',
887: 'vestment',
888: 'viaduct',
889: 'violin, fiddle',
890: 'volleyball',
891: 'waffle iron',
892: 'wall clock',
893: 'wallet, billfold, notecase, pocketbook',
894: 'wardrobe, closet, press',
895: 'warplane, military plane',
896: 'washbasin, handbasin, washbowl, lavabo, wash-hand basin',
897: 'washer, automatic washer, washing machine',
898: 'water bottle',
899: 'water jug',
900: 'water tower',
901: 'whiskey jug',
902: 'whistle',
903: 'wig',
904: 'window screen',
905: 'window shade',
906: 'Windsor tie',
907: 'wine bottle',
908: 'wing',
909: 'wok',
910: 'wooden spoon',
911: 'wool, woolen, woollen',
912: 'worm fence, snake fence, snake-rail fence, Virginia fence',
913: 'wreck',
914: 'yawl',
915: 'yurt',
916: 'web site, website, internet site, site',
917: 'comic book',
918: 'crossword puzzle, crossword',
919: 'street sign',
920: 'traffic light, traffic signal, stoplight',
921: 'book jacket, dust cover, dust jacket, dust wrapper',
922: 'menu',
923: 'plate',
924: 'guacamole',
925: 'consomme',
926: 'hot pot, hotpot',
927: 'trifle',
928: 'ice cream, icecream',
929: 'ice lolly, lolly, lollipop, popsicle',
930: 'French loaf',
931: 'bagel, beigel',
932: 'pretzel',
933: 'cheeseburger',
934: 'hotdog, hot dog, red hot',
935: 'mashed potato',
936: 'head cabbage',
937: 'broccoli',
938: 'cauliflower',
939: 'zucchini, courgette',
940: 'spaghetti squash',
941: 'acorn squash',
942: 'butternut squash',
943: 'cucumber, cuke',
944: 'artichoke, globe artichoke',
945: 'bell pepper',
946: 'cardoon',
947: 'mushroom',
948: 'Granny Smith',
949: 'strawberry',
950: 'orange',
951: 'lemon',
952: 'fig',
953: 'pineapple, ananas',
954: 'banana',
955: 'jackfruit, jak, jack',
956: 'custard apple',
957: 'pomegranate',
958: 'hay',
959: 'carbonara',
960: 'chocolate sauce, chocolate syrup',
961: 'dough',
962: 'meat loaf, meatloaf',
963: 'pizza, pizza pie',
964: 'potpie',
965: 'burrito',
966: 'red wine',
967: 'espresso',
968: 'cup',
969: 'eggnog',
970: 'alp',
971: 'bubble',
972: 'cliff, drop, drop-off',
973: 'coral reef',
974: 'geyser',
975: 'lakeside, lakeshore',
976: 'promontory, headland, head, foreland',
977: 'sandbar, sand bar',
978: 'seashore, coast, seacoast, sea-coast',
979: 'valley, vale',
980: 'volcano',
981: 'ballplayer, baseball player',
982: 'groom, bridegroom',
983: 'scuba diver',
984: 'rapeseed',
985: 'daisy',
986: "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
987: 'corn',
988: 'acorn',
989: 'hip, rose hip, rosehip',
990: 'buckeye, horse chestnut, conker',
991: 'coral fungus',
992: 'agaric',
993: 'gyromitra',
994: 'stinkhorn, carrion fungus',
995: 'earthstar',
996: 'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa',
997: 'bolete',
998: 'ear, spike, capitulum',
999: 'toilet tissue, toilet paper, bathroom tissue'}
Sixth checkpoint: the raw logits for our single image, a length-1000 vector with one score per ImageNet class. The argmax of this vector is the model's top-1 prediction. The scores are uncalibrated; a softmax later turns them into probabilities.
logits[0]
Out[35]:
tensor([-1.1288e+00, -5.5756e-01, -1.5567e+00, -1.3101e+00, -5.8843e-01,
-1.5344e+00, -1.7410e+00, -1.8962e+00, -1.8121e+00, -2.0072e+00,
-1.5723e+00, -1.7178e+00, -1.8666e+00, -1.7056e+00, -1.6707e+00,
-9.8119e-01, -6.7786e-01, -1.7026e+00, -1.6719e+00, -1.2583e+00,
-1.3638e+00, -1.1015e+00, -1.6210e+00, -1.5375e+00, -1.3533e+00,
-1.2028e+00, -8.9297e-02, -3.4615e-01, -6.2297e-01, -7.7859e-01,
-1.7597e+00, -1.0814e+00, -1.7800e+00, -2.0764e+00, -8.3730e-01,
-1.7238e+00, -2.6275e-01, -1.8172e+00, -9.1032e-01, -1.1190e+00,
-1.0625e+00, -9.2219e-01, -1.3375e+00, -1.3699e+00, -2.3913e-01,
-1.5224e+00, -1.1827e+00, -1.1833e+00, -1.0774e+00, -1.7358e+00,
-1.0469e+00, 8.9774e-02, -4.2469e-01, -1.2435e+00, -1.1599e+00,
-1.4301e+00, -1.0391e+00, -1.3216e+00, -1.4284e+00, -1.0975e+00,
-1.3484e+00, -1.0535e+00, -1.6283e+00, -1.3372e+00, -1.6257e+00,
-1.4261e+00, -1.8167e+00, -1.8975e+00, -8.3239e-01, -3.9849e-02,
-1.7841e+00, -7.3182e-01, -7.5883e-01, -5.2622e-01, -1.0596e+00,
-8.3641e-01, -1.6568e+00, -1.5369e+00, -1.3250e+00, -5.1549e-01,
-1.3927e+00, -2.1316e+00, -2.8735e+00, -2.4314e+00, -1.4067e+00,
-2.6360e+00, -2.8996e+00, 9.7903e-01, -9.8650e-01, -9.2010e-01,
-1.6485e+00, -2.1602e+00, -1.8970e+00, -2.3107e+00, -8.5293e-01,
-1.1307e+00, -2.0294e+00, -1.3849e+00, -8.5414e-01, -1.1081e+00,
-7.7380e-01, -9.0096e-01, -5.7366e-01, -1.7009e-01, -8.7846e-01,
-1.0487e+00, -1.5573e-01, -2.0009e-01, -1.0339e+00, -9.1640e-01,
-1.5770e+00, -1.2183e+00, -4.6172e-01, -1.1566e+00, 6.1680e-02,
-8.3394e-01, -1.7371e+00, 9.0266e-02, 6.6352e-01, -9.2439e-01,
-2.0220e+00, -8.9493e-01, -1.5144e-01, -1.8428e+00, -6.9276e-01,
-1.5316e+00, -3.5799e-01, -1.8749e+00, -1.4881e+00, -1.6213e+00,
-1.3232e+00, -1.7723e+00, -7.9047e-01, -1.9146e+00, -1.1945e+00,
-6.3094e-01, -1.7093e+00, -1.6797e-01, -2.1087e+00, -2.3179e+00,
-1.9785e+00, -1.4969e+00, -1.8115e+00, -2.1166e+00, -1.6588e+00,
-1.0613e+00, -5.0492e-01, -2.1767e+00, -1.6234e+00, -1.1036e+00,
-1.4370e+00, 8.1056e-01, -1.0474e+00, -4.1610e-01, -8.2913e-01,
-9.5349e-01, -9.5689e-01, -2.5514e-01, -2.5555e-01, -1.0438e+00,
-8.7707e-01, -1.4423e+00, -1.6613e+00, -1.3774e+00, -4.5691e-01,
-1.8336e+00, -2.2308e+00, -1.8402e+00, -6.0270e-01, -1.7237e+00,
-1.6384e+00, -4.4052e-01, -1.8071e+00, -1.5352e+00, -1.3887e+00,
-2.2053e+00, -1.1253e+00, -1.3287e+00, -3.6203e-01, -6.7451e-01,
-2.0837e+00, -1.0458e+00, -1.5430e+00, -5.7658e-01, -1.1472e+00,
2.0212e-01, -2.6233e-01, -7.8850e-01, -1.4344e+00, -6.7024e-01,
-9.5009e-01, -2.2901e+00, -1.6378e+00, -5.6170e-01, -1.5876e+00,
5.0793e-02, -1.3269e+00, -1.8811e+00, -1.1867e+00, -8.0472e-01,
-1.2382e+00, -5.5780e-02, -1.4564e+00, -1.3538e+00, -1.5029e-01,
-7.2923e-01, -8.1718e-01, -1.2432e+00, -1.9306e+00, -8.0074e-01,
-7.8810e-01, -5.5854e-01, -1.3028e+00, -1.3805e+00, -1.0876e+00,
-1.2772e+00, -1.7482e+00, -1.0466e+00, -1.7972e+00, -1.3454e+00,
-2.7756e-01, -1.0651e+00, -1.3511e+00, -1.0626e+00, -1.3594e+00,
-1.8190e+00, -1.3949e+00, -2.1168e+00, -1.5986e-01, -1.9322e+00,
-1.9654e+00, -2.2672e+00, -1.2830e+00, -2.0853e+00, -2.9499e+00,
-1.3882e+00, -6.3462e-01, -7.2214e-01, -9.9601e-01, -1.3956e+00,
-1.2745e+00, -1.3796e+00, -1.3912e+00, -9.7312e-01, -1.0223e+00,
-5.4940e-01, -8.0601e-01, -1.0212e+00, -8.6111e-01, -1.1427e+00,
-1.3147e+00, -5.0085e-01, -6.9131e-01, -8.1273e-01, 9.6982e-02,
-4.9907e-01, -6.5999e-01, -1.3624e+00, -5.2018e-01, -1.5573e+00,
-1.3135e+00, -2.3948e+00, -1.3010e+00, -5.0329e-01, -9.7687e-01,
6.8204e-02, -8.9449e-02, -9.4618e-01, -8.8762e-01, -1.6033e+00,
-1.8124e+00, -2.1654e+00, -2.1438e+00, -9.1164e-01, -1.8081e+00,
-2.6824e+00, -2.2461e+00, -2.8404e+00, -2.2389e+00, -2.3717e+00,
-2.7880e+00, -1.7697e+00, -1.7985e+00, -1.3684e+00, 1.0784e-01,
3.8752e-01, -2.0528e+00, -1.5288e+00, -1.5552e+00, -1.8982e+00,
-1.4148e+00, -1.9727e+00, -1.4251e+00, -1.4619e+00, -1.1224e+00,
-1.6169e+00, -1.6196e+00, -1.4527e+00, -1.3702e+00, -1.1338e+00,
-9.2544e-01, -1.4009e+00, -6.4955e-01, -1.5494e+00, -1.9337e+00,
-1.2910e+00, 4.8402e-01, -1.0485e+00, -1.8810e+00, -2.0764e+00,
-9.0217e-01, -1.5040e+00, -1.5439e+00, -1.9934e+00, -1.9208e-01,
-2.2644e+00, -2.1138e+00, -1.7672e+00, -1.0737e+00, -1.4061e+00,
-1.5477e+00, -1.6711e+00, -1.0581e+00, -1.5613e+00, -7.4515e-01,
-8.8589e-01, -7.3351e-01, -6.6812e-02, -7.4479e-02, -4.7825e-01,
-2.5402e+00, -2.9542e+00, -2.0981e+00, -1.1647e+00, -1.1550e+00,
-1.2923e+00, -2.2555e+00, -1.3204e+00, -1.9082e+00, -1.1163e+00,
-1.1351e+00, -2.0877e+00, -1.9385e+00, -2.0916e+00, -1.7080e+00,
-1.8356e+00, -1.5045e+00, -1.5071e+00, -2.1718e+00, -1.5874e+00,
-1.7591e-01, -1.1006e+00, -1.3475e+00, -2.5205e-01, -1.1786e+00,
-1.8891e+00, -1.0950e+00, -1.5001e+00, -5.7034e-01, -6.3320e-01,
-1.0726e+00, -1.5441e+00, -1.1395e+00, 1.4060e-01, -6.0951e-01,
-9.7084e-01, -1.5965e+00, -1.6291e+00, -2.2506e+00, -1.8121e+00,
-1.2935e+00, -1.9496e+00, -2.2894e+00, -1.7002e+00, -2.0452e+00,
-1.3958e+00, -7.1474e-01, -1.8174e+00, -1.7262e+00, -1.7804e+00,
-7.1823e-01, -1.5532e+00, -1.5278e+00, -1.6071e+00, -1.4682e+00,
-1.8455e+00, -2.0240e+00, -9.4857e-01, -1.2190e+00, -9.1378e-01,
-1.1949e+00, -6.0205e-01, -1.6749e+00, -1.7124e+00, -4.3164e-01,
-9.6829e-01, -1.0952e+00, -7.9104e-01, 2.3104e+00, 1.0714e+00,
1.0450e+00, 1.4213e+00, 1.0278e+00, -6.5385e-01, -3.5740e-01,
-6.7773e-01, 3.5781e-01, -1.0136e+00, -1.7345e+00, 3.0628e+00,
-1.7585e+00, -2.2044e-01, 3.5266e+00, -2.1260e-01, 1.4028e+00,
4.5782e+00, -7.8668e-01, 5.2189e-01, 2.8467e+00, 2.9685e+00,
3.9798e-01, 1.7613e-01, -8.7818e-01, 7.2730e-01, 1.4216e+00,
-1.3559e+00, 9.1536e-01, 1.9479e+00, 4.9609e-02, 7.9663e-01,
-1.0046e+00, 4.4092e-01, 9.0080e-01, 1.2989e+00, -1.0287e+00,
6.5570e-01, -1.0619e+00, -6.2818e-01, 1.9653e+00, 1.4845e+00,
1.1694e+00, 3.6230e+00, 4.2247e-01, 2.3146e-01, -9.8158e-01,
-1.4154e+00, 1.5514e+00, 1.1212e+00, -7.9226e-01, -5.8781e-01,
2.0116e-01, 2.8734e-01, 2.8798e-01, 5.3292e-01, 3.6066e+00,
1.2094e+00, -5.1846e-01, 1.5131e+00, 8.4088e-01, -9.2419e-02,
-5.0970e-01, 1.3946e+00, -1.4535e-02, 3.0267e+00, 1.9495e+00,
7.7145e-01, -2.9291e-01, 9.9016e-02, 4.7988e-01, 1.3111e+00,
3.1897e+00, -5.4317e-01, -1.3991e+00, 2.8718e+00, -1.9368e-01,
4.3121e-01, -5.2684e-02, 6.9063e-01, 1.6148e+00, -4.4517e-01,
2.7687e+00, 2.8848e+00, 2.1599e+00, -5.1245e-01, -1.1066e+00,
2.8624e+00, -3.7812e-01, 6.3851e+00, 2.7377e-01, -8.1250e-01,
3.1924e-01, 1.4829e-01, 3.7285e-01, -1.2799e-01, -1.0288e+00,
7.5215e-01, 4.5020e-01, -2.4231e-01, 2.2576e+00, 3.6101e+00,
-2.3716e+00, 9.9774e-01, 2.5722e+00, 1.9629e+00, 1.0951e+01,
5.7500e+00, -2.2427e-01, 1.7355e+00, 2.3874e+00, 1.1408e+00,
-1.8416e+00, -5.5941e-01, 1.7484e+00, -7.2259e-01, 1.7906e+00,
1.8903e+00, -1.9706e-01, -1.6067e-03, 1.5667e+00, 1.3664e+00,
7.5336e-01, 2.6782e+00, 5.7073e-02, 1.4663e+00, 2.1284e-01,
-1.1474e+00, 3.8432e+00, 2.1757e+00, 2.8087e+00, 1.1324e+00,
3.5591e+00, 1.9089e+00, 2.4432e+00, -4.5529e-01, 1.0408e+00,
-5.0486e-01, -3.3511e-01, -1.2184e+00, -1.1848e+00, 1.5616e+00,
-1.2250e+00, 3.7889e-01, 1.7945e+00, -9.9117e-02, -7.6152e-03,
1.9710e-01, 8.2536e-01, -1.1791e+00, 5.1105e-01, 1.8944e+00,
4.7684e+00, 2.4500e+00, -3.4019e-01, 3.6311e-01, -6.6713e-01,
-3.0069e-01, 4.5596e-01, -2.8358e-01, 2.6556e-01, 1.8438e+00,
4.0064e-01, -1.5674e+00, 5.7787e-03, 1.2242e+00, 1.2593e-01,
-5.2277e-01, -9.8615e-01, -1.0612e+00, 1.7508e+00, -8.5462e-01,
1.6774e+00, 1.3741e-01, 1.6946e+00, -1.1201e+00, 6.7398e-01,
-1.1908e+00, -1.0301e+00, -3.7273e-01, -7.4300e-01, 8.1450e-02,
8.1920e-01, -4.9004e-01, 1.9640e+00, 2.2390e+00, 6.1085e-01,
2.5948e+00, -8.5221e-01, 1.7476e+00, 1.5593e+00, 6.3670e-01,
4.7895e+00, 3.2253e+00, 3.3895e+00, 3.1997e+00, 2.6604e-01,
-1.9937e+00, 1.5900e+00, 2.7293e-01, 1.4374e+00, -1.1999e-01,
7.7782e-01, 1.5181e-02, 1.1509e-01, -7.6229e-01, 2.3637e+00,
4.1866e+00, 5.0602e-01, 3.6513e-01, 2.6414e+00, -1.0411e+00,
1.9441e+00, 2.2214e+00, 1.2579e+00, 1.5240e+00, -1.1258e-01,
1.5084e-01, -1.4185e-01, 1.2559e+00, 1.1839e+00, 1.3838e+00,
4.6340e+00, -9.0865e-01, 4.0835e+00, 2.4878e+00, 2.5173e+00,
-1.9493e+00, 3.4721e+00, 4.0724e-01, -1.4598e+00, 2.7559e+00,
1.6428e+00, 2.1481e+00, 1.3823e+00, 1.8370e+00, -4.5120e-01,
1.8123e+00, 2.4906e+00, 1.5409e+00, -1.0159e+00, -1.8618e+00,
5.8097e-01, -2.8717e-01, 5.1455e-01, 8.0545e-01, 3.6588e+00,
1.3412e-01, 9.9009e-01, 2.8569e+00, 1.9259e+00, -7.2916e-01,
1.0044e+00, 1.9262e+00, 1.7313e-02, 2.1786e+00, -3.9642e-01,
2.7504e-01, -1.1991e-01, -6.7182e-01, 1.2963e+00, 1.9883e-01,
-1.0035e+00, -6.5229e-01, 4.1433e+00, -9.4811e-01, 1.8351e+00,
-9.4409e-01, 1.7877e+00, 1.5015e+00, -6.3148e-01, -4.0982e-01,
-2.3635e-02, -6.8984e-01, -1.3123e+00, 4.1120e+00, 2.7246e+00,
1.6324e-02, -5.0407e-01, 5.7172e-01, 2.0147e+00, -6.3177e-01,
6.3341e-01, 5.6590e+00, 6.1334e-02, 2.1875e-01, 1.4905e+00,
4.1078e-01, 3.2662e+00, -4.3793e-01, 1.7184e+00, -9.2182e-01,
-8.5871e-01, -4.3197e-01, 5.0464e+00, -2.4518e-01, -1.4605e+00,
9.4362e-01, 2.9023e+00, 6.4944e-01, -1.4676e-02, 3.1429e-01,
4.4461e+00, -7.1109e-01, -4.0138e-01, 2.0017e+00, 1.5661e+00,
-7.6096e-01, 1.8566e+00, 3.0987e+00, 5.9708e-01, 2.0955e+00,
2.4421e+00, 1.8260e+00, 7.0799e-01, 1.0659e+00, 1.4124e+00,
3.7122e-01, -6.4199e-01, -1.3770e-01, 6.3672e-01, 2.3187e+00,
2.9253e+00, 1.2384e+00, 2.1925e+00, 1.2610e+00, -1.1169e+00,
2.5535e+00, 9.9501e-01, 1.8191e-01, 2.1627e+00, 1.7116e+00,
-2.3068e+00, 2.1134e+00, 2.8885e+00, 1.1332e+00, 1.2188e-02,
-8.5331e-01, 7.1743e-01, 3.8555e-01, 2.5166e+00, 2.2196e+00,
2.9442e-02, -4.7403e-02, 1.8368e+00, 6.8140e-01, -1.4280e-01,
2.1131e+00, 8.8801e-01, 1.1169e+00, 1.8357e+00, 1.4696e+00,
1.2638e+00, -6.8029e-01, -3.2958e-01, 5.8167e-01, 3.2046e+00,
-4.1551e-01, 5.4914e-01, -1.3841e+00, 2.4618e-01, 1.8262e+00,
2.0605e+00, 4.1062e+00, 5.1896e+00, 1.3421e+00, 5.5525e-01,
1.3706e+00, 9.9714e-01, 3.2196e+00, -7.3169e-01, 1.5970e+00,
1.9575e+00, 6.7052e-01, 1.4960e+00, 3.5359e+00, 6.5257e-01,
-1.9448e+00, -1.6082e-01, 4.1609e-01, 2.1405e+00, -7.4468e-01,
-1.6838e+00, 4.4351e-02, 2.3610e+00, 6.3643e-01, 1.3752e+00,
9.2887e-01, 1.4374e+00, -2.5057e-01, 1.6678e+00, 1.2721e+00,
1.0991e+00, 3.6516e-01, 6.2042e-01, 1.2471e+00, 1.0038e+00,
-2.4707e-01, 2.3911e+00, 1.2827e+00, 2.3981e+00, 8.2473e-01,
8.7118e-01, -3.0461e-01, -7.4650e-01, -8.7416e-01, 9.8581e-01,
-1.3365e+00, 5.1204e-01, 1.1578e-01, 7.5739e-01, 2.7037e+00,
3.8529e+00, 1.9886e+00, 4.9845e-02, 1.6810e+00, -1.6452e+00,
-9.5338e-01, 8.3930e-02, -1.4824e-01, 7.9138e-01, -1.3337e-01,
-1.6955e+00, -3.7787e-01, 8.2839e-01, 2.1416e+00, 1.1960e+00,
-1.0223e+00, 2.1449e+00, 2.3191e+00, 2.2857e+00, 9.6810e-01,
-2.0137e-01, 2.7947e+00, -1.5959e+00, -1.7917e+00, 1.2072e+00,
-1.5412e-01, 3.0145e+00, 2.2277e+00, 2.0193e+00, -3.5791e-01,
1.4194e+00, 1.3186e+00, 1.4767e-01, -5.0973e-02, 1.5013e+00,
1.3330e+00, 1.3218e+00, -9.6431e-01, 3.3077e+00, 4.4780e+00,
3.0723e+00, 3.0282e+00, 3.6395e-01, -1.1868e-01, 1.9100e+00,
1.7480e+00, -1.5975e+00, 1.4965e-01, -5.2344e-02, 3.8128e+00,
6.2292e+00, 7.9314e-01, 2.2287e-01, 7.8432e-02, -8.6820e-01,
1.5580e+00, -2.0533e+00, 1.9541e-01, 4.1310e+00, 1.1829e+00,
7.2605e-02, -1.3189e+00, 1.6491e+00, -3.5762e-01, -4.5466e-01,
-8.1512e-01, 7.1880e-01, 1.0953e+00, 1.3817e+00, 5.3443e-01,
-4.9825e-01, 1.0563e+00, 7.2192e-01, 4.3681e-01, 8.1760e-02,
1.2844e+00, 1.5841e+00, -1.2133e+00, -5.1414e-01, 7.7775e-01,
-9.7600e-01, 2.0780e+00, 1.5780e+00, 2.6456e+00, 1.6863e-01,
-1.1976e+00, 1.3533e+00, 3.3104e-01, 3.1936e+00, 3.7815e+00,
-4.4944e-01, 2.2744e+00, 8.3756e-01, 8.1147e-01, 1.2852e+00,
1.0977e+00, 4.1722e-01, 1.2004e+00, -6.0595e-01, -3.4656e-01,
2.1500e+00, 1.1494e+00, -8.6472e-01, -2.1229e+00, -1.3426e+00,
-1.4216e+00, 3.6926e+00, 3.5461e+00, 6.3547e+00, 3.6496e+00,
1.2884e+00, 4.3123e+00, 3.8509e+00, 2.2235e+00, 7.3406e-01,
2.4890e+00, 1.2146e+00, 6.8909e-01, 1.3616e+00, 1.4953e+00,
1.1625e+00, 3.2800e+00, 3.0783e+00, 4.8876e-01, 2.7620e+00,
-9.4409e-01, 4.5483e-02, -4.6922e-01, -6.8508e-01, -7.1216e-01,
-9.2057e-01, -2.9971e-01, -1.2979e-02, 1.8758e-01, -1.0300e+00,
5.9650e-01, -1.7380e+00, -5.4800e-01, -1.4288e-02, 9.5569e-01,
1.6889e-01, 1.6327e-01, 3.0140e-01, 2.7652e-01, 9.7939e-01,
-3.0470e-01, 8.9789e-01, 3.2968e-01, -2.0730e+00, -1.0088e+00,
2.3379e+00, 7.1313e-01, -1.2382e+00, 7.7513e-01, 4.1961e-01,
1.3884e+00, 3.1139e+00, 7.7496e+00, 9.0270e+00, 2.8993e+00,
-8.0959e-01, 5.8377e-01, -2.2734e+00, -1.0872e+00, -8.1821e-01,
6.3957e-01, -1.7169e+00, -7.7390e-01, -1.2564e+00, -6.0100e-01,
-4.2064e-01, -4.4467e-01, 1.9411e-01, -1.6461e+00, -1.5729e+00,
-4.3694e-01, -3.2641e-01, -8.0312e-02, -7.7821e-01, -3.3205e-01,
3.9647e-01, -8.4661e-01, -3.9440e-01, -7.8064e-01, -5.5400e-01,
-9.1702e-01, 4.3964e-02, 1.2689e-01, 5.3765e-01, 3.3942e+00],
grad_fn=<SelectBackward0>)
This section needs two libraries: torch for the softmax and topk, and pandas for the final summary table. pandas is new here; importing torch again is harmless if the patch-cutting section already loaded it for unfold.
import torch
import pandas as pd
Logits are uncalibrated scores. To compare them across the 1,000 ImageNet classes, run them through a softmax along the class dimension. The output is a probability distribution: values in [0, 1] that sum to 1.
probs = torch.nn.functional.softmax(logits, dim=-1)
torch.topk(probs, 5) returns the five highest probabilities and their class indices, sorted from largest to smallest, as PyTorch tensors on the same device as the model. Calling .detach().cpu().numpy() detaches them from the computation graph, moves them to CPU memory, and converts them to NumPy arrays: three small steps that make the numbers easy to print and slot into a DataFrame.
top_5_probs, top_5_labels = torch.topk(probs, 5)
top_5_probs = top_5_probs.detach().cpu().numpy()
top_5_labels = top_5_labels.detach().cpu().numpy()
Inspect the top-5 probabilities. The first value is by far the largest; the rest drop off quickly, which is what a confident prediction looks like.
top_5_probs
Out[39]:
array([[0.7677374 , 0.11205141, 0.03123471, 0.00798096, 0.00774207]],
dtype=float32)
Inspect the top-5 class indices. Each one is an integer in [0, 999]; the next step turns them into names.
top_5_labels
Out[40]:
array([[504, 968, 967, 487, 918]])
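To see what the softmax and top-k steps compute without the model in the way, here is a minimal plain-Python sketch on a made-up four-class logits list. The helper names `softmax` and `top_k` and the values in `toy_logits` are assumptions for illustration, not part of the notebook; the real code above uses torch.nn.functional.softmax and torch.topk.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Return (values, indices) of the k largest entries, largest first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] for i in order], order

# Hypothetical scores for four classes (not taken from the real model).
toy_logits = [2.0, 0.5, -1.0, 3.0]
toy_probs = softmax(toy_logits)
toy_values, toy_indices = top_k(toy_probs, 2)

print(round(sum(toy_probs), 6))  # probabilities sum to 1.0
print(toy_indices)               # [3, 0]
```

The ordering of the indices is the same before and after softmax, because softmax is monotonic; what it adds is a scale on which "0.77 versus 0.11" is directly comparable.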
classifier.config.id2label is a dict that maps an ImageNet class index (0–999) to its human-readable name. We capture it explicitly because the loop below needs it on every iteration. Saving it in a local variable also documents the intent: "I am about to translate indices into labels."
id2label = classifier.config.id2label
id2label
Out[41]:
{0: 'tench, Tinca tinca',
1: 'goldfish, Carassius auratus',
2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
3: 'tiger shark, Galeocerdo cuvieri',
4: 'hammerhead, hammerhead shark',
5: 'electric ray, crampfish, numbfish, torpedo',
6: 'stingray',
7: 'cock',
8: 'hen',
9: 'ostrich, Struthio camelus',
10: 'brambling, Fringilla montifringilla',
11: 'goldfinch, Carduelis carduelis',
12: 'house finch, linnet, Carpodacus mexicanus',
13: 'junco, snowbird',
14: 'indigo bunting, indigo finch, indigo bird, Passerina cyanea',
15: 'robin, American robin, Turdus migratorius',
16: 'bulbul',
17: 'jay',
18: 'magpie',
19: 'chickadee',
20: 'water ouzel, dipper',
21: 'kite',
22: 'bald eagle, American eagle, Haliaeetus leucocephalus',
23: 'vulture',
24: 'great grey owl, great gray owl, Strix nebulosa',
25: 'European fire salamander, Salamandra salamandra',
26: 'common newt, Triturus vulgaris',
27: 'eft',
28: 'spotted salamander, Ambystoma maculatum',
29: 'axolotl, mud puppy, Ambystoma mexicanum',
30: 'bullfrog, Rana catesbeiana',
31: 'tree frog, tree-frog',
32: 'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui',
33: 'loggerhead, loggerhead turtle, Caretta caretta',
34: 'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea',
35: 'mud turtle',
36: 'terrapin',
37: 'box turtle, box tortoise',
38: 'banded gecko',
39: 'common iguana, iguana, Iguana iguana',
40: 'American chameleon, anole, Anolis carolinensis',
41: 'whiptail, whiptail lizard',
42: 'agama',
43: 'frilled lizard, Chlamydosaurus kingi',
44: 'alligator lizard',
45: 'Gila monster, Heloderma suspectum',
46: 'green lizard, Lacerta viridis',
47: 'African chameleon, Chamaeleo chamaeleon',
48: 'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis',
49: 'African crocodile, Nile crocodile, Crocodylus niloticus',
50: 'American alligator, Alligator mississipiensis',
51: 'triceratops',
52: 'thunder snake, worm snake, Carphophis amoenus',
53: 'ringneck snake, ring-necked snake, ring snake',
54: 'hognose snake, puff adder, sand viper',
55: 'green snake, grass snake',
56: 'king snake, kingsnake',
57: 'garter snake, grass snake',
58: 'water snake',
59: 'vine snake',
60: 'night snake, Hypsiglena torquata',
61: 'boa constrictor, Constrictor constrictor',
62: 'rock python, rock snake, Python sebae',
63: 'Indian cobra, Naja naja',
64: 'green mamba',
65: 'sea snake',
66: 'horned viper, cerastes, sand viper, horned asp, Cerastes cornutus',
67: 'diamondback, diamondback rattlesnake, Crotalus adamanteus',
68: 'sidewinder, horned rattlesnake, Crotalus cerastes',
69: 'trilobite',
70: 'harvestman, daddy longlegs, Phalangium opilio',
71: 'scorpion',
72: 'black and gold garden spider, Argiope aurantia',
73: 'barn spider, Araneus cavaticus',
74: 'garden spider, Aranea diademata',
75: 'black widow, Latrodectus mactans',
76: 'tarantula',
77: 'wolf spider, hunting spider',
78: 'tick',
79: 'centipede',
80: 'black grouse',
81: 'ptarmigan',
82: 'ruffed grouse, partridge, Bonasa umbellus',
83: 'prairie chicken, prairie grouse, prairie fowl',
84: 'peacock',
85: 'quail',
86: 'partridge',
87: 'African grey, African gray, Psittacus erithacus',
88: 'macaw',
89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
90: 'lorikeet',
91: 'coucal',
92: 'bee eater',
93: 'hornbill',
94: 'hummingbird',
95: 'jacamar',
96: 'toucan',
97: 'drake',
98: 'red-breasted merganser, Mergus serrator',
99: 'goose',
100: 'black swan, Cygnus atratus',
101: 'tusker',
102: 'echidna, spiny anteater, anteater',
103: 'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus',
104: 'wallaby, brush kangaroo',
105: 'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus',
106: 'wombat',
107: 'jellyfish',
108: 'sea anemone, anemone',
109: 'brain coral',
110: 'flatworm, platyhelminth',
111: 'nematode, nematode worm, roundworm',
112: 'conch',
113: 'snail',
114: 'slug',
115: 'sea slug, nudibranch',
116: 'chiton, coat-of-mail shell, sea cradle, polyplacophore',
117: 'chambered nautilus, pearly nautilus, nautilus',
118: 'Dungeness crab, Cancer magister',
119: 'rock crab, Cancer irroratus',
120: 'fiddler crab',
121: 'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica',
122: 'American lobster, Northern lobster, Maine lobster, Homarus americanus',
123: 'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish',
124: 'crayfish, crawfish, crawdad, crawdaddy',
125: 'hermit crab',
126: 'isopod',
127: 'white stork, Ciconia ciconia',
128: 'black stork, Ciconia nigra',
129: 'spoonbill',
130: 'flamingo',
131: 'little blue heron, Egretta caerulea',
132: 'American egret, great white heron, Egretta albus',
133: 'bittern',
134: 'crane',
135: 'limpkin, Aramus pictus',
136: 'European gallinule, Porphyrio porphyrio',
137: 'American coot, marsh hen, mud hen, water hen, Fulica americana',
138: 'bustard',
139: 'ruddy turnstone, Arenaria interpres',
140: 'red-backed sandpiper, dunlin, Erolia alpina',
141: 'redshank, Tringa totanus',
142: 'dowitcher',
143: 'oystercatcher, oyster catcher',
144: 'pelican',
145: 'king penguin, Aptenodytes patagonica',
146: 'albatross, mollymawk',
147: 'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus',
148: 'killer whale, killer, orca, grampus, sea wolf, Orcinus orca',
149: 'dugong, Dugong dugon',
150: 'sea lion',
151: 'Chihuahua',
152: 'Japanese spaniel',
153: 'Maltese dog, Maltese terrier, Maltese',
154: 'Pekinese, Pekingese, Peke',
155: 'Shih-Tzu',
156: 'Blenheim spaniel',
157: 'papillon',
158: 'toy terrier',
159: 'Rhodesian ridgeback',
160: 'Afghan hound, Afghan',
161: 'basset, basset hound',
162: 'beagle',
163: 'bloodhound, sleuthhound',
164: 'bluetick',
165: 'black-and-tan coonhound',
166: 'Walker hound, Walker foxhound',
167: 'English foxhound',
168: 'redbone',
169: 'borzoi, Russian wolfhound',
170: 'Irish wolfhound',
171: 'Italian greyhound',
172: 'whippet',
173: 'Ibizan hound, Ibizan Podenco',
174: 'Norwegian elkhound, elkhound',
175: 'otterhound, otter hound',
176: 'Saluki, gazelle hound',
177: 'Scottish deerhound, deerhound',
178: 'Weimaraner',
179: 'Staffordshire bullterrier, Staffordshire bull terrier',
180: 'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier',
181: 'Bedlington terrier',
182: 'Border terrier',
183: 'Kerry blue terrier',
184: 'Irish terrier',
185: 'Norfolk terrier',
186: 'Norwich terrier',
187: 'Yorkshire terrier',
188: 'wire-haired fox terrier',
189: 'Lakeland terrier',
190: 'Sealyham terrier, Sealyham',
191: 'Airedale, Airedale terrier',
192: 'cairn, cairn terrier',
193: 'Australian terrier',
194: 'Dandie Dinmont, Dandie Dinmont terrier',
195: 'Boston bull, Boston terrier',
196: 'miniature schnauzer',
197: 'giant schnauzer',
198: 'standard schnauzer',
199: 'Scotch terrier, Scottish terrier, Scottie',
200: 'Tibetan terrier, chrysanthemum dog',
201: 'silky terrier, Sydney silky',
202: 'soft-coated wheaten terrier',
203: 'West Highland white terrier',
204: 'Lhasa, Lhasa apso',
205: 'flat-coated retriever',
206: 'curly-coated retriever',
207: 'golden retriever',
208: 'Labrador retriever',
209: 'Chesapeake Bay retriever',
210: 'German short-haired pointer',
211: 'vizsla, Hungarian pointer',
212: 'English setter',
213: 'Irish setter, red setter',
214: 'Gordon setter',
215: 'Brittany spaniel',
216: 'clumber, clumber spaniel',
217: 'English springer, English springer spaniel',
218: 'Welsh springer spaniel',
219: 'cocker spaniel, English cocker spaniel, cocker',
220: 'Sussex spaniel',
221: 'Irish water spaniel',
222: 'kuvasz',
223: 'schipperke',
224: 'groenendael',
225: 'malinois',
226: 'briard',
227: 'kelpie',
228: 'komondor',
229: 'Old English sheepdog, bobtail',
230: 'Shetland sheepdog, Shetland sheep dog, Shetland',
231: 'collie',
232: 'Border collie',
233: 'Bouvier des Flandres, Bouviers des Flandres',
234: 'Rottweiler',
235: 'German shepherd, German shepherd dog, German police dog, alsatian',
236: 'Doberman, Doberman pinscher',
237: 'miniature pinscher',
238: 'Greater Swiss Mountain dog',
239: 'Bernese mountain dog',
240: 'Appenzeller',
241: 'EntleBucher',
242: 'boxer',
243: 'bull mastiff',
244: 'Tibetan mastiff',
245: 'French bulldog',
246: 'Great Dane',
247: 'Saint Bernard, St Bernard',
248: 'Eskimo dog, husky',
249: 'malamute, malemute, Alaskan malamute',
250: 'Siberian husky',
251: 'dalmatian, coach dog, carriage dog',
252: 'affenpinscher, monkey pinscher, monkey dog',
253: 'basenji',
254: 'pug, pug-dog',
255: 'Leonberg',
256: 'Newfoundland, Newfoundland dog',
257: 'Great Pyrenees',
258: 'Samoyed, Samoyede',
259: 'Pomeranian',
260: 'chow, chow chow',
261: 'keeshond',
262: 'Brabancon griffon',
263: 'Pembroke, Pembroke Welsh corgi',
264: 'Cardigan, Cardigan Welsh corgi',
265: 'toy poodle',
266: 'miniature poodle',
267: 'standard poodle',
268: 'Mexican hairless',
269: 'timber wolf, grey wolf, gray wolf, Canis lupus',
270: 'white wolf, Arctic wolf, Canis lupus tundrarum',
271: 'red wolf, maned wolf, Canis rufus, Canis niger',
272: 'coyote, prairie wolf, brush wolf, Canis latrans',
273: 'dingo, warrigal, warragal, Canis dingo',
274: 'dhole, Cuon alpinus',
275: 'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus',
276: 'hyena, hyaena',
277: 'red fox, Vulpes vulpes',
278: 'kit fox, Vulpes macrotis',
279: 'Arctic fox, white fox, Alopex lagopus',
280: 'grey fox, gray fox, Urocyon cinereoargenteus',
281: 'tabby, tabby cat',
282: 'tiger cat',
283: 'Persian cat',
284: 'Siamese cat, Siamese',
285: 'Egyptian cat',
286: 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
287: 'lynx, catamount',
288: 'leopard, Panthera pardus',
289: 'snow leopard, ounce, Panthera uncia',
290: 'jaguar, panther, Panthera onca, Felis onca',
291: 'lion, king of beasts, Panthera leo',
292: 'tiger, Panthera tigris',
293: 'cheetah, chetah, Acinonyx jubatus',
294: 'brown bear, bruin, Ursus arctos',
295: 'American black bear, black bear, Ursus americanus, Euarctos americanus',
296: 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus',
297: 'sloth bear, Melursus ursinus, Ursus ursinus',
298: 'mongoose',
299: 'meerkat, mierkat',
300: 'tiger beetle',
301: 'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle',
302: 'ground beetle, carabid beetle',
303: 'long-horned beetle, longicorn, longicorn beetle',
304: 'leaf beetle, chrysomelid',
305: 'dung beetle',
306: 'rhinoceros beetle',
307: 'weevil',
308: 'fly',
309: 'bee',
310: 'ant, emmet, pismire',
311: 'grasshopper, hopper',
312: 'cricket',
313: 'walking stick, walkingstick, stick insect',
314: 'cockroach, roach',
315: 'mantis, mantid',
316: 'cicada, cicala',
317: 'leafhopper',
318: 'lacewing, lacewing fly',
319: "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
320: 'damselfly',
321: 'admiral',
322: 'ringlet, ringlet butterfly',
323: 'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus',
324: 'cabbage butterfly',
325: 'sulphur butterfly, sulfur butterfly',
326: 'lycaenid, lycaenid butterfly',
327: 'starfish, sea star',
328: 'sea urchin',
329: 'sea cucumber, holothurian',
330: 'wood rabbit, cottontail, cottontail rabbit',
331: 'hare',
332: 'Angora, Angora rabbit',
333: 'hamster',
334: 'porcupine, hedgehog',
335: 'fox squirrel, eastern fox squirrel, Sciurus niger',
336: 'marmot',
337: 'beaver',
338: 'guinea pig, Cavia cobaya',
339: 'sorrel',
340: 'zebra',
341: 'hog, pig, grunter, squealer, Sus scrofa',
342: 'wild boar, boar, Sus scrofa',
343: 'warthog',
344: 'hippopotamus, hippo, river horse, Hippopotamus amphibius',
345: 'ox',
346: 'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis',
347: 'bison',
348: 'ram, tup',
349: 'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis',
350: 'ibex, Capra ibex',
351: 'hartebeest',
352: 'impala, Aepyceros melampus',
353: 'gazelle',
354: 'Arabian camel, dromedary, Camelus dromedarius',
355: 'llama',
356: 'weasel',
357: 'mink',
358: 'polecat, fitch, foulmart, foumart, Mustela putorius',
359: 'black-footed ferret, ferret, Mustela nigripes',
360: 'otter',
361: 'skunk, polecat, wood pussy',
362: 'badger',
363: 'armadillo',
364: 'three-toed sloth, ai, Bradypus tridactylus',
365: 'orangutan, orang, orangutang, Pongo pygmaeus',
366: 'gorilla, Gorilla gorilla',
367: 'chimpanzee, chimp, Pan troglodytes',
368: 'gibbon, Hylobates lar',
369: 'siamang, Hylobates syndactylus, Symphalangus syndactylus',
370: 'guenon, guenon monkey',
371: 'patas, hussar monkey, Erythrocebus patas',
372: 'baboon',
373: 'macaque',
374: 'langur',
375: 'colobus, colobus monkey',
376: 'proboscis monkey, Nasalis larvatus',
377: 'marmoset',
378: 'capuchin, ringtail, Cebus capucinus',
379: 'howler monkey, howler',
380: 'titi, titi monkey',
381: 'spider monkey, Ateles geoffroyi',
382: 'squirrel monkey, Saimiri sciureus',
383: 'Madagascar cat, ring-tailed lemur, Lemur catta',
384: 'indri, indris, Indri indri, Indri brevicaudatus',
385: 'Indian elephant, Elephas maximus',
386: 'African elephant, Loxodonta africana',
387: 'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens',
388: 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca',
389: 'barracouta, snoek',
390: 'eel',
391: 'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch',
392: 'rock beauty, Holocanthus tricolor',
393: 'anemone fish',
394: 'sturgeon',
395: 'gar, garfish, garpike, billfish, Lepisosteus osseus',
396: 'lionfish',
397: 'puffer, pufferfish, blowfish, globefish',
398: 'abacus',
399: 'abaya',
400: "academic gown, academic robe, judge's robe",
401: 'accordion, piano accordion, squeeze box',
402: 'acoustic guitar',
403: 'aircraft carrier, carrier, flattop, attack aircraft carrier',
404: 'airliner',
405: 'airship, dirigible',
406: 'altar',
407: 'ambulance',
408: 'amphibian, amphibious vehicle',
409: 'analog clock',
410: 'apiary, bee house',
411: 'apron',
412: 'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin',
413: 'assault rifle, assault gun',
414: 'backpack, back pack, knapsack, packsack, rucksack, haversack',
415: 'bakery, bakeshop, bakehouse',
416: 'balance beam, beam',
417: 'balloon',
418: 'ballpoint, ballpoint pen, ballpen, Biro',
419: 'Band Aid',
420: 'banjo',
421: 'bannister, banister, balustrade, balusters, handrail',
422: 'barbell',
423: 'barber chair',
424: 'barbershop',
425: 'barn',
426: 'barometer',
427: 'barrel, cask',
428: 'barrow, garden cart, lawn cart, wheelbarrow',
429: 'baseball',
430: 'basketball',
431: 'bassinet',
432: 'bassoon',
433: 'bathing cap, swimming cap',
434: 'bath towel',
435: 'bathtub, bathing tub, bath, tub',
436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon',
437: 'beacon, lighthouse, beacon light, pharos',
438: 'beaker',
439: 'bearskin, busby, shako',
440: 'beer bottle',
441: 'beer glass',
442: 'bell cote, bell cot',
443: 'bib',
444: 'bicycle-built-for-two, tandem bicycle, tandem',
445: 'bikini, two-piece',
446: 'binder, ring-binder',
447: 'binoculars, field glasses, opera glasses',
448: 'birdhouse',
449: 'boathouse',
450: 'bobsled, bobsleigh, bob',
451: 'bolo tie, bolo, bola tie, bola',
452: 'bonnet, poke bonnet',
453: 'bookcase',
454: 'bookshop, bookstore, bookstall',
455: 'bottlecap',
456: 'bow',
457: 'bow tie, bow-tie, bowtie',
458: 'brass, memorial tablet, plaque',
459: 'brassiere, bra, bandeau',
460: 'breakwater, groin, groyne, mole, bulwark, seawall, jetty',
461: 'breastplate, aegis, egis',
462: 'broom',
463: 'bucket, pail',
464: 'buckle',
465: 'bulletproof vest',
466: 'bullet train, bullet',
467: 'butcher shop, meat market',
468: 'cab, hack, taxi, taxicab',
469: 'caldron, cauldron',
470: 'candle, taper, wax light',
471: 'cannon',
472: 'canoe',
473: 'can opener, tin opener',
474: 'cardigan',
475: 'car mirror',
476: 'carousel, carrousel, merry-go-round, roundabout, whirligig',
477: "carpenter's kit, tool kit",
478: 'carton',
479: 'car wheel',
480: 'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM',
481: 'cassette',
482: 'cassette player',
483: 'castle',
484: 'catamaran',
485: 'CD player',
486: 'cello, violoncello',
487: 'cellular telephone, cellular phone, cellphone, cell, mobile phone',
488: 'chain',
489: 'chainlink fence',
490: 'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour',
491: 'chain saw, chainsaw',
492: 'chest',
493: 'chiffonier, commode',
494: 'chime, bell, gong',
495: 'china cabinet, china closet',
496: 'Christmas stocking',
497: 'church, church building',
498: 'cinema, movie theater, movie theatre, movie house, picture palace',
499: 'cleaver, meat cleaver, chopper',
500: 'cliff dwelling',
501: 'cloak',
502: 'clog, geta, patten, sabot',
503: 'cocktail shaker',
504: 'coffee mug',
505: 'coffeepot',
506: 'coil, spiral, volute, whorl, helix',
507: 'combination lock',
508: 'computer keyboard, keypad',
509: 'confectionery, confectionary, candy store',
510: 'container ship, containership, container vessel',
511: 'convertible',
512: 'corkscrew, bottle screw',
513: 'cornet, horn, trumpet, trump',
514: 'cowboy boot',
515: 'cowboy hat, ten-gallon hat',
516: 'cradle',
517: 'crane',
518: 'crash helmet',
519: 'crate',
520: 'crib, cot',
521: 'Crock Pot',
522: 'croquet ball',
523: 'crutch',
524: 'cuirass',
525: 'dam, dike, dyke',
526: 'desk',
527: 'desktop computer',
528: 'dial telephone, dial phone',
529: 'diaper, nappy, napkin',
530: 'digital clock',
531: 'digital watch',
532: 'dining table, board',
533: 'dishrag, dishcloth',
534: 'dishwasher, dish washer, dishwashing machine',
535: 'disk brake, disc brake',
536: 'dock, dockage, docking facility',
537: 'dogsled, dog sled, dog sleigh',
538: 'dome',
539: 'doormat, welcome mat',
540: 'drilling platform, offshore rig',
541: 'drum, membranophone, tympan',
542: 'drumstick',
543: 'dumbbell',
544: 'Dutch oven',
545: 'electric fan, blower',
546: 'electric guitar',
547: 'electric locomotive',
548: 'entertainment center',
549: 'envelope',
550: 'espresso maker',
551: 'face powder',
552: 'feather boa, boa',
553: 'file, file cabinet, filing cabinet',
554: 'fireboat',
555: 'fire engine, fire truck',
556: 'fire screen, fireguard',
557: 'flagpole, flagstaff',
558: 'flute, transverse flute',
559: 'folding chair',
560: 'football helmet',
561: 'forklift',
562: 'fountain',
563: 'fountain pen',
564: 'four-poster',
565: 'freight car',
566: 'French horn, horn',
567: 'frying pan, frypan, skillet',
568: 'fur coat',
569: 'garbage truck, dustcart',
570: 'gasmask, respirator, gas helmet',
571: 'gas pump, gasoline pump, petrol pump, island dispenser',
572: 'goblet',
573: 'go-kart',
574: 'golf ball',
575: 'golfcart, golf cart',
576: 'gondola',
577: 'gong, tam-tam',
578: 'gown',
579: 'grand piano, grand',
580: 'greenhouse, nursery, glasshouse',
581: 'grille, radiator grille',
582: 'grocery store, grocery, food market, market',
583: 'guillotine',
584: 'hair slide',
585: 'hair spray',
586: 'half track',
587: 'hammer',
588: 'hamper',
589: 'hand blower, blow dryer, blow drier, hair dryer, hair drier',
590: 'hand-held computer, hand-held microcomputer',
591: 'handkerchief, hankie, hanky, hankey',
592: 'hard disc, hard disk, fixed disk',
593: 'harmonica, mouth organ, harp, mouth harp',
594: 'harp',
595: 'harvester, reaper',
596: 'hatchet',
597: 'holster',
598: 'home theater, home theatre',
599: 'honeycomb',
600: 'hook, claw',
601: 'hoopskirt, crinoline',
602: 'horizontal bar, high bar',
603: 'horse cart, horse-cart',
604: 'hourglass',
605: 'iPod',
606: 'iron, smoothing iron',
607: "jack-o'-lantern",
608: 'jean, blue jean, denim',
609: 'jeep, landrover',
610: 'jersey, T-shirt, tee shirt',
611: 'jigsaw puzzle',
612: 'jinrikisha, ricksha, rickshaw',
613: 'joystick',
614: 'kimono',
615: 'knee pad',
616: 'knot',
617: 'lab coat, laboratory coat',
618: 'ladle',
619: 'lampshade, lamp shade',
620: 'laptop, laptop computer',
621: 'lawn mower, mower',
622: 'lens cap, lens cover',
623: 'letter opener, paper knife, paperknife',
624: 'library',
625: 'lifeboat',
626: 'lighter, light, igniter, ignitor',
627: 'limousine, limo',
628: 'liner, ocean liner',
629: 'lipstick, lip rouge',
630: 'Loafer',
631: 'lotion',
632: 'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system',
633: "loupe, jeweler's loupe",
634: 'lumbermill, sawmill',
635: 'magnetic compass',
636: 'mailbag, postbag',
637: 'mailbox, letter box',
638: 'maillot',
639: 'maillot, tank suit',
640: 'manhole cover',
641: 'maraca',
642: 'marimba, xylophone',
643: 'mask',
644: 'matchstick',
645: 'maypole',
646: 'maze, labyrinth',
647: 'measuring cup',
648: 'medicine chest, medicine cabinet',
649: 'megalith, megalithic structure',
650: 'microphone, mike',
651: 'microwave, microwave oven',
652: 'military uniform',
653: 'milk can',
654: 'minibus',
655: 'miniskirt, mini',
656: 'minivan',
657: 'missile',
658: 'mitten',
659: 'mixing bowl',
660: 'mobile home, manufactured home',
661: 'Model T',
662: 'modem',
663: 'monastery',
664: 'monitor',
665: 'moped',
666: 'mortar',
667: 'mortarboard',
668: 'mosque',
669: 'mosquito net',
670: 'motor scooter, scooter',
671: 'mountain bike, all-terrain bike, off-roader',
672: 'mountain tent',
673: 'mouse, computer mouse',
674: 'mousetrap',
675: 'moving van',
676: 'muzzle',
677: 'nail',
678: 'neck brace',
679: 'necklace',
680: 'nipple',
681: 'notebook, notebook computer',
682: 'obelisk',
683: 'oboe, hautboy, hautbois',
684: 'ocarina, sweet potato',
685: 'odometer, hodometer, mileometer, milometer',
686: 'oil filter',
687: 'organ, pipe organ',
688: 'oscilloscope, scope, cathode-ray oscilloscope, CRO',
689: 'overskirt',
690: 'oxcart',
691: 'oxygen mask',
692: 'packet',
693: 'paddle, boat paddle',
694: 'paddlewheel, paddle wheel',
695: 'padlock',
696: 'paintbrush',
697: "pajama, pyjama, pj's, jammies",
698: 'palace',
699: 'panpipe, pandean pipe, syrinx',
700: 'paper towel',
701: 'parachute, chute',
702: 'parallel bars, bars',
703: 'park bench',
704: 'parking meter',
705: 'passenger car, coach, carriage',
706: 'patio, terrace',
707: 'pay-phone, pay-station',
708: 'pedestal, plinth, footstall',
709: 'pencil box, pencil case',
710: 'pencil sharpener',
711: 'perfume, essence',
712: 'Petri dish',
713: 'photocopier',
714: 'pick, plectrum, plectron',
715: 'pickelhaube',
716: 'picket fence, paling',
717: 'pickup, pickup truck',
718: 'pier',
719: 'piggy bank, penny bank',
720: 'pill bottle',
721: 'pillow',
722: 'ping-pong ball',
723: 'pinwheel',
724: 'pirate, pirate ship',
725: 'pitcher, ewer',
726: "plane, carpenter's plane, woodworking plane",
727: 'planetarium',
728: 'plastic bag',
729: 'plate rack',
730: 'plow, plough',
731: "plunger, plumber's helper",
732: 'Polaroid camera, Polaroid Land camera',
733: 'pole',
734: 'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria',
735: 'poncho',
736: 'pool table, billiard table, snooker table',
737: 'pop bottle, soda bottle',
738: 'pot, flowerpot',
739: "potter's wheel",
740: 'power drill',
741: 'prayer rug, prayer mat',
742: 'printer',
743: 'prison, prison house',
744: 'projectile, missile',
745: 'projector',
746: 'puck, hockey puck',
747: 'punching bag, punch bag, punching ball, punchball',
748: 'purse',
749: 'quill, quill pen',
750: 'quilt, comforter, comfort, puff',
751: 'racer, race car, racing car',
752: 'racket, racquet',
753: 'radiator',
754: 'radio, wireless',
755: 'radio telescope, radio reflector',
756: 'rain barrel',
757: 'recreational vehicle, RV, R.V.',
758: 'reel',
759: 'reflex camera',
760: 'refrigerator, icebox',
761: 'remote control, remote',
762: 'restaurant, eating house, eating place, eatery',
763: 'revolver, six-gun, six-shooter',
764: 'rifle',
765: 'rocking chair, rocker',
766: 'rotisserie',
767: 'rubber eraser, rubber, pencil eraser',
768: 'rugby ball',
769: 'rule, ruler',
770: 'running shoe',
771: 'safe',
772: 'safety pin',
773: 'saltshaker, salt shaker',
774: 'sandal',
775: 'sarong',
776: 'sax, saxophone',
777: 'scabbard',
778: 'scale, weighing machine',
779: 'school bus',
780: 'schooner',
781: 'scoreboard',
782: 'screen, CRT screen',
783: 'screw',
784: 'screwdriver',
785: 'seat belt, seatbelt',
786: 'sewing machine',
787: 'shield, buckler',
788: 'shoe shop, shoe-shop, shoe store',
789: 'shoji',
790: 'shopping basket',
791: 'shopping cart',
792: 'shovel',
793: 'shower cap',
794: 'shower curtain',
795: 'ski',
796: 'ski mask',
797: 'sleeping bag',
798: 'slide rule, slipstick',
799: 'sliding door',
800: 'slot, one-armed bandit',
801: 'snorkel',
802: 'snowmobile',
803: 'snowplow, snowplough',
804: 'soap dispenser',
805: 'soccer ball',
806: 'sock',
807: 'solar dish, solar collector, solar furnace',
808: 'sombrero',
809: 'soup bowl',
810: 'space bar',
811: 'space heater',
812: 'space shuttle',
813: 'spatula',
814: 'speedboat',
815: "spider web, spider's web",
816: 'spindle',
817: 'sports car, sport car',
818: 'spotlight, spot',
819: 'stage',
820: 'steam locomotive',
821: 'steel arch bridge',
822: 'steel drum',
823: 'stethoscope',
824: 'stole',
825: 'stone wall',
826: 'stopwatch, stop watch',
827: 'stove',
828: 'strainer',
829: 'streetcar, tram, tramcar, trolley, trolley car',
830: 'stretcher',
831: 'studio couch, day bed',
832: 'stupa, tope',
833: 'submarine, pigboat, sub, U-boat',
834: 'suit, suit of clothes',
835: 'sundial',
836: 'sunglass',
837: 'sunglasses, dark glasses, shades',
838: 'sunscreen, sunblock, sun blocker',
839: 'suspension bridge',
840: 'swab, swob, mop',
841: 'sweatshirt',
842: 'swimming trunks, bathing trunks',
843: 'swing',
844: 'switch, electric switch, electrical switch',
845: 'syringe',
846: 'table lamp',
847: 'tank, army tank, armored combat vehicle, armoured combat vehicle',
848: 'tape player',
849: 'teapot',
850: 'teddy, teddy bear',
851: 'television, television system',
852: 'tennis ball',
853: 'thatch, thatched roof',
854: 'theater curtain, theatre curtain',
855: 'thimble',
856: 'thresher, thrasher, threshing machine',
857: 'throne',
858: 'tile roof',
859: 'toaster',
860: 'tobacco shop, tobacconist shop, tobacconist',
861: 'toilet seat',
862: 'torch',
863: 'totem pole',
864: 'tow truck, tow car, wrecker',
865: 'toyshop',
866: 'tractor',
867: 'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi',
868: 'tray',
869: 'trench coat',
870: 'tricycle, trike, velocipede',
871: 'trimaran',
872: 'tripod',
873: 'triumphal arch',
874: 'trolleybus, trolley coach, trackless trolley',
875: 'trombone',
876: 'tub, vat',
877: 'turnstile',
878: 'typewriter keyboard',
879: 'umbrella',
880: 'unicycle, monocycle',
881: 'upright, upright piano',
882: 'vacuum, vacuum cleaner',
883: 'vase',
884: 'vault',
885: 'velvet',
886: 'vending machine',
887: 'vestment',
888: 'viaduct',
889: 'violin, fiddle',
890: 'volleyball',
891: 'waffle iron',
892: 'wall clock',
893: 'wallet, billfold, notecase, pocketbook',
894: 'wardrobe, closet, press',
895: 'warplane, military plane',
896: 'washbasin, handbasin, washbowl, lavabo, wash-hand basin',
897: 'washer, automatic washer, washing machine',
898: 'water bottle',
899: 'water jug',
900: 'water tower',
901: 'whiskey jug',
902: 'whistle',
903: 'wig',
904: 'window screen',
905: 'window shade',
906: 'Windsor tie',
907: 'wine bottle',
908: 'wing',
909: 'wok',
910: 'wooden spoon',
911: 'wool, woolen, woollen',
912: 'worm fence, snake fence, snake-rail fence, Virginia fence',
913: 'wreck',
914: 'yawl',
915: 'yurt',
916: 'web site, website, internet site, site',
917: 'comic book',
918: 'crossword puzzle, crossword',
919: 'street sign',
920: 'traffic light, traffic signal, stoplight',
921: 'book jacket, dust cover, dust jacket, dust wrapper',
922: 'menu',
923: 'plate',
924: 'guacamole',
925: 'consomme',
926: 'hot pot, hotpot',
927: 'trifle',
928: 'ice cream, icecream',
929: 'ice lolly, lolly, lollipop, popsicle',
930: 'French loaf',
931: 'bagel, beigel',
932: 'pretzel',
933: 'cheeseburger',
934: 'hotdog, hot dog, red hot',
935: 'mashed potato',
936: 'head cabbage',
937: 'broccoli',
938: 'cauliflower',
939: 'zucchini, courgette',
940: 'spaghetti squash',
941: 'acorn squash',
942: 'butternut squash',
943: 'cucumber, cuke',
944: 'artichoke, globe artichoke',
945: 'bell pepper',
946: 'cardoon',
947: 'mushroom',
948: 'Granny Smith',
949: 'strawberry',
950: 'orange',
951: 'lemon',
952: 'fig',
953: 'pineapple, ananas',
954: 'banana',
955: 'jackfruit, jak, jack',
956: 'custard apple',
957: 'pomegranate',
958: 'hay',
959: 'carbonara',
960: 'chocolate sauce, chocolate syrup',
961: 'dough',
962: 'meat loaf, meatloaf',
963: 'pizza, pizza pie',
964: 'potpie',
965: 'burrito',
966: 'red wine',
967: 'espresso',
968: 'cup',
969: 'eggnog',
970: 'alp',
971: 'bubble',
972: 'cliff, drop, drop-off',
973: 'coral reef',
974: 'geyser',
975: 'lakeside, lakeshore',
976: 'promontory, headland, head, foreland',
977: 'sandbar, sand bar',
978: 'seashore, coast, seacoast, sea-coast',
979: 'valley, vale',
980: 'volcano',
981: 'ballplayer, baseball player',
982: 'groom, bridegroom',
983: 'scuba diver',
984: 'rapeseed',
985: 'daisy',
986: "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
987: 'corn',
988: 'acorn',
989: 'hip, rose hip, rosehip',
990: 'buckeye, horse chestnut, conker',
991: 'coral fungus',
992: 'agaric',
993: 'gyromitra',
994: 'stinkhorn, carrion fungus',
995: 'earthstar',
996: 'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa',
997: 'bolete',
998: 'ear, spike, capitulum',
999: 'toilet tissue, toilet paper, bathroom tissue'}
A pandas DataFrame is the friendliest way to line up the index, the probability, and the label side by side. Start with an empty dict of columns, fill it in the loop, then hand it to pd.DataFrame. The order of keys determines the order of columns in the rendered table.
pred_dict = {
    "Index": [],
    "Probability": [],
    "Label": []
}
Building the table is a tiny loop: for each of the top-5 indices, pull the probability and the human-readable label out of the parallel arrays. Keeping the source as a loop (instead of writing five append lines by hand) means the same code works if we ever want top-3 or top-10; only the 5 changes.
for i in range(5):
    pred_dict["Index"].append(top_5_labels[0][i])
    pred_dict["Probability"].append(top_5_probs[0][i])
    pred_dict["Label"].append(id2label[top_5_labels[0][i]])
The DataFrame shows the model's top-5 guesses side by side: index, probability, and human-readable label. If the leading row is at ~0.7+ and the rest drop off, the model is confident; if the top two or three are close, the model is genuinely uncertain between visually similar classes.
pred_df = pd.DataFrame(pred_dict)
pred_df
| | Index | Probability | Label |
|---|---|---|---|
| 0 | 504 | 0.767737 | coffee mug |
| 1 | 968 | 0.112051 | cup |
| 2 | 967 | 0.031235 | espresso |
| 3 | 487 | 0.007981 | cellular telephone, cellular phone, cellphone,... |
| 4 | 918 | 0.007742 | crossword puzzle, crossword |
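The claim that only the 5 needs to change can be made concrete by pulling the loop into a function with k as a parameter. The following is a minimal, library-free sketch of that idea, not the notebook's code: the helper name top_k_rows, the four-class toy_id2label, and the toy_logits values are all made up for illustration, and the softmax is written by hand so the logic stays visible.

```python
import math


def top_k_rows(logits, id2label, k=5):
    """Return the k most probable classes as rows of index, probability, label."""
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Class indices sorted by descending probability, truncated to k.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [
        {"Index": i, "Probability": probs[i], "Label": id2label[i]}
        for i in order
    ]


# Hypothetical four-class example; these logits are invented, not model output.
toy_id2label = {0: "coffee mug", 1: "cup", 2: "espresso", 3: "menu"}
toy_logits = [4.0, 2.0, 1.0, -1.0]

rows = top_k_rows(toy_logits, toy_id2label, k=3)

# Gap between the two leading guesses: a large gap means a confident top-1.
margin = rows[0]["Probability"] - rows[1]["Probability"]
```

Passing rows to pd.DataFrame would give the same three-column layout as the table above, and the margin between the first two rows is a quick numeric stand-in for the "confident versus uncertain" reading of the table.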
What changed once the patches became tokens
Two things are worth carrying away from the walkthrough. The first is that the patch embedder inside google/vit-base-patch16-224 is just a Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16)). Because the stride equals the kernel size, that single layer performs both the cut we simulated by hand and the linear projection of each flattened patch to a 768-dimensional embedding. The second is that everything after the patch embedder is a stack of standard Transformer encoder layers, so once the image is a [196, 768] sequence (197 tokens once the [CLS] token is prepended), the rest of the architecture is essentially the one used for text. The "vision" part of a Vision Transformer is almost entirely in that first conv.
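The equivalence between the strided conv and the hand-made cut can be checked numerically. The sketch below assumes a randomly initialised Conv2d with the same hyperparameters (not the pretrained weights) and a random tensor standing in for the preprocessed image. It compares the conv output with "unfold, flatten, multiply by the conv's weights as a matrix".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same hyperparameters as the Conv2d quoted above; weights are random here,
# not the pretrained ones.
conv = nn.Conv2d(3, 768, kernel_size=16, stride=16)

# Random stand-in for a preprocessed 224x224 RGB image.
image = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    # Path 1: the strided convolution, flattened into a token sequence.
    conv_tokens = conv(image).flatten(2).transpose(1, 2)  # [1, 196, 768]

    # Path 2: cut by hand with unfold, then project with the same weights.
    patches = image.unfold(2, 16, 16).unfold(3, 16, 16)   # [1, 3, 14, 14, 16, 16]
    patches = patches.permute(0, 2, 3, 1, 4, 5)           # [1, 14, 14, 3, 16, 16]
    patches = patches.reshape(1, 196, 3 * 16 * 16)        # [1, 196, 768]
    weight = conv.weight.reshape(768, -1)                 # [768, 3*16*16]
    manual_tokens = patches @ weight.T + conv.bias        # [1, 196, 768]

same = torch.allclose(conv_tokens, manual_tokens, atol=1e-4)
```

Note that a flattened patch has 3 × 16 × 16 = 768 values, which happens to equal the hidden size of this model; the two 768s in the shapes above are a coincidence of ViT-Base, not a requirement.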
If the next question on your mind is "what does the model see inside each patch after the first attention layer?" — the patch grid is exactly the right place to start. The 196 token embeddings are a structured view of the image in which position is explicit, and that is the property the rest of the architecture is built to exploit.
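One way to make that positional structure concrete is to map a token index back to the pixels it came from. This is a minimal sketch, assuming the patch tokens are in row-major order (the layout a flattened conv output has) and counting only the 196 patch tokens, not [CLS]; the helper name patch_box is made up for illustration.

```python
GRID = 14   # 224 / 16 patches per side
PATCH = 16  # patch side length in pixels


def patch_box(token_index):
    """Map a patch-token index (0..195, [CLS] excluded) to (top, left, bottom, right)."""
    if not 0 <= token_index < GRID * GRID:
        raise ValueError("token_index must be in [0, 196)")
    # Row-major layout: the row is the quotient, the column the remainder.
    row, col = divmod(token_index, GRID)
    top, left = row * PATCH, col * PATCH
    return top, left, top + PATCH, left + PATCH


first = patch_box(0)         # top-left patch
second_row = patch_box(14)   # first patch of the second row
last = patch_box(195)        # bottom-right patch
```

If the tokens come from the encoder output, which has 197 entries, the [CLS] token sits at index 0 and the patch tokens are shifted by one, so subtract 1 before calling the helper.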