Datasets and Transforms

Apr 1, 2026•5 min read•By Mohammed Vasim

Image Datasets and Transforms

Objective

How to build an image dataset object.
How to apply pre-built transforms from Torchvision Transforms to the dataset.

In this lab, you will build a dataset object for images; many of the steps also apply to a larger dataset. Then you will apply pre-built transforms from Torchvision Transforms to that dataset.

Auxiliary Functions
Datasets
Torchvision Transforms

Estimated Time Needed: 25 min

Preparation

Download the dataset and unzip the files in your data directory. To keep the download fast, this dataset has only 100 samples:

python

! wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DL0110EN-SkillsNetwork/labs/Week1/data/img.tar.gz -P /resources/data

python

!tar -xf /resources/data/img.tar.gz

python

!wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DL0110EN-SkillsNetwork/labs/Week1/data/index.csv

We will use this function in the lab:

python

def show_data(data_sample, shape = (28, 28)):
    plt.imshow(data_sample[0].numpy().reshape(shape), cmap='gray')
    # str() keeps the title working whether the label is a string or a number
    plt.title('y = ' + str(data_sample[1]))

The following are the libraries we are going to use for this lab. torch.manual_seed() seeds the random number generator so that random operations give the same results every time the code is rerun.

python

# These are the libraries that will be used for this lab.

import torch 
import matplotlib.pylab as plt
import numpy as np
from torch.utils.data import Dataset, DataLoader
torch.manual_seed(0)

python

from matplotlib.pyplot import imshow
from PIL import Image
import pandas as pd
import os
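To see what the seed does, the short sketch below resets it twice and draws random numbers each time. The variable names are illustrative and not part of the lab.

```python
import torch

# Seed the generator, then draw three random numbers
torch.manual_seed(0)
first_draw = torch.rand(3)

# Reset the same seed and draw again
torch.manual_seed(0)
second_draw = torch.rand(3)

# Both draws are identical because the generator restarted from the same state
print(torch.equal(first_draw, second_draw))
```

Without the second call to torch.manual_seed(0), the generator would simply continue from where it left off and the two draws would normally differ.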

Auxiliary Functions

You will use the following functions as components of a dataset object. In this section, you will review each of the components independently.

First, define the path to the CSV file that contains the label for each image.

python

# Build the path to the CSV file that holds the labels and file names
directory=""
csv_file ='index.csv'
csv_path=os.path.join(directory,csv_file)

You can load the CSV file and convert it into a dataframe using the Pandas function read_csv(). You can view the first rows of the dataframe using the method head().

python

data_name = pd.read_csv(csv_path)
data_name.head()

The first column of the dataframe corresponds to the type of clothing. The second column is the name of the image file corresponding to the clothing. You can obtain the path of the first file by using data_name.iloc[0, 1]. The first argument corresponds to the sample number (the row index), and the second argument corresponds to the column index.

python

# Get the value on location row 0, column 1 (Notice that index starts at 0)
# Remember this dataset has only 100 samples to make the download faster
print('File name:', data_name.iloc[0, 1])

As the class of the sample is in the first column, you can also obtain the class value as follows.

python

# Get the value on location row 0, column 0 (Notice that index starts at 0.)

print('y:', data_name.iloc[0, 0])

Similarly, you can obtain the file name and class of the second image:

python

# Print out the file name and the class number of the element on row 1 (the second row)

print('File name:', data_name.iloc[1, 1])
print('class or y:', data_name.iloc[1, 0])

The number of samples corresponds to the number of rows in the dataframe. You can obtain the number of rows using the following lines of code. This value corresponds to the len attribute of the dataset object.

python

# Print out the total number of rows in the training dataset

print('The number of rows: ', data_name.shape[0])
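The lookups above can be tried without downloading anything. The sketch below builds a small dataframe from an in-memory CSV; the column names and rows are hypothetical stand-ins for index.csv, chosen only to mirror its layout (class first, file name second).

```python
import io
import pandas as pd

# Hypothetical stand-in for index.csv: class label first, file name second
csv_text = """category,image
Ankle boot,img/fashion0.png
T-shirt,img/fashion1.png
Dress,img/fashion2.png
"""

sample_df = pd.read_csv(io.StringIO(csv_text))

first_label = sample_df.iloc[0, 0]   # row 0, column 0: the class
first_file = sample_df.iloc[0, 1]    # row 0, column 1: the file name
n_rows = sample_df.shape[0]          # number of samples

print(first_label, first_file, n_rows)
```

Because iloc is purely positional, the same indices work whatever the column headers in the real file happen to be.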

Load Image

To load the image, you need the directory and the image name. You can join the variable directory with the name of the image stored in the dataframe. First, store the image name in the variable image_name:

python

# Get the file name of the second image (row 1, column 1)

image_name =data_name.iloc[1, 1]
image_name

We can then build the image path:

python

image_path=os.path.join(directory,image_name)
image_path
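The lab sets directory to an empty string, so it may help to see how os.path.join treats that case. In this small sketch the file name and the "data" folder are hypothetical examples, not values taken from the dataset.

```python
import os

# Hypothetical file name, standing in for a value read from the dataframe
example_name = "img/fashion1.png"

# With an empty directory, join returns the file name unchanged
path_in_cwd = os.path.join("", example_name)

# With a non-empty directory, join inserts the path separator for you
path_in_data = os.path.join("data", example_name)

print(path_in_cwd)
print(path_in_data)
```

This is why the same code keeps working if you later move the images into a subfolder and only change the directory variable.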

You can then use the function Image.open to load the image into the variable image, and display the image along with its class.

python

# Plot the second training image

image = Image.open(image_path)
plt.imshow(image,cmap='gray', vmin=0, vmax=255)
plt.title(data_name.iloc[1, 0])
plt.show()

You can repeat the process for the 20th image.

python

# Plot the 20th image

image_name = data_name.iloc[19, 1]
image_path=os.path.join(directory,image_name)
image = Image.open(image_path)
plt.imshow(image,cmap='gray', vmin=0, vmax=255)
plt.title(data_name.iloc[19, 0])
plt.show()

Create the dataset object.

Create a Dataset Class

In this section, we will use the components in the last section to build a dataset class and then create an object.

python

# Create your own dataset object

class Dataset(Dataset):

    # Constructor
    def __init__(self, csv_file, data_dir, transform=None):
        
        # Image directory
        self.data_dir=data_dir
        
        # The transform to be applied to each image
        self.transform = transform
        data_dircsv_file=os.path.join(self.data_dir,csv_file)
        # Load the CSV file that contains the image info
        self.data_name= pd.read_csv(data_dircsv_file)
        
        # Number of images in dataset
        self.len=self.data_name.shape[0] 
    
    # Get the length
    def __len__(self):
        return self.len
    
    # Getter
    def __getitem__(self, idx):
        
        # Image file path
        img_name=os.path.join(self.data_dir,self.data_name.iloc[idx, 1])
        # Open image file
        image = Image.open(img_name)
        
        # The class label for the image
        y = self.data_name.iloc[idx, 0]
        
        # If there is any transform method, apply it onto the image
        if self.transform:
            image = self.transform(image)

        return image, y

python

# Create the dataset objects

dataset = Dataset(csv_file=csv_file, data_dir=directory)
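The pattern used by the class above (store the data in the constructor, report the size in __len__, build one sample in __getitem__) can be seen in isolation with a toy example. This is a minimal sketch in plain Python: ToyDataset is a hypothetical name, it does not subclass the PyTorch Dataset, and it holds numbers instead of images.

```python
class ToyDataset:
    """Illustrative map-style dataset holding (sample, label) pairs in memory."""

    def __init__(self, samples, labels, transform=None):
        self.samples = samples
        self.labels = labels
        self.transform = transform
        self.len = len(samples)

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        sample = self.samples[idx]
        # Apply the transform lazily, only when the sample is requested
        if self.transform:
            sample = self.transform(sample)
        return sample, self.labels[idx]


toy = ToyDataset([1, 2, 3], ["a", "b", "c"], transform=lambda x: x * 10)
print(len(toy))   # number of samples
print(toy[1])     # (transformed sample, label)
```

As in the image dataset, nothing is transformed until an index is requested, which keeps construction cheap even when the data is large.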

Each sample, consisting of the image and the class y, is stored in a tuple dataset[sample]. The image is the first element of the tuple, dataset[sample][0]; the label or class is the second element, dataset[sample][1]. For example, you can plot the first image and its class.

python

image=dataset[0][0]
y=dataset[0][1]

plt.imshow(image,cmap='gray', vmin=0, vmax=255)
plt.title(y)
plt.show()

Similarly, you can plot the tenth image (index 9):

python

image=dataset[9][0]
y=dataset[9][1]

plt.imshow(image,cmap='gray', vmin=0, vmax=255)
plt.title(y)
plt.show()

Torchvision Transforms

You will focus on the following libraries:

python

import torchvision.transforms as transforms

We can apply some image transform functions to the dataset object. The image can be cropped and converted to a tensor. We can use transforms.Compose, which we learned in the previous lab, to combine the two transform functions.

python

# Combine two transforms: crop and convert to tensor. Apply the compose to the image dataset

croptensor_data_transform = transforms.Compose([transforms.CenterCrop(20), transforms.ToTensor()])
dataset = Dataset(csv_file=csv_file , data_dir=directory,transform=croptensor_data_transform )
print("The shape of the first element tensor: ", dataset[0][0].shape)

We can see the image is now 20 x 20 (the leading 1 in the printed shape is the channel dimension).

Let us plot the first image again. Notice we see less of the shoe.

python

# Plot the first element in the dataset

show_data(dataset[0],shape = (20, 20))

python

# Plot the second element in the dataset

show_data(dataset[1],shape = (20, 20))

In the example below, we vertically flip the image and then convert it to a tensor. Use transforms.Compose() to combine these two transform functions, then plot the flipped image.

python

# Construct the compose. Apply it to the image dataset. Plot the image out.

fliptensor_data_transform = transforms.Compose([transforms.RandomVerticalFlip(p=1),transforms.ToTensor()])
dataset = Dataset(csv_file=csv_file , data_dir=directory,transform=fliptensor_data_transform )
show_data(dataset[1])

Practice

Try combining RandomVerticalFlip (vertically flip the image) with a horizontal flip and conversion to a tensor in a single compose. Apply the compose to the dataset, then use show_data() to plot the second image.

python

# Practice: Combine vertical flip, horizontal flip and convert to tensor as a compose. Apply the compose on image. Then plot the image

# Type your code here

About the Authors:

Joseph Santarcangelo has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.

Other contributors: Michelle Carey, Mavis Zhou

© IBM Corporation. All rights reserved.
