This guide is a hands-on tutorial on building an image dataset for deep learning in TensorFlow. By the end, you'll be familiar with the main ways to accomplish this task in TensorFlow.

Using ImageDataGenerator

This is the easiest way to prepare an image dataset. You just need to point it at the dataset folder and it will discover all subfolders and the images within them. Each subfolder represents one class.

Sub-folder as Image Class

Source Code

import tensorflow as tf

images_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
# flow_from_directory returns an iterator; next() yields a single batch of
# (images, labels), not the entire dataset.
train_images, train_labels = next(images_generator.flow_from_directory("DIRECTORY_NAME_HERE"))

The output will be “Found 15406 images belonging to 12 classes.” because the main folder contains 12 subfolders.
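The rescale=1./255 argument is applied to every batch the generator yields, mapping pixel values from [0, 255] into [0, 1]. The same behavior can be seen without a dataset on disk by feeding an in-memory array through .flow() (the array below is synthetic, just for illustration):

```python
import numpy as np
import tensorflow as tf

# ImageDataGenerator applies `rescale` to each batch it yields.
# flow() works on in-memory arrays; flow_from_directory applies the
# same transform to images read from disk.
gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
raw = np.full((4, 8, 8, 3), 255.0, dtype=np.float32)  # 4 white 8x8 RGB images
batch = next(gen.flow(raw, batch_size=4, shuffle=False))
print(batch.max())  # pixel values are now in [0, 1]
```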

Using tf.data API

The tf.data API is useful for building input pipelines that avoid memory overflow on large datasets, since images are loaded lazily as the pipeline is consumed rather than all at once.

import glob
import os

DATASET_PATH = 'images'
image_classes = glob.glob(DATASET_PATH + '/*')
all_images = []
all_labels = []
for image_class in image_classes:
    class_name = os.path.split(image_class)[-1]
    class_images = glob.glob(image_class + '/*.jpg')
    all_images.extend(class_images)
    all_labels.extend([class_name] * len(class_images))

image_paths = tf.convert_to_tensor(all_images, dtype=tf.string)
labels = tf.convert_to_tensor(all_labels)

# Build a tf.data pipeline from the file paths and labels
dataset = tf.data.Dataset.from_tensor_slices((image_paths, labels))
def im_file_to_tensor(file, label):
    def _im_file_to_tensor(file, label):
        path = file
        im = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
        im = tf.cast(im, tf.float32) / 255.0
        return im, label
    return tf.py_function(_im_file_to_tensor, 
                          inp=(file, label), 
                          Tout=(tf.float32, tf.string))

dataset = dataset.map(im_file_to_tensor)

import matplotlib.pyplot as plt

def show(image, label):
  plt.figure()
  plt.imshow(image)
  plt.title(label.numpy().decode('utf-8'))
  plt.axis('off')

image, label = next(iter(dataset))
show(image, label)
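Note that the labels in this pipeline are strings (the subfolder names), while most loss functions expect integer class indices. One way to encode them inside the graph is a tf.lookup.StaticHashTable; the class list below is a hypothetical example, in practice it would come from the subfolder names collected earlier:

```python
import tensorflow as tf

# Hypothetical class list; in practice, derive it from the subfolder names.
class_names = ["cat", "dog"]

# Map each class name string to an integer index inside the TF graph.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(class_names),
        values=tf.range(len(class_names), dtype=tf.int64)),
    default_value=-1)  # -1 for any unknown class name

print(table.lookup(tf.constant(["dog", "cat"])).numpy())  # [1 0]
```

The resulting table can be applied with another dataset.map call to convert string labels to integers before batching.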