Generative Adversarial Networks (GANs) represent a revolutionary approach to artificial intelligence, particularly for image generation. Introduced by Ian Goodfellow and his colleagues in 2014, GANs have significantly advanced the ability to create realistic, high-quality images from random noise.
In this article, we will train a GAN on the MNIST dataset to generate images of handwritten digits.
Training GANs for Image Generation
Generative Adversarial Networks (GANs) employ two neural networks, the Generator and the Discriminator, in a competitive framework: the Generator synthesizes images from random noise and strives to produce outputs indistinguishable from real data, while the Discriminator tries to tell the two apart.
Training Generative Adversarial Networks (GANs) is an iterative process that revolves around the interaction between two neural networks:
Training the Discriminator
The Discriminator is trained on a dataset containing real images alongside fake images produced by the Generator, and its goal is to differentiate between the two. Through backpropagation and gradient descent, the Discriminator adjusts its parameters to improve its ability to accurately classify real and generated images.
Training the Generator
Concurrently, the Generator is trained to produce images that are increasingly difficult for the Discriminator to distinguish from real ones. Initially, its outputs look like random noise, but as training progresses it learns to generate images that resemble those in the training dataset. The Generator’s parameters are adjusted based on feedback from the Discriminator, steadily improving the realism and quality of its images.
Implementing Generative Adversarial Networks (GANs) for Image Generation
Step 1: Import Necessary Libraries and Load Dataset
Import necessary libraries including TensorFlow, Keras layers and models, NumPy for numerical operations, and Matplotlib for plotting.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
import matplotlib.pyplot as plt
Proper data preparation is crucial for the successful training of neural networks. For the MNIST dataset, the preprocessing steps include loading the dataset, reshaping the images to ensure they are in the correct format for TensorFlow processing, and normalizing the pixel values to the range [0,1]. Normalization helps stabilize the training process by keeping the input values small.
# Step 1: Dataset Preparation
# Assuming you have a dataset of images (e.g., MNIST), load and preprocess them
(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28, 28, 1)).astype('float32') / 255.0
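As an optional sanity check (assuming the preprocessing code above has been run), you can confirm the tensor shape and the normalized pixel range before building any models:
# Optional sanity check: confirm shape and normalized pixel range
print(x_train.shape)                  # expected: (60000, 28, 28, 1)
print(x_train.min(), x_train.max())   # expected: 0.0 1.0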
Step 2: Building the Models
This step involves defining the architecture for both the generator and the discriminator using convolutional neural network (CNN) layers, tailored to efficiently process and generate image data.
Generator Model with CNN Layers
The generator’s role in a GAN is to synthesize new images that mimic the distribution of a given dataset. In this case, we use convolutional transpose layers, which are effective for upscaling the input and creating detailed images from a lower-dimensional noise vector.
- Dense Layer: Converts the input 100-dimensional noise vector into a high-dimensional feature map.
- Reshape: Transforms the feature map into a 3D shape that can be processed by convolutional layers.
- Conv2DTranspose Layers: These layers perform upscaling and convolution simultaneously, gradually increasing the resolution of the generated image.
- BatchNormalization: Stabilizes the learning process and helps in faster convergence.
- Activation Functions: ‘ReLU’ is used for non-linearity in intermediate layers, while ‘sigmoid’ is used in the output layer to normalize the pixel values between 0 and 1.
def build_generator_cnn():
    model = models.Sequential([
        # Start with a fully connected layer to convert the input noise vector into a suitable shape
        layers.Dense(7*7*128, input_dim=100, activation='relu'),
        layers.Reshape((7, 7, 128)),  # Reshape into an initial image format
        # First upsampling and convolution to increase image size to 14x14
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Second upsampling and convolution to increase image size to 28x28
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Final convolution to produce a 28x28 image with 1 output channel (grayscale)
        layers.Conv2D(1, kernel_size=7, activation='sigmoid', padding='same')
    ])
    return model
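To confirm that the two Conv2DTranspose stages really take the feature map from 7x7 to 14x14 and then to 28x28, you can build the model and inspect its summary; this optional check assumes the imports from Step 1:
# Optional check: the layer output shapes should progress 7x7x128 -> 14x14x128 -> 28x28x128 -> 28x28x1
gen_check = build_generator_cnn()
gen_check.summary()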
Discriminator Model with CNN Layers
The discriminator is a binary classifier that determines whether a given image is real (from the dataset) or fake (generated by the generator).
- Conv2D Layers: Perform convolutions with a stride of 2 to downsample the image, reducing its spatial dimensions and increasing the receptive field of the filters.
- BatchNormalization: Used here as well to ensure stable training.
- Flatten: Converts the 2D feature maps into a 1D feature vector necessary for classification.
- Dense Output Layer: Outputs a single probability indicating the likelihood that the input image is real.
def build_discriminator_cnn():
    model = models.Sequential([
        # Input layer, starting convolution
        layers.Conv2D(64, kernel_size=3, strides=2, input_shape=(28, 28, 1), padding='same', activation='relu'),
        # Second convolution to further downsample
        layers.Conv2D(128, kernel_size=3, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Flatten the convolution output to connect to a dense output layer
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model
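As a quick shape check (assuming the Step 1 imports and the two builder functions above), you can push a small batch of noise through the generator and feed the result to the discriminator; the output should be one probability per image:
import numpy as np

gen = build_generator_cnn()
disc = build_discriminator_cnn()
noise = np.random.normal(0, 1, (4, 100)).astype('float32')
fake_images = gen.predict(noise)          # shape: (4, 28, 28, 1)
print(disc.predict(fake_images).shape)    # expected: (4, 1)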
Step 3: Compiling the Models
First, compile the discriminator, then set up the combined GAN model that connects the generator and the discriminator. This setup is crucial for training the generator while keeping the discriminator’s parameters fixed during the generator’s training updates.
# Instantiate the generator and discriminator models
generator_cnn = build_generator_cnn()
discriminator_cnn = build_discriminator_cnn()

# Compile the Discriminator
discriminator_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
                          loss='binary_crossentropy',
                          metrics=['accuracy'])

# Set the discriminator to non-trainable when we are training the generator
discriminator_cnn.trainable = False

# Combined model: noise -> generator -> discriminator -> probability of "real"
gan_input = layers.Input(shape=(100,))
gan_output = discriminator_cnn(generator_cnn(gan_input))
gan_cnn = models.Model(gan_input, gan_output)
gan_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
                loss='binary_crossentropy')
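A subtle point worth knowing: in standard Keras behavior, the trainable flag is captured when a model is compiled. Because the discriminator is compiled before trainable is set to False, calling discriminator_cnn.train_on_batch in the next step still updates its weights, while the combined gan_cnn, compiled after the flag change, keeps those weights frozen. You can verify the freeze with an optional check:
# Optional check: the discriminator's parameters should be listed under "Non-trainable params"
gan_cnn.summary()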
Step 4: Model Training and Visualizing
The training loop involves alternately training the discriminator and the generator. The discriminator learns to distinguish real images from the fake ones produced by the generator. Simultaneously, the generator learns to fool the discriminator by generating increasingly realistic images.
epochs = 10000
batch_size = 64

for epoch in range(epochs):
    # Random noise for the generator
    noise = np.random.normal(0, 1, (batch_size, 100))
    generated_images = generator_cnn.predict(noise)

    # Get a random batch of real images
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    real_images = x_train[idx]

    # Labels for real and generated data
    real_labels = np.ones((batch_size, 1))
    fake_labels = np.zeros((batch_size, 1))

    # Train the Discriminator
    d_loss_real = discriminator_cnn.train_on_batch(real_images, real_labels)
    d_loss_fake = discriminator_cnn.train_on_batch(generated_images, fake_labels)
    d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

    # Train the Generator
    noise = np.random.normal(0, 1, (batch_size, 100))
    valid_labels = np.ones((batch_size, 1))
    g_loss = gan_cnn.train_on_batch(noise, valid_labels)

    # Output training progress
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: D Loss: {d_loss[0]}, G Loss: {g_loss}")

    # Save and display generated images at intervals
    if epoch % 1000 == 0:
        test_noise = np.random.normal(0, 1, (1, 100))
        test_img = generator_cnn.predict(test_noise)[0].reshape(28, 28)
        plt.imshow(test_img, cmap='gray')
        plt.axis('off')
        plt.show()
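Once training finishes, it is often more informative to look at a grid of samples than at a single digit. Below is a minimal sketch, assuming generator_cnn and the imports from Step 1 are available:
import numpy as np
import matplotlib.pyplot as plt

# Sample 16 noise vectors and plot the generated digits in a 4x4 grid
noise = np.random.normal(0, 1, (16, 100))
samples = generator_cnn.predict(noise)

fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for img, ax in zip(samples, axes.flat):
    ax.imshow(img[:, :, 0], cmap='gray')
    ax.axis('off')
plt.tight_layout()
plt.show()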
Complete Code to Generate Images using GANs
Python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
import matplotlib.pyplot as plt
# Step 1: Dataset Preparation
# Assuming you have a dataset of images (e.g., MNIST), load and preprocess them
(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28, 28, 1)).astype('float32') / 255.0
def build_generator_cnn():
    model = models.Sequential([
        # Start with a fully connected layer to interpret the seed
        layers.Dense(7*7*128, input_dim=100, activation='relu'),
        layers.Reshape((7, 7, 128)),  # Reshape into an image format
        # Upsample to 14x14
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Upsample to 28x28
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Output layer with the shape of the target image, 1 channel for grayscale
        layers.Conv2D(1, kernel_size=7, activation='sigmoid', padding='same')
    ])
    return model
def build_discriminator_cnn():
    model = models.Sequential([
        # Input layer: first convolution downsamples 28x28 to 14x14
        layers.Conv2D(64, kernel_size=3, strides=2, input_shape=(28, 28, 1), padding='same', activation='relu'),
        # Second convolution downsamples 14x14 to 7x7
        layers.Conv2D(128, kernel_size=3, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Flatten to feed into a dense output layer
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model
# Instantiate the CNN-based Generator and Discriminator
generator_cnn = build_generator_cnn()
discriminator_cnn = build_discriminator_cnn()
# Compile the Discriminator
discriminator_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002), loss='binary_crossentropy', metrics=['accuracy'])
# Set the Discriminator's weights to non-trainable (important when we train the combined GAN model)
discriminator_cnn.trainable = False
# Combined GAN model with CNN
gan_input = layers.Input(shape=(100,))
gan_output = discriminator_cnn(generator_cnn(gan_input))
gan_cnn = models.Model(gan_input, gan_output)
gan_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002), loss='binary_crossentropy')
epochs = 10000
batch_size = 64

for epoch in range(epochs):
    ############################
    # 1. Train the Discriminator
    ############################
    # Generate a batch of noise and the corresponding fake images
    noise = np.random.normal(0, 1, (batch_size, 100))
    generated_images = generator_cnn.predict(noise)

    # Get a random batch of real images
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    real_images = x_train[idx]

    # Labels for generated and real data
    fake_labels = np.zeros((batch_size, 1))
    real_labels = np.ones((batch_size, 1))

    # Train the Discriminator (real classified as ones and generated as zeros)
    d_loss_real = discriminator_cnn.train_on_batch(real_images, real_labels)
    d_loss_fake = discriminator_cnn.train_on_batch(generated_images, fake_labels)

    ##################################
    # 2. Train the Generator (via GAN)
    ##################################
    # Train the generator (we want the Discriminator to mistake generated images for real)
    noise = np.random.normal(0, 1, (batch_size, 100))
    valid_labels = np.ones((batch_size, 1))
    g_loss = gan_cnn.train_on_batch(noise, valid_labels)

    # Print the progress
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: D Loss Real: {d_loss_real[0]}, D Loss Fake: {d_loss_fake[0]}, G Loss: {g_loss}")

    # Optionally, save generated images and display
    if epoch % 1000 == 0:
        generated_image = generator_cnn.predict(np.random.normal(0, 1, (1, 100)))
        plt.imshow(generated_image[0, :, :, 0], cmap='gray')
        plt.axis('off')
        plt.show()
        plt.close()
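If you want to reuse the trained generator without retraining, you can save it and reload it later for sampling. The snippet below is a sketch assuming the training loop above has completed; the filename is arbitrary, and older TensorFlow versions may require the '.h5' extension instead of '.keras':
# Save the trained generator and reload it for later sampling
generator_cnn.save('mnist_gan_generator.keras')

reloaded_generator = tf.keras.models.load_model('mnist_gan_generator.keras')
sample = reloaded_generator.predict(np.random.normal(0, 1, (1, 100)))
plt.imshow(sample[0, :, :, 0], cmap='gray')
plt.axis('off')
plt.show()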
Output:
Generated Images
Challenges and Considerations
Training Generative Adversarial Networks (GANs) presents several challenges, including:
- Mode Collapse: Occurs when the Generator produces limited varieties of outputs, failing to explore the full diversity of the data distribution.
- Training Instability: Manifests as oscillations or divergence during training, where the Generator and Discriminator struggle to reach equilibrium (a simple mitigation is sketched after this list).
- Hyperparameter Sensitivity: Parameters such as learning rates and network architectures significantly impact GANs’ performance and stability.
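One widely used, low-effort mitigation for an overconfident discriminator and unstable training is one-sided label smoothing: train the discriminator on slightly softened real labels (for example 0.9) instead of hard ones. The sketch below shows how the label lines inside the training loop above could be adjusted; it is an optional tweak, not part of the original code:
import numpy as np

batch_size = 64  # same batch size as in the training loop above

# One-sided label smoothing: soften only the "real" labels fed to the discriminator
real_labels = np.full((batch_size, 1), 0.9)  # instead of np.ones((batch_size, 1))
fake_labels = np.zeros((batch_size, 1))      # fake labels stay at 0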
Conclusion
Generative Adversarial Networks have redefined image generation capabilities, offering powerful tools for creating diverse and realistic visual content. Despite challenges, ongoing research and advancements promise even greater applications and innovations in the field.