Generative Adversarial Networks (GANs) represent a revolutionary approach to artificial intelligence, particularly for image generation. Introduced by Ian Goodfellow and his colleagues in 2014, GANs have significantly advanced the ability to create realistic, high-quality images from random noise.
In this article, we will train a GAN on the MNIST dataset to generate images of handwritten digits.
Training GANs for Image Generation
Generative Adversarial Networks (GANs) employ two neural networks, the Generator and the Discriminator, in a competitive framework: the Generator synthesizes images from random noise and strives to produce outputs indistinguishable from real data, while the Discriminator tries to tell the two apart.
Training Generative Adversarial Networks (GANs) is an iterative process that revolves around the interaction between two neural networks:
Training the Discriminator
The Discriminator is trained on a dataset containing real images alongside fake images produced by the Generator, and its goal is to differentiate between the two. Through backpropagation and gradient descent, the Discriminator adjusts its parameters to improve its ability to accurately classify real and generated images.
Training the Generator
Concurrently, the Generator is trained to produce images that are increasingly difficult for the Discriminator to distinguish from real ones. Initially, its outputs look like random noise, but as training progresses it learns to generate images that resemble those in the training dataset. The Generator’s parameters are adjusted based on feedback from the Discriminator, steadily improving the realism and quality of its images.
Implementing Generative Adversarial Networks (GANs) for Image Generation
Step 1: Import Necessary Libraries and Load Dataset
Import necessary libraries including TensorFlow, Keras layers and models, NumPy for numerical operations, and Matplotlib for plotting.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
import matplotlib.pyplot as plt
Proper data preparation is crucial for the successful training of neural networks. For the MNIST dataset, the preprocessing steps include loading the dataset, reshaping the images to ensure they are in the correct format for TensorFlow processing, and normalizing the pixel values to the range [0,1]. Normalization helps stabilize the training process by keeping the input values small.
# Step 1: Dataset Preparation
# Assuming you have a dataset of images (e.g., MNIST), load and preprocess them
(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28, 28, 1)).astype('float32') / 255.0
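As an optional sanity check (assuming the preprocessing code above has been run), you can confirm the tensor shape and the normalized pixel range before building any models:
# Optional sanity check: confirm shape and normalized pixel range
print(x_train.shape)                  # expected: (60000, 28, 28, 1)
print(x_train.min(), x_train.max())   # expected: 0.0 1.0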
Step 2: Building the Models
This step involves defining the architecture for both the generator and the discriminator using convolutional neural network (CNN) layers, tailored to efficiently process and generate image data.
Generator Model with CNN Layers
The generator’s role in a GAN is to synthesize new images that mimic the distribution of a given dataset. In this case, we use convolutional transpose layers, which are effective for upscaling the input and creating detailed images from a lower-dimensional noise vector.
- Dense Layer: Converts the input 100-dimensional noise vector into a high-dimensional feature map.
- Reshape: Transforms the feature map into a 3D shape that can be processed by convolutional layers.
- Conv2DTranspose Layers: These layers perform upscaling and convolution simultaneously, gradually increasing the resolution of the generated image.
- BatchNormalization: Stabilizes the learning process and helps in faster convergence.
- Activation Functions: ‘ReLU’ is used for non-linearity in intermediate layers, while ‘sigmoid’ is used in the output layer to normalize the pixel values between 0 and 1.
def build_generator_cnn():
    model = models.Sequential([
        # Start with a fully connected layer to convert the input noise vector into a suitable shape
        layers.Dense(7*7*128, input_dim=100, activation='relu'),
        layers.Reshape((7, 7, 128)),  # Reshape into an initial image format
        # First upsampling and convolution to increase image size to 14x14
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Second upsampling and convolution to increase image size to 28x28
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Final convolution to produce a 28x28 image with 1 output channel (grayscale)
        layers.Conv2D(1, kernel_size=7, activation='sigmoid', padding='same')
    ])
    return model
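To confirm that the two Conv2DTranspose stages really take the feature map from 7x7 to 14x14 and then to 28x28, you can build the model and inspect its summary; this optional check assumes the imports from Step 1:
# Optional check: the layer output shapes should progress 7x7x128 -> 14x14x128 -> 28x28x128 -> 28x28x1
gen_check = build_generator_cnn()
gen_check.summary()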
Discriminator Model with CNN Layers
The discriminator is a binary classifier that determines whether a given image is real (from the dataset) or fake (generated by the generator).
- Conv2D Layers: Perform convolutions with a stride of 2 to downsample the image, reducing its spatial dimensions and increasing the receptive field of the filters.
- BatchNormalization: Used here as well to ensure stable training.
- Flatten: Converts the 2D feature maps into a 1D feature vector necessary for classification.
- Dense Output Layer: Outputs a single probability indicating the likelihood that the input image is real.
def build_discriminator_cnn():
    model = models.Sequential([
        # Input layer, starting convolution
        layers.Conv2D(64, kernel_size=3, strides=2, input_shape=(28, 28, 1), padding='same', activation='relu'),
        # Second convolution to further downsample
        layers.Conv2D(128, kernel_size=3, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Flatten the convolution output to connect to a dense output layer
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model
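As a quick shape check (assuming the Step 1 imports and the two builder functions above), you can push a small batch of noise through the generator and feed the result to the discriminator; the output should be one probability per image:
import numpy as np

gen = build_generator_cnn()
disc = build_discriminator_cnn()
noise = np.random.normal(0, 1, (4, 100)).astype('float32')
fake_images = gen.predict(noise)          # shape: (4, 28, 28, 1)
print(disc.predict(fake_images).shape)    # expected: (4, 1)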
Step 3: Compiling the Models
First, compile the discriminator, then set up the combined GAN model that connects the generator and the discriminator. This setup is crucial for training the generator while keeping the discriminator’s parameters fixed during the generator’s training updates.
# Instantiate the generator and discriminator models
generator_cnn = build_generator_cnn()
discriminator_cnn = build_discriminator_cnn()

# Compile the Discriminator
discriminator_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
                          loss='binary_crossentropy',
                          metrics=['accuracy'])

# Set the discriminator to non-trainable when we are training the generator
discriminator_cnn.trainable = False

# Combined model: noise -> generator -> discriminator -> probability of "real"
gan_input = layers.Input(shape=(100,))
gan_output = discriminator_cnn(generator_cnn(gan_input))
gan_cnn = models.Model(gan_input, gan_output)
gan_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002),
                loss='binary_crossentropy')
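A subtle point worth knowing: in standard Keras behavior, the trainable flag is captured when a model is compiled. Because the discriminator is compiled before trainable is set to False, calling discriminator_cnn.train_on_batch in the next step still updates its weights, while the combined gan_cnn, compiled after the flag change, keeps those weights frozen. You can verify the freeze with an optional check:
# Optional check: the discriminator's parameters should be listed under "Non-trainable params"
gan_cnn.summary()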
Step 4: Model Training and Visualizing
The training loop involves alternately training the discriminator and the generator. The discriminator learns to distinguish real images from the fake ones produced by the generator. Simultaneously, the generator learns to fool the discriminator by generating increasingly realistic images.
epochs = 10000
batch_size = 64

for epoch in range(epochs):
    # Random noise for the generator
    noise = np.random.normal(0, 1, (batch_size, 100))
    generated_images = generator_cnn.predict(noise)

    # Get a random batch of real images
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    real_images = x_train[idx]

    # Labels for real and generated data
    real_labels = np.ones((batch_size, 1))
    fake_labels = np.zeros((batch_size, 1))

    # Train the Discriminator
    d_loss_real = discriminator_cnn.train_on_batch(real_images, real_labels)
    d_loss_fake = discriminator_cnn.train_on_batch(generated_images, fake_labels)
    d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

    # Train the Generator
    noise = np.random.normal(0, 1, (batch_size, 100))
    valid_labels = np.ones((batch_size, 1))
    g_loss = gan_cnn.train_on_batch(noise, valid_labels)

    # Output training progress
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: D Loss: {d_loss[0]}, G Loss: {g_loss}")

    # Save and display generated images at intervals
    if epoch % 1000 == 0:
        test_noise = np.random.normal(0, 1, (1, 100))
        test_img = generator_cnn.predict(test_noise)[0].reshape(28, 28)
        plt.imshow(test_img, cmap='gray')
        plt.axis('off')
        plt.show()
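Once training finishes, it is often more informative to look at a grid of samples than at a single digit. Below is a minimal sketch, assuming generator_cnn and the imports from Step 1 are available:
import numpy as np
import matplotlib.pyplot as plt

# Sample 16 noise vectors and plot the generated digits in a 4x4 grid
noise = np.random.normal(0, 1, (16, 100))
samples = generator_cnn.predict(noise)

fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for img, ax in zip(samples, axes.flat):
    ax.imshow(img[:, :, 0], cmap='gray')
    ax.axis('off')
plt.tight_layout()
plt.show()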
Complete Code to Generate Images using GANs
Python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
import matplotlib.pyplot as plt
# Step 1: Dataset Preparation
# Assuming you have a dataset of images (e.g., MNIST), load and preprocess them
(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28, 28, 1)).astype('float32') / 255.0
def build_generator_cnn():
    model = models.Sequential([
        # Start with a fully connected layer to interpret the seed
        layers.Dense(7*7*128, input_dim=100, activation='relu'),
        layers.Reshape((7, 7, 128)),  # Reshape into an image format
        # Upsample to 14x14
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Upsample to 28x28
        layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Output layer with the shape of the target image, 1 channel for grayscale
        layers.Conv2D(1, kernel_size=7, activation='sigmoid', padding='same')
    ])
    return model
def build_discriminator_cnn():
    model = models.Sequential([
        # Input layer: first convolution downsamples 28x28 to 14x14
        layers.Conv2D(64, kernel_size=3, strides=2, input_shape=(28, 28, 1), padding='same', activation='relu'),
        # Second convolution downsamples 14x14 to 7x7
        layers.Conv2D(128, kernel_size=3, strides=2, padding='same', activation='relu'),
        layers.BatchNormalization(),
        # Flatten to feed into a dense output layer
        layers.Flatten(),
        layers.Dense(1, activation='sigmoid')
    ])
    return model
# Instantiate the CNN-based Generator and Discriminator
generator_cnn = build_generator_cnn()
discriminator_cnn = build_discriminator_cnn()
# Compile the Discriminator
discriminator_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002), loss='binary_crossentropy', metrics=['accuracy'])
# Set the Discriminator's weights to non-trainable (important when we train the combined GAN model)
discriminator_cnn.trainable = False
# Combined GAN model with CNN
gan_input = layers.Input(shape=(100,))
gan_output = discriminator_cnn(generator_cnn(gan_input))
gan_cnn = models.Model(gan_input, gan_output)
gan_cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0002), loss='binary_crossentropy')
epochs = 10000
batch_size = 64

for epoch in range(epochs):
    ############################
    # 1. Train the Discriminator
    ############################
    # Generate a batch of noise and the corresponding fake images
    noise = np.random.normal(0, 1, (batch_size, 100))
    generated_images = generator_cnn.predict(noise)

    # Get a random batch of real images
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    real_images = x_train[idx]

    # Labels for generated and real data
    fake_labels = np.zeros((batch_size, 1))
    real_labels = np.ones((batch_size, 1))

    # Train the Discriminator (real classified as ones and generated as zeros)
    d_loss_real = discriminator_cnn.train_on_batch(real_images, real_labels)
    d_loss_fake = discriminator_cnn.train_on_batch(generated_images, fake_labels)

    ##################################
    # 2. Train the Generator (via GAN)
    ##################################
    # Train the generator (we want the Discriminator to mistake generated images for real)
    noise = np.random.normal(0, 1, (batch_size, 100))
    valid_labels = np.ones((batch_size, 1))
    g_loss = gan_cnn.train_on_batch(noise, valid_labels)

    # Print the progress
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: D Loss Real: {d_loss_real[0]}, D Loss Fake: {d_loss_fake[0]}, G Loss: {g_loss}")

    # Optionally, save generated images and display
    if epoch % 1000 == 0:
        generated_image = generator_cnn.predict(np.random.normal(0, 1, (1, 100)))
        plt.imshow(generated_image[0, :, :, 0], cmap='gray')
        plt.axis('off')
        plt.show()
        plt.close()
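If you want to reuse the trained generator without retraining, you can save it and reload it later for sampling. The snippet below is a sketch assuming the training loop above has completed; the filename is arbitrary, and older TensorFlow versions may require the '.h5' extension instead of '.keras':
# Save the trained generator and reload it for later sampling
generator_cnn.save('mnist_gan_generator.keras')

reloaded_generator = tf.keras.models.load_model('mnist_gan_generator.keras')
sample = reloaded_generator.predict(np.random.normal(0, 1, (1, 100)))
plt.imshow(sample[0, :, :, 0], cmap='gray')
plt.axis('off')
plt.show()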
Output:
Generated Images
Challenges and Considerations
Training Generative Adversarial Networks (GANs) presents several challenges, including:
- Mode Collapse: Occurs when the Generator produces limited varieties of outputs, failing to explore the full diversity of the data distribution.
- Training Instability: Manifests as oscillations or divergence during training, where the Generator and Discriminator struggle to reach equilibrium (a simple mitigation is sketched after this list).
- Hyperparameter Sensitivity: Parameters such as learning rates and network architectures significantly impact GANs’ performance and stability.
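One widely used, low-effort mitigation for an overconfident discriminator and unstable training is one-sided label smoothing: train the discriminator on slightly softened real labels (for example 0.9) instead of hard ones. The sketch below shows how the label lines inside the training loop above could be adjusted; it is an optional tweak, not part of the original code:
import numpy as np

batch_size = 64  # same batch size as in the training loop above

# One-sided label smoothing: soften only the "real" labels fed to the discriminator
real_labels = np.full((batch_size, 1), 0.9)  # instead of np.ones((batch_size, 1))
fake_labels = np.zeros((batch_size, 1))      # fake labels stay at 0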
Conclusion
Generative Adversarial Networks have redefined image generation capabilities, offering powerful tools for creating diverse and realistic visual content. Despite challenges, ongoing research and advancements promise even greater applications and innovations in the field.