Using Autoencoder to generate digits with Keras

This article walks through a hands-on implementation of an autoencoder, which we will train and evaluate on MNIST, a well-known public benchmark dataset.
Tejas Khare
May 14, 2021

After the previous article, Understanding Autoencoders - An Unsupervised Learning approach, you should have a good idea of what autoencoders are, where they are used, and how to train one.

Now it is time to build your own autoencoder. In this article, we will load the dataset, build the encoder model, build the decoder model, and finally test the autoencoder by visualizing its output.

We will be using the tf.keras library for this project. The dataset is MNIST, which contains grayscale images of handwritten digits from 0 to 9, each 28 x 28 pixels. It has a training set of 60,000 images and a test set of 10,000 images.

Let's start our code -

Note: The code is written and tested by the author. The output images are the screenshots of jupyter notebook cells.

1. Import the libraries required for the entire project

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.layers import Conv2D, Flatten
from tensorflow.keras.layers import Reshape, Conv2DTranspose
from tensorflow.keras.models import Model
# You can directly import inbuilt MNIST dataset from  tensorflow.keras.datasets
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
import matplotlib.pyplot as plt

2.  Load the MNIST data with the load_data() function from Keras.

(x_train, _), (x_test, _) = mnist.load_data()

3. Reshape the data and normalize the pixel values to the range [0, 1]

# reshape to (28, 28, 1) and normalize input images
image_size = x_train.shape[1]
x_train = np.reshape(x_train, [-1, image_size, image_size, 1])
x_test = np.reshape(x_test, [-1, image_size, image_size, 1])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
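
To see what this step does to the array shape and value range without downloading MNIST, here is a small sketch on a synthetic batch. The names fake_images and prepared are made up for this illustration; only the reshape and the scaling mirror the code above.

```python
import numpy as np

# Hypothetical stand-in for x_train: 5 random uint8 "images" of 28 x 28 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

image_size = fake_images.shape[1]
# add a trailing channel axis so each image has shape (28, 28, 1)
prepared = np.reshape(fake_images, [-1, image_size, image_size, 1])
# convert to float32 and scale pixel values from [0, 255] to [0, 1]
prepared = prepared.astype('float32') / 255

print(prepared.shape, prepared.dtype)
print(prepared.min(), prepared.max())
```

The trailing axis of size 1 is the channel dimension that Conv2D expects for grayscale input.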

4. Initialize the network parameters

input_shape = (image_size, image_size, 1)
batch_size = 16
kernel_size = 3
latent_dim = 16
# encoder/decoder number of CNN layers and filters per layer
layer_filters = [32, 64]

5. Building the encoder model

If you want to know more about the activation used here please have a look at this article - Activation Functions for Neural Networks

inputs = Input(shape=input_shape, name='encoder_input')
x = inputs

# Encoder model of conv2d(32) and conv2d(64) stacked together.
# where 32, and 64 are number of filters
for filters in layer_filters:    
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               activation='relu',
               strides=2,
               padding='same')(x)

Note: You may get some warnings like - "Instructions for Updating - Call initializer instance...." You can ignore and move ahead with the code.

6. Initialize the correct shape for the decoder model

"""
This step is important because we want to pass 
a specific shape to our decoder. 
Implementing this step would ensure we don't do the layer wise
calculation manually 
""" 

shape = K.int_shape(x)

K.int_shape(x) returns (None, 7, 7, 64), where None stands for the batch dimension. The decoder uses the remaining dimensions, (7, 7, 64), to reshape its dense output before the first Conv2DTranspose layer.
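
If you would like to check the 7 x 7 spatial size by hand, the following sketch reproduces the calculation. It assumes the Keras rule that a convolution with padding='same' outputs ceil(input_size / stride) along each spatial dimension; the helper name same_padding_output_size is made up for this illustration.

```python
import math

def same_padding_output_size(size, stride):
    # with padding='same', the output size is ceil(size / stride)
    return math.ceil(size / stride)

layer_filters = [32, 64]
size = 28                      # MNIST images are 28 x 28
sizes = [size]
for _ in layer_filters:        # one stride-2 Conv2D per entry
    size = same_padding_output_size(size, stride=2)
    sizes.append(size)

print(sizes)                   # spatial size after each layer
print((size, size, layer_filters[-1]))
```

The number of channels at the end equals the filter count of the last Conv2D layer, which is 64 here.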

7. Initialize the Latent space

x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)
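
To get a feel for how strongly the encoder compresses each image, you can compare the number of input pixels with the size of the latent vector. This is only a back-of-the-envelope sketch; the variable names pixels_per_image and compression_ratio are made up for this illustration.

```python
image_size = 28
latent_dim = 16

# each grayscale image has 28 * 28 * 1 values
pixels_per_image = image_size * image_size * 1
# the latent vector keeps only latent_dim values per image
compression_ratio = pixels_per_image / latent_dim

print(pixels_per_image, latent_dim, compression_ratio)
```

A larger latent_dim gives the decoder more information to work with, at the cost of a less compact representation.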

8.  Initialize the encoder model

# latent is the 16-dimensional output of the Dense layer from the previous step
encoder = Model(inputs,
                latent,
                name='encoder')
encoder.summary()

Here is what the summary of the encoder looks like -
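
As a cross-check for encoder.summary(), the trainable parameter counts can be worked out by hand. The sketch below assumes the standard formulas (kernel weights plus one bias per filter or unit); the helper names conv2d_params and dense_params are made up for this illustration.

```python
def conv2d_params(kernel_size, in_channels, filters):
    # kernel weights plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(in_units, out_units):
    # weight matrix plus one bias per output unit
    return in_units * out_units + out_units

encoder_params = {
    'Conv2D(32)': conv2d_params(3, 1, 32),
    'Conv2D(64)': conv2d_params(3, 32, 64),
    'latent_vector': dense_params(7 * 7 * 64, 16),
}
encoder_total = sum(encoder_params.values())

for name, count in encoder_params.items():
    print(name, count)
print('total', encoder_total)
```

Most of the encoder's parameters sit in the final Dense layer, because it connects all 3136 flattened features to the latent vector.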

10. Build the decoder model

# latent_dim is a parameter which we defined initially in step 4
latent_inputs = Input(shape=(latent_dim,), name='decoder_input')

# use the shape (7, 7, 64) that was earlier saved
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)

# from vector to suitable shape for transposed conv
x = Reshape((shape[1], shape[2], shape[3]))(x)

# stack of Conv2DTranspose(64) and Conv2DTranspose(32)
for filters in layer_filters[::-1]:    
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        activation='relu',
                        strides=2,
                        padding='same')(x)

11. Initializing the decoder output

outputs = Conv2DTranspose(filters=1,
                          kernel_size=kernel_size,
                          activation='sigmoid',
                          padding='same',
                          name='decoder_output')(x)

# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

Here is what the summary of the decoder looks like -
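
The decoder can be checked by hand in the same way. This sketch assumes that a Conv2DTranspose layer with padding='same' multiplies the spatial size by its stride, and that its parameter count follows the same formula as a regular convolution; the helper names are made up for this illustration.

```python
def conv_transpose_params(kernel_size, in_channels, filters):
    # kernel weights plus one bias per filter
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(in_units, out_units):
    # weight matrix plus one bias per output unit
    return in_units * out_units + out_units

# spatial size: Reshape gives 7, then strides of 2, 2 and 1 (the output layer)
size = 7
decoder_sizes = [size]
for stride in [2, 2, 1]:
    size = size * stride
    decoder_sizes.append(size)

decoder_params = {
    'Dense(3136)': dense_params(16, 7 * 7 * 64),
    'Conv2DTranspose(64)': conv_transpose_params(3, 64, 64),
    'Conv2DTranspose(32)': conv_transpose_params(3, 64, 32),
    'decoder_output': conv_transpose_params(3, 32, 1),
}
decoder_total = sum(decoder_params.values())

print(decoder_sizes)
print(decoder_params, decoder_total)
```

The final spatial size of 28 matches the input images, which is what allows the output to be compared pixel by pixel with the input during training.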

12. Initialize the autoencoder as a whole

# autoencoder = encoder + decoder
autoencoder = Model(inputs,
                    decoder(encoder(inputs)), 
                    name='autoencoder')
autoencoder.summary()

Here is what the summary of the autoencoder looks like -

13. Train the autoencoder

If you would like to know more about optimizers and loss function in autoencoder please have a look at these articles, respectively -

  1. Optimization Methods
  2. Understanding Autoencoders - An Unsupervised Learning approach
# Mean Square Error (MSE) loss function, Adam optimizer
autoencoder.compile(loss='mse', optimizer='adam')
# train the autoencoder
autoencoder.fit(x_train,
                x_train,
                validation_data=(x_test, x_test),     
                epochs=1,               
                batch_size=batch_size)
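
To make the training setup more concrete, the sketch below computes a mean squared error by hand on a tiny made-up example and works out how many weight updates one epoch performs. The arrays x and x_hat are hypothetical values chosen for illustration, not outputs of the model.

```python
import math
import numpy as np

# hypothetical 2 x 2 "image" and its imperfect reconstruction
x = np.array([[0.0, 1.0], [1.0, 0.0]], dtype='float32')
x_hat = np.array([[0.1, 0.9], [0.8, 0.2]], dtype='float32')

# MSE: the average of the squared pixel-wise differences
mse = float(np.mean((x - x_hat) ** 2))

# one epoch over 60,000 training images with a batch size of 16
steps_per_epoch = math.ceil(60000 / 16)

print(round(mse, 4), steps_per_epoch)
```

Because the input is also the target, a lower MSE means the reconstruction is closer to the original image.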

14. Get the predictions

You can use the predict() method of the Model class in tensorflow.keras.models.

x_decoded = autoencoder.predict(x_test)

Note: Pass the test set to predict() so that the reconstructions are evaluated on images the autoencoder did not see during training. Good reconstructions of training samples alone would not tell you whether the model has learned general features or has simply memorized its training data.

15. Finally visualizing the results

imgs = np.concatenate([x_test[:8], x_decoded[:8]])
imgs = imgs.reshape((4, 4, image_size, image_size))
imgs = np.vstack([np.hstack(i) for i in imgs])
plt.figure()
plt.axis('off')
plt.title('Input: 1st 2 rows, Decoded: last 2 rows')
plt.imshow(imgs, interpolation='none', cmap='gray')
plt.show()

Here we create a single image that contains 16 smaller images arranged in a 4 x 4 grid. The first two rows are inputs from the test set and the last two rows are their corresponding reconstructions.
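
The reshape and stacking calls can be hard to follow, so here is a sketch that runs the same tiling logic on placeholder data. Each hypothetical tile is filled with its own index, which makes it easy to see where every image ends up in the grid; originals and decoded stand in for x_test[:8] and x_decoded[:8].

```python
import numpy as np

image_size = 28
# placeholder tiles: tile k is a constant image with value k
originals = np.stack([np.full((image_size, image_size, 1), k, dtype='float32')
                      for k in range(8)])
decoded = np.stack([np.full((image_size, image_size, 1), k + 8, dtype='float32')
                    for k in range(8)])

imgs = np.concatenate([originals, decoded])            # shape (16, 28, 28, 1)
imgs = imgs.reshape((4, 4, image_size, image_size))    # 4 rows of 4 tiles
# hstack joins the 4 tiles of a row side by side, vstack piles the rows up
grid = np.vstack([np.hstack(row) for row in imgs])

# read the top-left pixel of every tile to recover the layout
tile_ids = grid[::image_size, ::image_size].astype(int)
print(grid.shape)
print(tile_ids)
```

Tiles 0 to 7 fill the first two rows and tiles 8 to 15 fill the last two, so each reconstruction sits two rows below its input.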

16. Observations

From the image above, you can see that the generated images are slightly blurry. A deeper encoder and decoder might capture finer features and sharpen the output. The digit '9' is the most blurred reconstruction in this set, which suggests that the autoencoder has not learned this class as well as the others.
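
Judging blur by eye is subjective, so one option is to measure the reconstruction error of each image separately. The sketch below is one possible way to do that, shown on synthetic data; the helper per_image_mse and the arrays in the demo are made up for this illustration.

```python
import numpy as np

def per_image_mse(originals, reconstructions):
    # average the squared error over height, width and channel,
    # which leaves one error value per image
    return np.mean((originals - reconstructions) ** 2, axis=(1, 2, 3))

# synthetic demo: 4 random images, each shifted by a different constant offset
rng = np.random.default_rng(0)
originals = rng.random((4, 28, 28, 1)).astype('float32')
offsets = np.array([0.0, 0.1, 0.3, 0.2], dtype='float32').reshape(4, 1, 1, 1)
reconstructions = originals + offsets

errors = per_image_mse(originals, reconstructions)
worst = int(np.argmax(errors))
print(errors, worst)
```

With the arrays from this article, the call would be per_image_mse(x_test, x_decoded). To average the errors per digit, you would also need to keep the test labels that load_data() returns instead of discarding them with `_`.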

17. Conclusion 

After one epoch of training, the training loss is 0.0173 and the validation loss is 0.0097. We may be able to improve the autoencoder by stacking more convolution and transposed convolution layers in the encoder and decoder, respectively. Tuning the network parameters, such as the latent dimension, the number of filters, and the number of epochs, can also improve performance.

That's it for this article. I hope you can now develop your own autoencoder. I would suggest trying it on a different dataset as well, for example, Fashion MNIST.

Thank you and Cheers :)
