Using an Autoencoder to generate digits with Keras
From the last article, Understanding Autoencoders - An Unsupervised Learning approach, you should now have a good idea of what autoencoders are, where they are used, and how to train one.
You are probably keen to build your own autoencoder that can generate images. In this article, we will load the dataset, build the encoder model, build the decoder model, and finally test the autoencoder by visualizing its output.
We will be using the tf.keras library for this project. The dataset is MNIST, which contains grayscale images of handwritten digits from 0 to 9, each of dimension 28 x 28. It has a training set of 60,000 images and a test set of 10,000 images.
Let's start our code -
Note: The code is written and tested by the author. The output images are screenshots of Jupyter notebook cells.
1. Import the libraries required for the entire project
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.layers import Conv2D, Flatten
from tensorflow.keras.layers import Reshape, Conv2DTranspose
from tensorflow.keras.models import Model
# You can directly import the inbuilt MNIST dataset from tensorflow.keras.datasets
from tensorflow.keras.datasets import mnist
from tensorflow.keras import backend as K
import numpy as np
import matplotlib.pyplot as plt
2. Load the MNIST data with the load_data() function from Keras.
(x_train, _), (x_test, _) = mnist.load_data()
3. Reshape the data and normalize the pixel values to the range [0, 1]
# reshape to (28, 28, 1) and normalize input images
# x_train has shape (60000, 28, 28), so index 1 gives the side length 28
image_size = x_train.shape[1]
x_train = np.reshape(x_train, [-1, image_size, image_size, 1])
x_test = np.reshape(x_test, [-1, image_size, image_size, 1])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
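As a quick sanity check on this step, the sketch below applies the same reshape and scaling to a small random stand-in array and confirms the resulting shape, dtype, and value range. The stand-in is an assumption made so the snippet runs without downloading MNIST, and the names `fake_images` and `prepared` are hypothetical.

```python
import numpy as np

# Stand-in for MNIST: 5 random 28 x 28 images with uint8 pixels in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# The array shape is (num_images, height, width); index 1 is the side length.
image_size = fake_images.shape[1]

# Add a trailing channel axis and scale the pixels to [0, 1].
prepared = np.reshape(fake_images, [-1, image_size, image_size, 1])
prepared = prepared.astype('float32') / 255

print(prepared.shape, prepared.dtype, prepared.min(), prepared.max())
```

The same checks can be run on x_train and x_test after the real reshape.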
4. Initialize the network parameters
input_shape = (image_size, image_size, 1)
batch_size = 16
kernel_size = 3
latent_dim = 16
# encoder/decoder number of CNN layers and filters per layer
layer_filters = [32, 64]
5. Building the encoder model
If you want to know more about the activation used here please have a look at this article - Activation Functions for Neural Networks
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs
# Encoder model: Conv2D(32) and Conv2D(64) stacked together,
# where 32 and 64 are the number of filters
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               activation='relu',
               strides=2,
               padding='same')(x)
Note: You may get some warnings like - "Instructions for Updating - Call initializer instance...." You can ignore them and move ahead with the code.
6. Initialize the correct shape for the decoder model
# This step is important because we want to pass a specific shape to our
# decoder. Saving the shape here means we don't have to do the layer-wise
# calculation manually.
shape = K.int_shape(x)
K.int_shape(x) returns (None, 7, 7, 64), where None stands for the batch dimension. The shape that will be passed to the first layer of the decoder (which is Conv2DTranspose) is therefore (7, 7, 64).
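To see where (7, 7, 64) comes from: with padding='same', a strided convolution produces an output of size ceil(input / stride) along each spatial axis, so the two stride-2 layers take 28 to 14 and then to 7, and the channel count equals the number of filters in the last layer. A small sketch of that arithmetic in plain Python (the helper `same_conv_output` is hypothetical, not a Keras function):

```python
import math

def same_conv_output(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

image_size = 28
layer_filters = [32, 64]

size = image_size
channels = 1
for filters in layer_filters:
    size = same_conv_output(size, stride=2)
    channels = filters
    print((size, size, channels))  # shape after this encoder layer

encoder_shape = (size, size, channels)
```

The decoder reverses this: each stride-2 Conv2DTranspose with padding='same' doubles the spatial size, taking 7 back to 14 and then to 28.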
7. Initialize the Latent space
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)
8. Initialize the encoder model
# latent is the output of the Dense layer defined in the previous step
encoder = Model(inputs, latent, name='encoder')
encoder.summary()
Here is what the summary of the encoder looks like -
10. Build the decoder model
# latent_dim is a parameter which we defined initially in step 4
latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
# use the shape (None, 7, 7, 64) that was saved earlier;
# index 0 is the batch dimension, so we take indices 1 to 3
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
# from vector to suitable shape for transposed conv
x = Reshape((shape[1], shape[2], shape[3]))(x)
# stack of Conv2DTranspose(64) and Conv2DTranspose(32)
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        activation='relu',
                        strides=2,
                        padding='same')(x)
11. Initializing the decoder output
outputs = Conv2DTranspose(filters=1,
                          kernel_size=kernel_size,
                          activation='sigmoid',
                          padding='same',
                          name='decoder_output')(x)
# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
Here is what the summary of the decoder looks like -
12. Initialize the autoencoder as a whole
# autoencoder = encoder + decoder
autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()
Here is what the summary of the autoencoder looks like -
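The parameter counts reported by the summaries can also be checked by hand. A convolution or transposed convolution layer with a k x k kernel has k * k * in_channels * filters weights plus one bias per filter, and a Dense layer has inputs * units weights plus one bias per unit. The sketch below applies those formulas to the layer sizes used in this article; the helper names `conv_params` and `dense_params` are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D or Conv2DTranspose layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units

kernel_size = 3
latent_dim = 16
flat = 7 * 7 * 64  # size of the flattened encoder feature map

# Encoder: Conv2D(32) -> Conv2D(64) -> Dense(latent_dim)
encoder_total = (conv_params(kernel_size, 1, 32)
                 + conv_params(kernel_size, 32, 64)
                 + dense_params(flat, latent_dim))

# Decoder: Dense(7*7*64) -> Conv2DTranspose(64) -> Conv2DTranspose(32)
#          -> Conv2DTranspose(1)
decoder_total = (dense_params(latent_dim, flat)
                 + conv_params(kernel_size, 64, 64)
                 + conv_params(kernel_size, 64, 32)
                 + conv_params(kernel_size, 32, 1))

print(encoder_total, decoder_total, encoder_total + decoder_total)
# prints 69008 108993 178001
```

If the totals printed by summary() differ from these, the layer sizes in the model are probably not what was intended.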
13. Train the autoencoder
If you would like to know more about optimizers and loss function in autoencoder please have a look at these articles, respectively -
# Mean Square Error (MSE) loss function, Adam optimizer
autoencoder.compile(loss='mse', optimizer='adam')
# train the autoencoder; the input is also the target
autoencoder.fit(x_train,
                x_train,
                validation_data=(x_test, x_test),
                epochs=1,
                batch_size=batch_size)
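For reference, the 'mse' loss is the mean of the squared differences between each input pixel and its reconstruction. Here is a NumPy sketch on made-up 2 x 2 "images"; the arrays are illustrative values, not model output.

```python
import numpy as np

# A tiny "input image" and an imperfect "reconstruction" of it.
original = np.array([[0.0, 1.0],
                     [1.0, 0.0]], dtype='float32')
reconstruction = np.array([[0.1, 0.9],
                           [0.8, 0.0]], dtype='float32')

# Squared pixel differences are 0.01, 0.01, 0.04 and 0.0; their mean is 0.015.
mse = np.mean((original - reconstruction) ** 2)
print(mse)
```

Because the pixels were scaled to [0, 1], the loss is bounded by 1, and small values such as the ones reported during training correspond to small average pixel differences.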
14. Get the predictions
You can use the predict() method of the Model class in tensorflow.keras.models.
x_decoded = autoencoder.predict(x_test)
Note: Pass the test set to predict() rather than the training samples. The autoencoder has already seen the training images, so reconstructing them well may only mean that it has memorized them and is copying them to the decoder output. Good reconstructions of unseen test images are better evidence that it has learned features that generalize.
15. Finally visualizing the results
imgs = np.concatenate([x_test[:8], x_decoded[:8]])
imgs = imgs.reshape((4, 4, image_size, image_size))
imgs = np.vstack([np.hstack(i) for i in imgs])
plt.figure()
plt.axis('off')
plt.title('Input: 1st 2 rows, Decoded: last 2 rows')
plt.imshow(imgs, interpolation='none', cmap='gray')
plt.show()
Here we are creating a single image that contains a total of 16 images stacked together in a 4 x 4 grid. The first two rows are inputs from the test set, and the last two rows are their corresponding reconstructions.
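The reshape and stacking can be hard to picture, so here is the same sequence on tiny 2 x 2 stand-in images, each filled with its own index (illustrative data, not MNIST). After the reshape, imgs[r][c] is image number 4 * r + c; np.hstack joins each group of four side by side, and np.vstack piles the four rows on top of each other.

```python
import numpy as np

image_size = 2
# 8 "inputs" filled with 0..7 and 8 "decoded" images filled with 8..15
inputs = np.stack([np.full((image_size, image_size, 1), i) for i in range(8)])
decoded = np.stack([np.full((image_size, image_size, 1), i + 8) for i in range(8)])

imgs = np.concatenate([inputs, decoded])             # shape (16, 2, 2, 1)
imgs = imgs.reshape((4, 4, image_size, image_size))  # 4 rows of 4 images
grid = np.vstack([np.hstack(row) for row in imgs])   # shape (8, 8)

# Sampling one pixel per tile shows which image landed where.
print(grid[::2, ::2])
```

Rows 0 and 1 of the sampled grid hold indices 0 to 7 (the inputs) and rows 2 and 3 hold 8 to 15 (the decoded images), which matches the plot title.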
From the image visualized above, you can see that the generated images are a little blurry. The model was trained for only one epoch and has just two convolution layers on each side, so a deeper encoder and decoder, or longer training, may help it extract further features. You can also observe that the digit '9' is the most blurred prediction in the given set, which suggests that the autoencoder reconstructs the class '9' less well than the others, at least in this small sample.
The training loss of the autoencoder is 0.0173 and the validation loss is 0.0097. We can try to make our autoencoder better by stacking more convolution and transposed convolution layers in the encoder and decoder respectively. Furthermore, tuning the network parameters, such as the latent dimension, kernel size, and number of filters, can also improve the performance.
That's it for this article. I hope you can now develop your own autoencoder, and I would suggest that you also try a different dataset, for example, Fashion MNIST.
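A visual impression from eight samples is weak evidence, so one way to check the observation about '9' is to compute the mean reconstruction error per digit class. The sketch below assumes the test labels were kept (the loading step in this article discards them with `_`); `per_class_mse` is a hypothetical helper, and the demo data is a stand-in rather than MNIST.

```python
import numpy as np

def per_class_mse(x, x_decoded, labels):
    """Return {label: mean squared reconstruction error} for each class."""
    # Per-sample error: average over every axis except the first (batch) axis.
    per_sample = np.mean((x - x_decoded) ** 2, axis=tuple(range(1, x.ndim)))
    return {int(c): float(per_sample[labels == c].mean())
            for c in np.unique(labels)}

# Stand-in data: 6 blank images, with class 1 reconstructed worse than class 0.
x = np.zeros((6, 4, 4, 1), dtype='float32')
labels = np.array([0, 0, 0, 1, 1, 1])
x_decoded = x.copy()
x_decoded[labels == 1] += 0.5  # every pixel off by 0.5, squared error 0.25

errors = per_class_mse(x, x_decoded, labels)
print(errors)
```

With the real data this would be called as per_class_mse(x_test, x_decoded, y_test), where y_test is the label array that mnist.load_data() returns alongside x_test.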
Thank you and Cheers :)