Autoencoders

An autoencoder is an unsupervised model - a deep neural network architecture - that consists of an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation, while the decoder aims to reconstruct the original input from that compressed representation.

The architecture is quite simple: the number of neurons per layer decreases through the encoder and then increases again through the decoder.

Input image => Dense(256) => Dense(64) => Dense(2) => Dense(64) => Dense(256) => Output (reconstructed image)

As one might expect, the loss is computed between the input image and its reconstruction, as the "auto" (self) in the name implies: the model is trained in a self-supervised way, using each input as its own target.
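
As a minimal sketch of this objective (using a generic, hypothetical `model` rather than the one built below), the loss simply pairs each input with itself:

    import tensorflow as tf
    
    # A sketch of the self-supervised objective: the target is the input itself.
    # `model` is assumed to be any Keras model mapping images to reconstructions
    # with pixel values in [0, 1].
    def reconstruction_loss(model, x):
        x_hat = model(x)  # reconstruct the input from its own encoding
        # pixel-wise binary cross-entropy, averaged over pixels and the batch
        return tf.reduce_mean(tf.keras.losses.binary_crossentropy(x, x_hat))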

In this post, we go through an implementation of an autoencoder with TensorFlow and Keras. The example below is from the Probabilistic Deep Learning with TensorFlow 2 course on Coursera, which, by the way, I highly recommend if you want to get familiar with the TensorFlow Probability module. However, for a plain autoencoder we don't actually need TensorFlow Probability (the module becomes useful when implementing the Variational Autoencoder, a generative variant of the autoencoder).

Contents

  • Import required packages
  • Fashion MNIST dataset
  • Encoder
  • Decoder
  • Encoding results after training
  • Autoencoder reconstructed results

Import required packages

    
    import tensorflow as tf
    import matplotlib
    import seaborn as sns
    import numpy as np
    import matplotlib.pyplot as plt
    
    from tensorflow.keras.models import Sequential, Model
    from tensorflow.keras.layers import Dense, Flatten, Reshape
    
    print(tf.__version__)
    print(matplotlib.__version__)
    print(np.__version__)
    print(sns.__version__)
    
    
    2.1.0
    3.0.3
    1.18.3
    0.9.0

    Fashion MNIST dataset

    The Fashion MNIST dataset comes from Zalando - a publicly traded German online retailer of shoes, fashion, and beauty products active across Europe. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. We won't use the labels here; we only need the images, since we want to use the autoencoder to compress and reconstruct them. Let's get started.
    
    # Load Fashion MNIST
    
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train = x_train.astype('float32')/255.
    x_test = x_test.astype('float32')/255.
    class_names = np.array([
        'T-shirt/top', 
        'Trouser/pants', 
        'Pullover shirt', 
        'Dress',
        'Coat', 
        'Sandal', 
        'Shirt', 
        'Sneaker', 
        'Bag',
        'Ankle boot'
    ])
    
    print(x_train.shape)
    
    
    (60000, 28, 28)

    We can have a look at some of those images.

    
    # Display a few examples
    
    n_examples = 1000
    example_images = x_test[0:n_examples]
    example_labels = y_test[0:n_examples]
    
    f, axs = plt.subplots(1, 5, figsize=(15, 4))
    for j in range(len(axs)):
        axs[j].imshow(example_images[j], cmap='binary')
        axs[j].axis('off')
    
    


    Encoder

    Now we move on to the encoder part of the autoencoder. The encoder simply flattens the input image and passes it through two Dense layers, followed by a final Dense layer with the desired encoding dimensionality, which is 2 here.

    We can inspect the compressed (encoded) images produced by this encoder. Note that since the encoder has not been trained yet, the encodings of images from different classes should not be distinguishable in the encoding space.

    
    # Define the encoder
    
    encoded_dim = 2
    encoder = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(256, activation='sigmoid'),
        Dense(64, activation='sigmoid'),
        Dense(encoded_dim)
    ])
    
    # Encode examples before training
    
    pretrain_example_encodings = encoder(example_images).numpy()
    
    # Plot encoded examples before training 
    
    f, ax = plt.subplots(1, 1, figsize=(7, 7))
    sns.scatterplot(pretrain_example_encodings[:, 0],
                    pretrain_example_encodings[:, 1],
                    hue=class_names[example_labels], ax=ax,
                    palette=sns.color_palette("colorblind", 10));
    ax.set_xlabel('Encoding dimension 1'); ax.set_ylabel('Encoding dimension 2')
    ax.set_title('Encodings of example images before training');
    



    Decoder

    Given the 2-dimensional encodings, the decoder part tries to reconstruct the input image. We can then combine the encoder and decoder that we've just defined into the autoencoder. Afterwards, we compile and fit the model as usual.
    
    # Define the decoder
    
    decoder = Sequential([
        Dense(64, activation='sigmoid', input_shape=(encoded_dim,)),
        Dense(256, activation='sigmoid'),
        Dense(28*28, activation='sigmoid'),
        Reshape((28, 28))
    ])
    
    # Compile and fit the model
    
    autoencoder = Model(
        inputs=encoder.input,
        outputs=decoder(encoder.output)
    )
    
    # Specify loss - input and output are in [0., 1.], so we can use a binary cross-entropy loss
    autoencoder.compile(loss='binary_crossentropy')
    
    # Fit model - highlight that labels and input are the same
    autoencoder.fit(
        x=x_train, 
        y=x_train,
        epochs=10,
        batch_size=32
    )
    
    Train on 60000 samples
    Epoch 1/10 60000/60000 [==============================] - 76s 1ms/sample - loss: 0.4078
    Epoch 2/10 60000/60000 [==============================] - 74s 1ms/sample - loss: 0.3510
    Epoch 3/10 60000/60000 [==============================] - 75s 1ms/sample - loss: 0.3395
    Epoch 4/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3342
    Epoch 5/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3308
    Epoch 6/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3284
    Epoch 7/10 60000/60000 [==============================] - 77s 1ms/sample - loss: 0.3264
    Epoch 8/10 60000/60000 [==============================] - 74s 1ms/sample - loss: 0.3248
    Epoch 9/10 60000/60000 [==============================] - 70s 1ms/sample - loss: 0.3234
    Epoch 10/10 60000/60000 [==============================] - 84s 1ms/sample - loss: 0.3226
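
    As a quick sanity check (an aside, not part of the original course example), we can also evaluate the reconstruction loss on the held-out test set:

    # Aside: evaluate the reconstruction loss on the held-out test images
    test_loss = autoencoder.evaluate(x=x_test, y=x_test, batch_size=32, verbose=0)
    print('Test reconstruction loss:', test_loss)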

    Encoding results after training

    Now the autoencoder has been trained. We can check the encoded/compressed images again to see whether their representations now exhibit some interesting patterns, ideally clustering according to their categories.
    
    # Compute example encodings after training
    
    posttrain_example_encodings = encoder(example_images).numpy()
    
    # Compare the example encodings before and after training
    
    f, axs = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))
    sns.scatterplot(pretrain_example_encodings[:, 0],
                    pretrain_example_encodings[:, 1],
                    hue=class_names[example_labels], ax=axs[0],
                    palette=sns.color_palette("colorblind", 10));
    sns.scatterplot(posttrain_example_encodings[:, 0],
                    posttrain_example_encodings[:, 1],
                    hue=class_names[example_labels], ax=axs[1],
                    palette=sns.color_palette("colorblind", 10));
    
    axs[0].set_title('Encodings of example images before training');
    axs[1].set_title('Encodings of example images after training');
    
    for ax in axs: 
        ax.set_xlabel('Encoding dimension 1')
        ax.set_ylabel('Encoding dimension 2')
        ax.legend(loc='upper right')
    



    As we can see from the figure, after training, images belonging to the same or similar categories, such as "Ankle boot" and "Sneaker", tend to be clustered together.
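
    Since the encoding space is only 2-dimensional, we can also decode arbitrary points from it to see what the decoder has learned. This is a small aside, not in the original example, and the grid range below is an arbitrary assumption that would normally be chosen based on the scatter plot above:

    # Aside: decode a grid of hand-picked points in the 2-D encoding space.
    # The [-2, 2] range is an assumption; pick it from the scatter plot above.
    grid_values = np.linspace(-2., 2., 5)
    grid = np.array([[x1, x2] for x1 in grid_values for x2 in grid_values],
                    dtype='float32')
    decoded_grid = decoder(grid).numpy()
    
    f, axs = plt.subplots(5, 5, figsize=(8, 8))
    for i, ax in enumerate(axs.flat):
        ax.imshow(decoded_grid[i], cmap='binary')
        ax.axis('off')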

    Autoencoder reconstructed results

    Finally, we can reconstruct some images using the trained autoencoder. As shown below, the reconstructions are reasonably close to the given images.
    
    # Compute the autoencoder's reconstructions
    
    reconstructed_example_images = autoencoder(example_images)
    
    # Evaluate the autoencoder's reconstructions
    
    f, axs = plt.subplots(2, 5, figsize=(15, 4))
    for j in range(5):
        axs[0, j].imshow(example_images[j], cmap='binary')
        axs[1, j].imshow(reconstructed_example_images[j].numpy().squeeze(), cmap='binary')
        axs[0, j].axis('off')
        axs[1, j].axis('off')
    



    In this post, we introduced the autoencoder, which trains its encoder and decoder parts in a "self-supervised" way by minimizing the reconstruction loss. Although the autoencoder can be useful for compression and reconstruction, it is not designed or trained to generate images. The VAE (Variational Autoencoder) is the probabilistic twist on the autoencoder for that purpose, which we will look into in another post.
