Autoencoders

An Autoencoder is an unsupervised model - a deep neural network architecture - that contains an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation, while the decoder reconstructs that compressed representation back into the original input.

The architecture is pretty simple: the number of neurons per layer decreases through the encoder part and then increases again through the decoder part.

Input image => Dense(256) => Dense(64) => Dense(2) => Dense(64) => Dense(256) => Output (reconstructed image)

As one might expect, the loss is computed between the input image/data and the reconstructed one, which is what the "auto" (self-supervised) part of the name implies.
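
To make this objective concrete, below is a minimal, self-contained sketch of the idea (the layer sizes are illustrative placeholders, not the ones used later in this post):

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten, Reshape
    
    # Toy encoder/decoder pair, just to illustrate the training objective
    toy_encoder = Sequential([Flatten(input_shape=(28, 28)), Dense(2)])
    toy_decoder = Sequential([
        Dense(28 * 28, activation='sigmoid', input_shape=(2,)),
        Reshape((28, 28))
    ])
    
    x = tf.random.uniform((1, 28, 28))   # a dummy "image" with values in [0, 1]
    x_hat = toy_decoder(toy_encoder(x))  # compress, then reconstruct
    
    # The training signal: how far the reconstruction is from the input itself
    loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(x, x_hat))
    print(float(loss))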

In this post, we go through an implementation of the Autoencoder with TensorFlow and Keras. The example below is from the Probabilistic Deep Learning with TensorFlow 2 course on Coursera, which, by the way, I highly recommend if you want to get familiar with the TensorFlow Probability module. However, for the Autoencoder we don't actually need TensorFlow Probability (the module becomes useful when implementing the Variational Autoencoder, a generative variant of the Autoencoder).

Contents

  • Import required packages
  • Fashion MNIST dataset
  • Encoder
  • Decoder
  • Encoding results after training
  • Autoencoder reconstructed results
    Import required packages

    
    import tensorflow as tf
    import matplotlib
    import seaborn as sns
    import numpy as np
    import matplotlib.pyplot as plt
    
    from tensorflow.keras.models import Sequential, Model
    from tensorflow.keras.layers import Dense, Flatten, Reshape
    
    print(tf.__version__)
    print(matplotlib.__version__)
    print(np.__version__)
    print(sns.__version__)
    
    
    2.1.0
    3.0.3
    1.18.3
    0.9.0

    Fashion MNIST dataset

    The Fashion MNIST dataset comes from Zalando - a publicly traded German online retailer of shoes, fashion, and beauty products active across Europe. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from one of 10 classes. We won't use the labels; we only use the images, since we want the Autoencoder to compress and reconstruct them. Let's get started.
    
    # Load Fashion MNIST
    
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train = x_train.astype('float32')/255.
    x_test = x_test.astype('float32')/255.
    class_names = np.array([
        'T-shirt/top', 
        'Trouser/pants', 
        'Pullover shirt', 
        'Dress',
        'Coat', 
        'Sandal', 
        'Shirt', 
        'Sneaker', 
        'Bag',
        'Ankle boot'
    ])
    
    print(x_train.shape)
    
    
    (60000, 28, 28)

    We can have a look at some of those images.

    
    # Display a few examples
    
    n_examples = 1000
    example_images = x_test[0:n_examples]
    example_labels = y_test[0:n_examples]
    
    f, axs = plt.subplots(1, 5, figsize=(15, 4))
    for j in range(len(axs)):
        axs[j].imshow(example_images[j], cmap='binary')
        axs[j].axis('off')
    
    


    Encoder

    Now we move on to the implementation of the encoder part of the Autoencoder. The encoder simply flattens the input image and passes it through two Dense layers, followed by another Dense layer with the desired encoding dimensionality, which is 2 here.

    We can check the compressed, or encoded, images using this encoder. Note that since the encoder has not been trained yet, we should see that encoded images from different classes are not distinguishable in the encoding space.

    
    # Define the encoder
    
    encoded_dim = 2
    encoder = Sequential([
        Flatten(input_shape=(28, 28)),
        Dense(256, activation='sigmoid'),
        Dense(64, activation='sigmoid'),
        Dense(encoded_dim)
    ])
    
    # Encode examples before training
    
    pretrain_example_encodings = encoder(example_images).numpy()
    
    # Plot encoded examples before training 
    
    f, ax = plt.subplots(1, 1, figsize=(7, 7))
    sns.scatterplot(pretrain_example_encodings[:, 0],
                    pretrain_example_encodings[:, 1],
                    hue=class_names[example_labels], ax=ax,
                    palette=sns.color_palette("colorblind", 10));
    ax.set_xlabel('Encoding dimension 1'); ax.set_ylabel('Encoding dimension 2')
    ax.set_title('Encodings of example images before training');
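    
    Optionally (this check is not in the original post), Keras's summary() prints each layer's output shape and parameter count, which confirms that the bottleneck really is 2-dimensional:
    
    # Inspect the encoder's layers and parameter counts
    encoder.summary()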
    



    Decoder

    Given the 2-dimensional encoded images, the decoder part tries to reconstruct the input image. We can then use the encoder and decoder that we've just defined to build the Autoencoder itself. Afterwards, we compile and fit the model as we usually do to train the Autoencoder.
    
    # Define the decoder
    
    decoder = Sequential([
        Dense(64, activation='sigmoid', input_shape=(encoded_dim,)),
        Dense(256, activation='sigmoid'),
        Dense(28*28, activation='sigmoid'),
        Reshape((28, 28))
    ])
    
    # Compile and fit the model
    
    autoencoder = Model(
        inputs=encoder.input,
        outputs=decoder(encoder.output)
    )
    
    # Specify loss - input and output is in [0., 1.], so we can use a binary cross-entropy loss
    autoencoder.compile(loss='binary_crossentropy')
    
    # Fit model - highlight that labels and input are the same
    autoencoder.fit(
        x=x_train, 
        y=x_train,
        epochs=10,
        batch_size=32
    )
    
    Train on 60000 samples
    Epoch 1/10 60000/60000 [==============================] - 76s 1ms/sample - loss: 0.4078
    Epoch 2/10 60000/60000 [==============================] - 74s 1ms/sample - loss: 0.3510
    Epoch 3/10 60000/60000 [==============================] - 75s 1ms/sample - loss: 0.3395
    Epoch 4/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3342
    Epoch 5/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3308
    Epoch 6/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3284
    Epoch 7/10 60000/60000 [==============================] - 77s 1ms/sample - loss: 0.3264
    Epoch 8/10 60000/60000 [==============================] - 74s 1ms/sample - loss: 0.3248
    Epoch 9/10 60000/60000 [==============================] - 70s 1ms/sample - loss: 0.3234
    Epoch 10/10 60000/60000 [==============================] - 84s 1ms/sample - loss: 0.3226
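    
    As a quick sanity check (this step is not part of the original run), we can also measure the reconstruction loss on the held-out test set. Since the input serves as its own target, we pass x_test as both the input and the label:
    
    # Reconstruction loss (binary cross-entropy) on unseen images
    test_loss = autoencoder.evaluate(x=x_test, y=x_test, batch_size=32)
    print(test_loss)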

    Encoding results after training

    Now that the Autoencoder has been trained, we can again check the encoded/compressed images to see whether those representations exhibit interesting patterns, ideally clustering according to their categories.
    
    # Compute example encodings after training
    
    posttrain_example_encodings = encoder(example_images).numpy()
    
    # Compare the example encodings before and after training
    
    f, axs = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))
    sns.scatterplot(pretrain_example_encodings[:, 0],
                    pretrain_example_encodings[:, 1],
                    hue=class_names[example_labels], ax=axs[0],
                    palette=sns.color_palette("colorblind", 10));
    sns.scatterplot(posttrain_example_encodings[:, 0],
                    posttrain_example_encodings[:, 1],
                    hue=class_names[example_labels], ax=axs[1],
                    palette=sns.color_palette("colorblind", 10));
    
    axs[0].set_title('Encodings of example images before training');
    axs[1].set_title('Encodings of example images after training');
    
    for ax in axs: 
        ax.set_xlabel('Encoding dimension 1')
        ax.set_ylabel('Encoding dimension 2')
        ax.legend(loc='upper right')
    



    As we can see from the figure, after training, images belonging to the same or similar categories, such as "Ankle boot" and "Sneaker", tend to be clustered together.

    Autoencoder reconstructed results

    Here we reconstruct some images using the trained Autoencoder; the reconstructed images are reasonably close to the given ones.
    
    # Compute the autoencoder's reconstructions
    
    reconstructed_example_images = autoencoder(example_images)
    
    # Evaluate the autoencoder's reconstructions
    
    f, axs = plt.subplots(2, 5, figsize=(15, 4))
    for j in range(5):
        axs[0, j].imshow(example_images[j], cmap='binary')
        axs[1, j].imshow(reconstructed_example_images[j].numpy().squeeze(), cmap='binary')
        axs[0, j].axis('off')
        axs[1, j].axis('off')
    



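    Since the decoder maps any point in the 2-dimensional encoding space to a 28x28 image, we can also decode points that do not come from a real image. The coordinates below are arbitrary picks (and the outputs are typically blurry), which hints at why a plain Autoencoder is not a proper generative model:
    
    # Decode a few hand-picked points in the 2-D encoding space
    latent_points = np.array([[0., 0.], [5., 5.], [-5., 5.]], dtype='float32')
    decoded_images = decoder(latent_points).numpy()
    
    f, axs = plt.subplots(1, 3, figsize=(9, 3))
    for j in range(3):
        axs[j].imshow(decoded_images[j], cmap='binary')
        axs[j].axis('off')
    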
    In this post, we introduced the Autoencoder, which trains its encoder and decoder parts in a "self-supervised" way by minimizing the reconstruction loss. Although the Autoencoder can be useful for compression and reconstruction, it is not designed or trained to generate images. The VAE (Variational Autoencoder) is a probabilistic twist on the Autoencoder for that purpose, which we will look into in another post.