The architecture is pretty simple: the number of neurons in the layers decreases through the encoder part (blue below), and then starts increasing again in the decoder part (purple below).
Input image => Dense(256) => Dense(64) => Dense(2) => Dense(64) => Dense(256) => Output (reconstructed image)
As one might expect, the loss is computed between the input image and the reconstructed one, as the "auto" part of the name (self-supervised) implies.
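Schematically, the training objective looks like the snippet below (a minimal sketch; the actual encoder and decoder models are defined later in this post):
# Reconstruction loss: compare the input batch x to its reconstruction
reconstruction = decoder(encoder(x))
loss = tf.keras.losses.binary_crossentropy(x, reconstruction)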
In this post, we go through the implementation of an Autoencoder with TensorFlow and Keras. The example below is from the Probabilistic Deep Learning with TensorFlow 2 course on Coursera, which, by the way, I highly recommend if you want to get familiar with the TensorFlow Probability module. However, for an Autoencoder we don't actually need TensorFlow Probability (the module becomes useful when implementing the Variational Autoencoder, a generative variant of the Autoencoder).
Contents
Import required packages
Fashion MNIST dataset
Encoder
Decoder
Encoding results after training
Autoencoder reconstructed results
Import required packages
import tensorflow as tf
import matplotlib
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Reshape
print(tf.__version__)
print(matplotlib.__version__)
print(np.__version__)
print(sns.__version__)
2.1.0
3.0.3
1.18.3
0.9.0
Fashion MNIST dataset
The Fashion MNIST dataset comes from Zalando, a publicly traded German online retailer of shoes, fashion and beauty products active across Europe. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes. We don't use the labels here; we only use the images, since we want the Autoencoder to compress and reconstruct a given image. Let's get started.
# Load Fashion MNIST
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.astype('float32')/255.
x_test = x_test.astype('float32')/255.
class_names = np.array([
    'T-shirt/top',
    'Trouser/pants',
    'Pullover shirt',
    'Dress',
    'Coat',
    'Sandal',
    'Shirt',
    'Sneaker',
    'Bag',
    'Ankle boot'
])
print(x_train.shape)
(60000, 28, 28)
We can have a look at some of these images.
# Display a few examples
n_examples = 1000
example_images = x_test[0:n_examples]
example_labels = y_test[0:n_examples]
f, axs = plt.subplots(1, 5, figsize=(15, 4))
for j in range(len(axs)):
    axs[j].imshow(example_images[j], cmap='binary')
    axs[j].axis('off')
Encoder
Now we move on to the implementation of the encoder part of the Autoencoder. The encoder simply flattens the input image and passes it through two Dense layers, followed by a final Dense layer with the desired encoding dimensionality, which is 2 here.
We can inspect the compressed, or encoded, images produced by this encoder. Note that since the encoder has not been trained yet, the encoded images from different classes should not be distinguishable in the encoding space.
# Define the encoder
encoded_dim = 2
encoder = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(256, activation='sigmoid'),
    Dense(64, activation='sigmoid'),
    Dense(encoded_dim)
])
# Encode examples before training
pretrain_example_encodings = encoder(example_images).numpy()
# Plot encoded examples before training
f, ax = plt.subplots(1, 1, figsize=(7, 7))
sns.scatterplot(pretrain_example_encodings[:, 0],
                pretrain_example_encodings[:, 1],
                hue=class_names[example_labels], ax=ax,
                palette=sns.color_palette("colorblind", 10));
ax.set_xlabel('Encoding dimension 1'); ax.set_ylabel('Encoding dimension 2')
ax.set_title('Encodings of example images before training');
Decoder
Given the 2-dimensional encoded images, the decoder part tries to reconstruct the input image. We can then combine the encoder and decoder that we've just defined into the Autoencoder. Afterwards, we compile and fit the model as usual.
# Define the decoder
decoder = Sequential([
    Dense(64, activation='sigmoid', input_shape=(encoded_dim,)),
    Dense(256, activation='sigmoid'),
    Dense(28*28, activation='sigmoid'),
    Reshape((28, 28))
])
# Compile and fit the model
autoencoder = Model(
    inputs=encoder.input,
    outputs=decoder(encoder.output)
)
# Specify loss - inputs and outputs are in [0., 1.], so we can use a binary cross-entropy loss
autoencoder.compile(loss='binary_crossentropy')
# Fit model - highlight that labels and input are the same
autoencoder.fit(
    x=x_train,
    y=x_train,
    epochs=10,
    batch_size=32
)
Train on 60000 samples
Epoch 1/10 60000/60000 [==============================] - 76s 1ms/sample - loss: 0.4078
Epoch 2/10 60000/60000 [==============================] - 74s 1ms/sample - loss: 0.3510
Epoch 3/10 60000/60000 [==============================] - 75s 1ms/sample - loss: 0.3395
Epoch 4/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3342
Epoch 5/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3308
Epoch 6/10 60000/60000 [==============================] - 78s 1ms/sample - loss: 0.3284
Epoch 7/10 60000/60000 [==============================] - 77s 1ms/sample - loss: 0.3264
Epoch 8/10 60000/60000 [==============================] - 74s 1ms/sample - loss: 0.3248
Epoch 9/10 60000/60000 [==============================] - 70s 1ms/sample - loss: 0.3234
Epoch 10/10 60000/60000 [==============================] - 84s 1ms/sample - loss: 0.3226
Encoding results after training
Now the Autoencoder has been trained. We can check the encoded/compressed images again to see whether these compressed representations exhibit some interesting patterns, ideally clustering according to their categories.
# Compute example encodings after training
posttrain_example_encodings = encoder(example_images).numpy()
# Compare the example encodings before and after training
f, axs = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))
sns.scatterplot(pretrain_example_encodings[:, 0],
                pretrain_example_encodings[:, 1],
                hue=class_names[example_labels], ax=axs[0],
                palette=sns.color_palette("colorblind", 10));
sns.scatterplot(posttrain_example_encodings[:, 0],
                posttrain_example_encodings[:, 1],
                hue=class_names[example_labels], ax=axs[1],
                palette=sns.color_palette("colorblind", 10));
axs[0].set_title('Encodings of example images before training');
axs[1].set_title('Encodings of example images after training');
for ax in axs:
    ax.set_xlabel('Encoding dimension 1')
    ax.set_ylabel('Encoding dimension 2')
    ax.legend(loc='upper right')
As we can see from the figure, after training, images belonging to the same or similar categories, such as "Ankle boot" and "Sneaker", tend to be clustered together.
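As a small extra check (not part of the original course code), we can print the per-class centroid of the post-training encodings; classes whose centroids sit close together are the ones that overlap in the scatter plot:
# Per-class centroids in the 2-D encoding space (illustrative extra)
for c in range(10):
    mask = example_labels == c
    print(class_names[c], posttrain_example_encodings[mask].mean(axis=0))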
Autoencoder reconstructed results
Here we reconstruct some images using the trained Autoencoder; the reconstructed images are reasonably close to the given ones.
# Compute the autoencoder's reconstructions
reconstructed_example_images = autoencoder(example_images)
# Evaluate the autoencoder's reconstructions
f, axs = plt.subplots(2, 5, figsize=(15, 4))
for j in range(5):
    axs[0, j].imshow(example_images[j], cmap='binary')
    axs[1, j].imshow(reconstructed_example_images[j].numpy().squeeze(), cmap='binary')
    axs[0, j].axis('off')
    axs[1, j].axis('off')
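For a quick quantitative check (an optional extra beyond the original walkthrough), we can also evaluate the average reconstruction loss on the held-out test set, using the images as both inputs and targets:
# Average binary cross-entropy reconstruction loss on the test set
test_loss = autoencoder.evaluate(x_test, x_test, verbose=0)
print('Test reconstruction loss:', test_loss)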
In this post, we introduced the Autoencoder, which trains its encoder and decoder parts in a "self-supervised" way by minimizing the reconstruction loss. Although the Autoencoder can be useful for compression and reconstruction, it is not designed or trained to generate images. The VAE (Variational Autoencoder) is a probabilistic twist on the Autoencoder for that purpose, which we will look into in another post.