Auto Encoder

An introduction to the auto encoder.

Auto encoders are used for feature extraction and for pretraining deep neural networks.

In this tutorial, the following two auto encoders are introduced. We also pretrain a neural network with each method and compare them using the classification results.

  • Vanilla auto encoder
  • Denoising auto encoder

An auto encoder is an unsupervised feature learning algorithm. Its input layer and output layer have the same number of units, with one or more hidden layers in between. By using the same data as both input and target, it learns to reconstruct the input. In general, the hidden layers have fewer units than the input layer, so the auto encoder is forced to compress the information in the hidden layers.

Once training is done, the weights of the hidden layers can be used as pretrained weights in other neural networks.

As another use case, the compressed representation can be fed into other machine learning algorithms.
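
To make the structure concrete, here is a minimal numpy-only sketch of the encode/decode computation described above. Everything in it (the untrained random weights, the encode/decode helpers, the dummy batch) is an illustrative assumption; the actual model used in this tutorial is built with ReNom below.

In [ ]:
import numpy as np

rng = np.random.RandomState(0)

# Illustrative, untrained weights: 784 -> 100 (encoder) and 100 -> 784 (decoder).
w_enc = rng.normal(scale=0.01, size=(784, 100))
b_enc = np.zeros(100)
w_dec = rng.normal(scale=0.01, size=(100, 784))
b_dec = np.zeros(784)

def encode(x):
    # Hidden representation: the 100-dimensional compressed features.
    return np.maximum(0, x.dot(w_enc) + b_enc)            # ReLU

def decode(h):
    # Map the hidden representation back to the 784 input dimensions.
    return 1.0 / (1.0 + np.exp(-(h.dot(w_dec) + b_dec)))  # sigmoid

x = rng.rand(8, 784)                 # a dummy batch of 8 flattened "images"
features = encode(x)                 # the compressed representation
reconstruction = decode(features)
print(features.shape, reconstruction.shape)  # (8, 100) (8, 784)

After real training, an array like features is what could be handed to another machine learning algorithm.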

Requirements

This tutorial requires the following modules.

In [1]:
import numpy as np
np.seterr(all="ignore")

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report

from renom import *
from renom.utility.initializer import Gaussian

Load Data

In this tutorial, we use the MNIST dataset. You can download it from the MNIST website or use the download method provided by scikit-learn.

Then we divide it into a training set and a test set.

In [2]:
# Data path must point to the directory containing the data folder.
data_path = "../dataset"
mnist = fetch_mldata('MNIST original', data_home=data_path)

X = mnist.data
y = mnist.target

# Binarize the image data (threshold at pixel value 128).
X = X.astype(np.float32)
y = y.astype(np.float32)
X = np.array(X > 128, dtype=np.float32)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

binarizer = LabelBinarizer()
label_train = binarizer.fit_transform(y_train)
label_test = binarizer.transform(y_test)

# Training data size.
N = len(x_train)
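
Note that fetch_mldata has been removed from recent scikit-learn releases. If the cell above fails, the same dataset can be loaded with fetch_openml (available since scikit-learn 0.20); the sketch below is a possible drop-in replacement for the loading step, and the exact keyword arguments may vary with the scikit-learn version. The rest of the preprocessing (thresholding, train/test split, label binarization) stays unchanged.

In [ ]:
from sklearn.datasets import fetch_openml

# "mnist_784" is the OpenML name of the original 70000 x 784 MNIST dataset.
mnist = fetch_openml("mnist_784", version=1, as_frame=False)

X = mnist.data.astype(np.float32)
y = mnist.target.astype(np.float32)  # OpenML returns the labels as strings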

Vanilla Auto Encoder

In this section, we build a vanilla auto encoder using a sequential model. As mentioned above, an auto encoder's input layer and output layer have the same number of units.

In this case, because each MNIST sample is a 28x28 image, the auto encoder's input and output layers both have 784 (= 28x28) units.

Model Definition

The model consists of two dense layers with a ReLU activation function between them.

We use Adam[1] as the gradient descent algorithm.

In [3]:
model_ae = Sequential([
        Dense(100),
        Relu(),
        Dense(784),
    ])
optimizer = Adam()

Train Loop

The train loop is below. We measure the cross entropy between the raw digit data and the reconstructed image.

After the train loop, we show the raw digit and the reconstructed image. We can confirm that the reconstructed image reproduces the shape of the raw digit.

In [4]:
batch = 64
epoch = 10

for i in range(epoch):
    for j in range(N//batch):
        train_batch=x_train[j*batch:(j+1)*batch]
        with model_ae.train():
            z = model_ae(train_batch)
            loss = sigmoid_cross_entropy(z, train_batch)
        loss.grad().update(optimizer)
    if i%2 == 0:print("epoch %02d train_loss:%f"%(i, loss))

# Show raw img and reconstructed img.
test_img = x_test[0]
fig, ax = plt.subplots(ncols=2, nrows=1, figsize=(8, 6))
ax[0].set_title("Raw image")
ax[0].imshow(test_img.reshape(28, 28), cmap="gray")
ax[1].set_title("Reconstructed image")
ax[1].imshow(sigmoid(model_ae(test_img)).reshape(28, 28), cmap="gray")
plt.show()
epoch 00 train_loss:53.192020
epoch 02 train_loss:21.783264
epoch 04 train_loss:15.329295
epoch 06 train_loss:12.729212
epoch 08 train_loss:11.432202
[Figure: raw test digit (left) and its reconstruction by the auto encoder (right)]

Denoising Auto Encoder

A denoising auto encoder[2] is almost the same as a vanilla auto encoder, except that it is given noisy input data. Gaussian noise or salt-and-pepper noise is generally used.

In this case, we add salt-and-pepper style noise to the input data. Specifically, each pixel value is set to 0 with a probability of 50%.

In [5]:
test_img = x_test[0].reshape(28, 28)
sp_noise = np.array(np.random.rand(*test_img.shape) > 0.5, dtype=bool)
fig, ax = plt.subplots(ncols=2, nrows=1, figsize=(8, 6))
ax[0].set_title("Raw image")
ax[0].imshow(test_img, cmap="gray")
ax[1].set_title("Added salt and pepper noise")
ax[1].imshow(test_img*sp_noise, cmap="gray")
plt.show()
[Figure: raw test digit (left) and the same digit with salt-and-pepper noise applied (right)]

Model Definition

The denoising auto encoder model is the same as the vanilla auto encoder.

In [ ]:
model_denoise_ae = Sequential([
        Dense(100),
        Relu(),
        Dense(784),
    ])
optimizer = Adam()

Train Loop

The train loop is the same as for the vanilla auto encoder, except that noise is added to the input data.

In [ ]:
batch = 64
epoch = 10

for i in range(epoch):
    for j in range(N//batch):
        train_batch=x_train[j*batch:(j+1)*batch]
        with model_denoise_ae.train():
            sp_noise = np.array(np.random.rand(*train_batch.shape) > 0.5, dtype=bool)
            z = model_denoise_ae(train_batch*sp_noise)
            loss = sigmoid_cross_entropy(z, train_batch)
        loss.grad().update(optimizer)
    if i%2 == 0:print("epoch %02d train_loss:%f"%(i, loss))

# Show raw img and reconstructed img.
test_img = x_test[0]
fig, ax = plt.subplots(ncols=2, nrows=1, figsize=(8, 6))
ax[0].set_title("Raw image")
ax[0].imshow(test_img.reshape(28, 28), cmap="gray")
ax[1].set_title("Reconstructed image")
ax[1].imshow(sigmoid(model_denoise_ae(test_img)).reshape(28, 28), cmap="gray")
plt.show()
epoch 00 train_loss:97.368622
epoch 02 train_loss:78.387123

Comparison of the 2 Pretrained Models

In this section, we compare the two pretrained models. One model is pretrained with the vanilla auto encoder; the other is pretrained with the denoising auto encoder.

We use noisy input data, the same as in the section above. We can confirm that the pretrained weights compensate for the salt-and-pepper noise, which leads to better generalization performance.

We build the classification models as in the following code. The two models have the same hyperparameters, but each is initialized with its own pretrained weight parameters.

In [ ]:
pretrained_ae = Sequential([
        Dense(100),
        Relu(),
        Dense(10),
    ])

pretrained_dae = Sequential([
        Dense(100),
        Relu(),
        Dense(10),
    ])

# Copy the pretrained weight parameters of the first layer.
pretrained_ae[0].params = model_ae[0].params
pretrained_dae[0].params = model_denoise_ae[0].params

opt1 = Adam()
opt2 = Adam()

Train Loop

Then we train the classification models. The following code is almost the same as in tutorial 1.

In [ ]:
batch = 64
epoch = 40

train_loss1 = []
train_loss2 = []
validation_loss1 = []
validation_loss2 = []

for i in range(epoch):
    for j in range(N//batch):
        train_batch = x_train[j*batch:(j+1)*batch]
        response_batch = label_train[j*batch:(j+1)*batch].astype(np.float32)
        sp_noise = np.array(np.random.rand(*train_batch.shape) > 0.5)
        train_batch = train_batch*sp_noise

        with pretrained_ae.train():
            z = pretrained_ae(train_batch)
            loss1 = softmax_cross_entropy(z, response_batch)

        with pretrained_dae.train():
            z = pretrained_dae(train_batch)
            loss2 = softmax_cross_entropy(z, response_batch)

        loss1.grad().update(opt1)
        loss2.grad().update(opt2)

    validation1 = softmax_cross_entropy(pretrained_ae(x_test), label_test)
    validation2 = softmax_cross_entropy(pretrained_dae(x_test), label_test)

    train_loss1.append(loss1)
    train_loss2.append(loss2)
    validation_loss1.append(validation1)
    validation_loss2.append(validation2)

    strs = "epoch:%02d AE_loss:%f AE_validation:%f DAE_loss:%f DAE_validation:%f"
    if i % 2 == 0:
        print(strs % (i, loss1, validation1, loss2, validation2))

Learning Curve

Comparing the two models using their learning curves, the model pretrained with the denoising auto encoder yields a lower validation error.

In [ ]:
plt.figure(figsize=(8, 5))
plt.grid()
plt.plot(train_loss1, label="AE_train_loss", linestyle="--", linewidth=3)
plt.plot(validation_loss1, label="AE_validation_loss", linewidth=3)
plt.plot(train_loss2, label="DAE_train_loss", linestyle="--", linewidth=3)
plt.plot(validation_loss2, label="DAE_validation_loss", linewidth=3)
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend()
plt.show()
In [ ]:
prediction1 = np.argmax(pretrained_ae(x_test).as_ndarray(), axis = 1)
prediction2 = np.argmax(pretrained_dae(x_test).as_ndarray(), axis = 1)

print("///////////// AE pretrained model //////////////")
print(classification_report(np.argmax(label_test, axis = 1), prediction1))

print("///////////// DAE pretrained model //////////////")
print(classification_report(np.argmax(label_test, axis = 1), prediction2))

References

[1] Diederik P. Kingma, Jimmy Ba. Adam: A Method for Stochastic Optimization. 3rd International Conference for Learning Representations, San Diego, 2015.
[2] Pascal Vincent, Hugo Larochelle, Yoshua Bengio and Pierre-Antoine Manzagol. Extracting and Composing Robust Features with Denoising Autoencoders. Proc. of ICML, 2008.