Anomaly Detection Using Autoencoder

Examine the performance of an autoencoder for anomaly detection using MNIST

Anomaly detection is an important problem in many areas. It is applied to intrusion detection, fraud detection, medical diagnosis, industrial damage detection, text data and so on [1].
Anomaly detection techniques are classified into three modes: supervised, semi-supervised, and unsupervised.
The supervised mode requires a dataset with labelled instances for the normal class as well as the anomaly class.
In many cases, supervised anomaly detection faces two issues.
First, it is difficult to collect anomaly instances. Second, obtaining accurately labelled instances is challenging.
When these issues make supervised anomaly detection impractical, an autoencoder can be used instead.
Unlike supervised methods, an autoencoder only requires an unlabelled dataset that contains few or no anomaly instances, and such data is easy to obtain in real-world applications. In this tutorial, we detect anomalies in MNIST using an autoencoder. We use one of the digits and Fashion-MNIST as anomaly data, and MNIST without the anomaly digit as training data. To show that the autoencoder can detect anomalies even when no anomaly data or labels are available, both training and threshold definition are done without anomaly data or labels.
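To make the idea concrete before walking through the notebook: train an autoencoder on normal data only, score each input by its reconstruction error, and flag inputs whose error exceeds a threshold. Below is a minimal sketch of that scoring rule, where `model` and `threshold` are placeholders for the autoencoder and threshold built later in this tutorial:

import numpy as np

def anomaly_score(model, x):
    # Reconstruction error: mean squared difference between input and output.
    x_decoded = model(x).as_ndarray()
    return np.mean((x_decoded - x) ** 2)

def is_anomaly(model, x, threshold):
    # Inputs the autoencoder reconstructs poorly are flagged as anomalies.
    return anomaly_score(model, x) > threshold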

Required Libraries

  • matplotlib 2.0.2
  • numpy 1.13.1
  • scikit-learn 0.18.2
  • python-mnist 0.5
In [1]:
from __future__ import division, print_function
import datetime
import numpy as np

import renom as rm
from renom.optimizer import Adam

from sklearn.metrics import confusion_matrix, classification_report, roc_curve, accuracy_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.datasets import fetch_mldata

import matplotlib.pyplot as plt

from mnist import MNIST

from renom.cuda import set_cuda_active

# If you would like to use the GPU, set this to True; otherwise set it to False.
set_cuda_active(False)

Load Data

Load MNIST and Fashion-MNIST and split them into a training set and a test set.
The MNIST dataset consists of 70,000 digit images.
The Fashion-MNIST dataset consists of 70,000 fashion images.
Download MNIST from http://yann.lecun.com/exdb/mnist/ and put it into the directory 'MNIST'.
Download Fashion-MNIST from https://github.com/zalandoresearch/fashion-mnist and put it into the directory 'Fashion_MNIST'.
Please make sure the dataset files are named as follows:
  • t10k-images-idx3-ubyte
  • t10k-labels-idx1-ubyte
  • train-images-idx3-ubyte
  • train-labels-idx1-ubyte

If not, rename the files accordingly.
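As a quick sanity check (not part of the original notebook), you can verify that the expected files exist before loading them; a minimal sketch assuming the directory layout described above:

import os

expected = ['t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte',
            'train-images-idx3-ubyte', 'train-labels-idx1-ubyte']
for d in ('./MNIST', './Fashion_MNIST'):
    missing = [f for f in expected if not os.path.exists(os.path.join(d, f))]
    if missing:
        print('%s is missing: %s' % (d, ', '.join(missing)))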

In [2]:
mnist = MNIST('./MNIST')
x_train, y_train = mnist.load_training()
x_test, y_test = mnist.load_testing()

fashion = MNIST('./Fashion_MNIST')
x_fashion, y_fashion = fashion.load_training()

Data Preprocessing

In this part, preprocess dataset.
Firstly, transform dataset into np.ndarray.
Normalize dataset between 0 and 1 and flatten the 28x28 images into vectors of size 784.
Next, preprocess test dataset.
Collect 100 samples from every class in the test data and sort in order of (0~9, fashion).
In this tutorial, regard '5' as anomaly digit. Also treat fashion as anomaly data and label it as '10'.
Finally, erase '5' from MNIST training dataset.
In [3]:
x_train = np.asarray(x_train)
y_train = np.asarray(y_train)
x_test = np.asarray(x_test)
y_test = np.asarray(y_test)
# Rescale the image data to 0 ~ 1.
x_train = x_train.astype(np.float32) / 255.0
x_test = x_test.astype(np.float32) / 255.0
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

x_fashion = np.asarray(x_fashion)
y_fashion = np.asarray(y_fashion)
x_fashion = x_fashion.astype(np.float32) / 255.0
x_fashion = x_fashion.reshape((len(x_fashion), np.prod(x_fashion.shape[1:])))

set_num = 100
x_sorted = x_test[y_test == 0]
x_sorted = x_sorted[:set_num]
for i in range(1,10):
    x = x_test[y_test == i]
    x_sorted = np.concatenate([x_sorted, x[:set_num]], axis=0)
x_sorted = np.concatenate([x_sorted, x_fashion[:set_num]], axis=0)

anomaly_digit = 5
fashion_label = 10
x_train = x_train[y_train != anomaly_digit]
x_test = x_test[y_test != anomaly_digit]

Model Definition

The autoencoder compresses the 784-dimensional input into a 32-dimensional hidden representation and reconstructs the input from it; the sigmoid output matches the 0~1 range of the normalized pixels.

In [4]:
autoencoder = rm.Sequential([
    rm.Dense(32),
    rm.Relu(),
    rm.Dense(784),
    rm.Sigmoid()
])
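The width of the bottleneck and the depth of the network are free design choices. As one possible variant (not used in this tutorial), a deeper encoder/decoder built from the same ReNom layers might look like this:

autoencoder_deep = rm.Sequential([
    rm.Dense(128),
    rm.Relu(),
    rm.Dense(32),   # bottleneck
    rm.Relu(),
    rm.Dense(128),
    rm.Relu(),
    rm.Dense(784),
    rm.Sigmoid()
])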

Define optimizer

Choose Adam as the optimization algorithm and set the learning rate to 0.001.

In [5]:
optimizer = Adam(lr=0.001)

Training loop

Make a random index using numpy.random.permutation and use it to construct mini-batches.
In this tutorial, we set the batch size to 128 and the number of epochs to 50, and use mean squared error as the loss function.
Note that neither '5' nor 'fashion' appears in the training data.
In [6]:
# parameters
EPOCH = 50 # Number of epochs
BATCH = 128 # Mini-batch size

# Learning curves
learning_curve = []
test_curve = []

# Training loop
for i in range(1, 1+EPOCH):

    N = x_train.shape[0] # Number of records in training data
    perm = np.random.permutation(N)
    train_loss = 0

    for j in range(N//BATCH):
        # Make mini-batch
        index = perm[j*BATCH:(j+1)*BATCH]
        train_batch_x = x_train[index]

        # Forward propagation
        with autoencoder.train():
            x_decoded = autoencoder(train_batch_x)
            loss = rm.mean_squared_error(x_decoded, train_batch_x)

        # Backpropagation
        grad = loss.grad()

        # Update
        grad.update(optimizer)

        train_loss += loss.as_ndarray()

    # calculate mean squared error for training data
    train_loss = train_loss / (N // BATCH)
    learning_curve.append(train_loss)

    # calculate mean squared error for test data
    x_decoded = autoencoder(x_test)
    test_loss = rm.mean_squared_error(x_decoded, x_test).as_ndarray()
    test_curve.append(test_loss)

    # print training progress
    if i % 10 == 0:
        print("Epoch %d - loss: %f - test_loss: %f" % (i, train_loss, test_loss))

print('Finished!')
Epoch 10 - loss: 4.024256 - test_loss: 3.856400
Epoch 20 - loss: 3.846630 - test_loss: 3.720759
Epoch 30 - loss: 3.791803 - test_loss: 3.674791
Epoch 40 - loss: 3.756036 - test_loss: 3.649103
Epoch 50 - loss: 3.732420 - test_loss: 3.633991
Finished!

Reconstructed images

Display the original images and the reconstructed images.
Note that '5' is reconstructed fairly well even though the training data contains no '5's.
In contrast, the autoencoder cannot reconstruct the 'fashion' image.
From this result, we can expect the model to detect 'fashion' but not '5'.
In [11]:
n = 11  # number of classes to display (digits 0-9 and fashion)
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    image = x_sorted[i * set_num].reshape(28,28)
    plt.imshow(image)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    decoded_image = autoencoder(x_sorted[i * set_num]).as_ndarray()
    decoded_image = decoded_image.reshape(28,28)
    plt.imshow(decoded_image)
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
[Figure: original test images (top row) and their reconstructions (bottom row)]

Threshold Definition

From the loss values, we define a threshold that classifies an input as normal or anomalous.
In this tutorial, we define the threshold using only normal data, because we assume that no anomaly data are available.
We simply set the threshold to the maximum loss value over the normal training data.
If the loss of an input is less than or equal to the threshold, it is classified as normal.
In [12]:
normal_max = 0
for x in x_train:
    x_decoded = autoencoder(x).as_ndarray()
    loss = np.mean((x_decoded - x)**2)
    normal_max = max(normal_max, loss)

margin = 0.0
threshold = normal_max + margin
print("threshold:%f" % threshold)
threshold:0.057508
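The maximum over the training losses is sensitive to a single hard-to-reconstruct sample. A common alternative, not used in this tutorial, is to take a high percentile of the training losses instead; a minimal sketch, collecting the per-sample losses first:

normal_losses = []
for x in x_train:
    x_decoded = autoencoder(x).as_ndarray()
    normal_losses.append(np.mean((x_decoded - x)**2))

# e.g. accept roughly 1% false positives on the training data
threshold_p99 = np.percentile(normal_losses, 99)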

Visualization

Visualize the losses for the test data.
The background of the '5' region is red, and the background of the 'fashion' region is green.
This graph shows that the trained model cannot detect '5' at all, but detects 'fashion' almost perfectly.
By adjusting the threshold, we could improve the detection of 'fashion', but we still could not detect '5'.
The reason is that there is no difference in loss between '5' and the other, normal digits.
In [13]:
losses = []
for x in x_sorted:
    x_decoded = autoencoder(x).as_ndarray()
    loss = np.mean((x_decoded-x)**2)
    losses.append(loss)

# plot
plt.figure(figsize=(20, 8))
for s in range(11):
    if s == anomaly_digit:
        plt.axvspan(s * set_num, s * set_num + set_num, facecolor='red', alpha=0.5)
    elif s == fashion_label:
        plt.axvspan(s * set_num, s * set_num + set_num, facecolor='green', alpha=0.5)
plt.xlim(0, set_num*11)
plt.axhline(threshold, color='black', linewidth=1, linestyle='dashed')
plt.text(0, threshold, 'threshold = %f' % threshold, fontsize=16)
plt.plot(range(len(losses)), losses, '.', ms=4, label='loss')
plt.legend(loc='best')
plt.grid()
plt.xlabel('sample index')
plt.ylabel('loss')
plt.show()
[Figure: per-sample reconstruction loss on the test data; the '5' region is shaded red, the 'fashion' region green, and the threshold drawn as a dashed line]

Accuracy

Calculate the accuracy for the normal digits, '5', and 'fashion' respectively.
There are no false positives, but the model fails to detect '5' entirely.
This result shows that the model used here cannot detect anomaly data that closely resembles the normal data.
In [14]:
true_normal = 0
false_normal = 0
true_anomaly_digit = 0
false_anomaly_digit = 0
true_fashion = 0
false_fashion = 0
for i, loss in enumerate(losses):
    cls = i // set_num
    if cls == anomaly_digit:
        if loss <= threshold:
            false_anomaly_digit += 1
        else:
            true_anomaly_digit += 1
    elif cls == fashion_label:
        if loss <= threshold:
            false_fashion += 1
        else:
            true_fashion += 1
    else:
        if loss <= threshold:
            true_normal += 1
        else:
            false_normal += 1

accuracy_normal = true_normal / (false_normal + true_normal)
accuracy_digit = true_anomaly_digit / (false_anomaly_digit + true_anomaly_digit)
accuracy_fashion = true_fashion / (false_fashion + true_fashion)

print('accuracy(normal):%.2f' % accuracy_normal)
print('accuracy(anomaly digit):%.2f' % accuracy_digit)
print('accuracy(fashion):%.2f' % accuracy_fashion)
accuracy(normal):1.00
accuracy(anomaly digit):0.00
accuracy(fashion):0.80
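The sklearn.metrics functions imported at the top can summarize the same result as a single binary classification task. A minimal sketch, not part of the original notebook, treating '5' and 'fashion' together as the anomaly class:

y_true = [1 if i // set_num in (anomaly_digit, fashion_label) else 0
          for i in range(len(losses))]
y_pred = [1 if loss > threshold else 0 for loss in losses]

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=['normal', 'anomaly']))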

References

[1] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly Detection: A Survey. ACM Computing Surveys, 2009.