Tutorial 1: MNIST classifier

A digit image classification problem solved with a fully connected neural network model.

In this tutorial, we build a fully connected neural network model for classifying digit images. You will learn the following points:

  • How to build a Sequential or Functional model
  • How to train models

Required libraries

In [1]:
from __future__ import division, print_function
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report

import renom as rm
from renom.optimizer import Sgd

Load data

Next, we have to load in the raw, binary MNIST data and shape it into training-ready arrays. To accomplish this, we’ll use the fetch_mldata function included in the scikit-learn package.

The MNIST dataset consists of 70,000 digit images. Before we do anything else, we have to split the data into a training set and a test set. We’ll then do two important pre-processing steps that make for a smoother training process: 1) re-scale the image data (originally integer values 0-255) to the range 0 to 1, and 2) “binarize” the labels, mapping each digit (0-9) to a one-hot vector of 0s and 1s.

In [2]:
# Datapath must point to the directory containing the mldata folder.
data_path = "../dataset"
mnist = fetch_mldata('MNIST original', data_home=data_path)

X = mnist.data
y = mnist.target

# Rescale the image data to 0 ~ 1.
X = X.astype(np.float32)
X /= X.max()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
labels_train = LabelBinarizer().fit_transform(y_train).astype(np.float32)
labels_test = LabelBinarizer().fit_transform(y_test).astype(np.float32)

# Training data size.
N = len(X_train)
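As a quick illustration of the binarization step, LabelBinarizer maps each label to a one-hot row vector. A minimal, self-contained sketch (the toy labels here are hypothetical, not part of the MNIST data):

import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Three toy labels 0, 1, 2 become one-hot rows.
print(LabelBinarizer().fit_transform(np.array([0, 1, 2])))
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]]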

Define the neural network

Next, we set up the neural network itself. In this tutorial, we’ll construct a 2-layer network; in other words, it will consist of an input layer in addition to 2 parameterized layers. We then have to choose an activation function for the neurons. Although the “sigmoid” function is a popular starting example in ML courses, the rectified linear unit (ReLU) has become a popular default when training neural networks. We’ll use the ReLU function here.
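For reference, ReLU simply zeroes out negative inputs: relu(x) = max(0, x), applied element-wise. A minimal NumPy sketch (independent of ReNom’s rm.relu, which is used below):

import numpy as np

def relu(x):
    # Element-wise rectified linear function: negative values map to 0.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # negative entries become 0.0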

For validation and prediction purposes, it’s not necessary to build a computational graph. During these two steps, we’ll disable the graph in order to conserve memory. This is done by calling the method detach_graph().
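As a rough sketch of what this looks like (using the network, X_test, and labels_test defined later in this tutorial, and assuming detach_graph() is available on the returned loss node, as the text above suggests):

# Forward pass for validation, outside the `with network.train():` block,
# so no gradients are needed.
test_loss = rm.softmax_cross_entropy(network(X_test), labels_test)
test_loss.detach_graph()  # release the computational graph to save memory
print(test_loss.as_ndarray())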

In [3]:
class Mnist1(rm.Model):

    def __init__(self):
        super(Mnist1, self).__init__()
        self._layer1 = rm.Dense(100)
        self._layer2 = rm.Dense(10)

    def forward(self, x):
        out = self._layer2(rm.relu(self._layer1(x)))
        return out

Neural network with an L2 regularization term

In [4]:
class Mnist2(rm.Model):

    def __init__(self):
        super(Mnist2, self).__init__()
        self._layer1 = rm.Dense(100)
        self._layer2 = rm.Dense(10)

    def forward(self, x):
        out = self._layer2(rm.relu(self._layer1(x)))
        return out

    def weight_decay(self):
        # L2 penalty: sum of squared weights of both Dense layers.
        weight_decay = rm.sum(self._layer1.params.w**2) + rm.sum(self._layer2.params.w**2)
        return weight_decay
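With this term, the training objective below becomes the cross-entropy loss plus a scaled L2 penalty on the weights:

loss = softmax_cross_entropy(output, labels) + 0.0001 * (sum(W1**2) + sum(W2**2))

where the coefficient 0.0001 is set in the training loop.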

Definition of a neural network with the Sequential model

In [5]:
sequential = rm.Sequential([
        rm.Dense(100),
        rm.Relu(),
        rm.Dense(10),
    ])
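Note that this Sequential definition describes the same architecture as the Mnist1 class above: two Dense layers (100 and 10 units) with a ReLU in between, declared as an ordered list of layers instead of a forward method.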

Instantiation

In [6]:
# Choose neural network.
# network = Mnist1()
# network = Mnist2()
network = sequential

Training loop

Now that the network is built, we can start to do the actual training. Rather than using vanilla “batch” gradient descent, which is computationally expensive, we’ll use mini-batch stochastic gradient descent (SGD). This method trains on a handful of examples per iteration (the “batch-size”), allowing us to make “stochastic” updates to the weights in a short time. The learning curve will appear noisier, but this method tends to converge much faster.
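Concretely, for each mini-batch the optimizer applies the update w <- w - lr * dL/dw, where lr is the learning rate (0.1 below) and dL/dw is the gradient of the batch loss with respect to each weight.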

In [7]:
# Hyper parameters
batch = 64
epoch = 10

optimizer = Sgd(lr = 0.1)

learning_curve = []
test_learning_curve = []

for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch):
        train_batch = X_train[perm[j * batch:(j + 1) * batch]]
        response_batch = labels_train[perm[j * batch:(j + 1) * batch]]

        # The computational graph is only generated for this block:
        with network.train():
            l = rm.softmax_cross_entropy(network(train_batch), response_batch)
            if hasattr(network, "weight_decay"):
                l += 0.0001 * network.weight_decay()

        # Back propagation
        grad = l.grad()

        # Update
        grad.update(optimizer)

        # Converting the loss to an ndarray before accumulating it is recommended.
        loss += l.as_ndarray()

    train_loss = loss / (N // batch)

    # Validation
    test_loss = rm.softmax_cross_entropy(network(X_test), labels_test).as_ndarray()
    test_learning_curve.append(test_loss)
    learning_curve.append(train_loss)
    print("epoch %03d train_loss:%f test_loss:%f"%(i, train_loss, test_loss))
epoch 000 train_loss:0.333865 test_loss:0.206294
epoch 001 train_loss:0.169508 test_loss:0.157213
epoch 002 train_loss:0.125200 test_loss:0.123576
epoch 003 train_loss:0.100399 test_loss:0.104024
epoch 004 train_loss:0.083224 test_loss:0.098977
epoch 005 train_loss:0.072607 test_loss:0.092286
epoch 006 train_loss:0.062846 test_loss:0.083760
epoch 007 train_loss:0.055399 test_loss:0.080172
epoch 008 train_loss:0.049468 test_loss:0.089180
epoch 009 train_loss:0.044689 test_loss:0.074930

Model evaluation

After training our model, we have to evaluate it. For each class (digit), we’ll look at precision, recall, F1 score, and support to get a full sense of how the model performs on our test data.
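As a reminder: for each class, precision = TP / (TP + FP), recall = TP / (TP + FN), and the F1 score is their harmonic mean, 2 * precision * recall / (precision + recall). Support is simply the number of true test examples of that class.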

In [8]:
predictions = np.argmax(network(X_test).as_ndarray(), axis=1)

# Confusion matrix and classification report.
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

# Learning curve.
plt.plot(learning_curve, linewidth=3, label="train")
plt.plot(test_learning_curve, linewidth=3, label="test")
plt.title("Learning curve")
plt.ylabel("error")
plt.xlabel("epoch")
plt.legend()
plt.grid()
plt.show()
[[678   0   1   1   1   2   2   2   4   1]
 [  0 749   2   1   3   0   0   1   2   0]
 [  1   5 686   3   2   0   0   3   2   1]
 [  0   0   2 719   0  10   0   3   2   3]
 [  0   1   0   0 639   0   3   2   0  13]
 [  1   1   0   5   0 657   1   0   1   1]
 [  6   0   1   0   3   3 664   0   2   0]
 [  2   1   1   1   7   1   0 681   0   3]
 [  2   5   1   4   2   4   1   0 683   3]
 [  0   1   0   2   4   3   1   6   1 684]]
             precision    recall  f1-score   support

        0.0       0.98      0.98      0.98       692
        1.0       0.98      0.99      0.98       758
        2.0       0.99      0.98      0.98       703
        3.0       0.98      0.97      0.97       739
        4.0       0.97      0.97      0.97       658
        5.0       0.97      0.99      0.98       667
        6.0       0.99      0.98      0.98       679
        7.0       0.98      0.98      0.98       697
        8.0       0.98      0.97      0.97       705
        9.0       0.96      0.97      0.97       702

avg / total       0.98      0.98      0.98      7000

[Figure: learning curve showing train and test error per epoch]