Dropout

Applying dropout in a fully connected neural network model on MNIST.

Dropout is a simple and effective way to reduce overfitting.
In this tutorial, we build a fully connected neural network model for classifying digit images. You will learn the following:
  • How to use dropout

Required libraries

In [1]:
from __future__ import division, print_function
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report

import renom as rm
from renom.optimizer import Sgd

Load data

Next, we load the raw MNIST data and shape it into training-ready objects. To accomplish this, we use the fetch_mldata function included in the scikit-learn package.

The MNIST dataset consists of 70000 digit images. Before we do anything else, we have to split the data into a training set and a test set. We then apply two important pre-processing steps that make for a smoother training process: 1) rescale the image data (originally integer values 0-255) to the range 0 to 1, and 2) "binarize" the labels, i.e. map each digit (0-9) to a one-hot vector of 0s and 1s.
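As a quick illustration of step 2 (a side sketch, not one of the numbered cells below, and using made-up labels), LabelBinarizer maps each label to a one-hot row vector:

# Toy example: one-hot encoding with LabelBinarizer (labels are hypothetical).
from sklearn.preprocessing import LabelBinarizer
import numpy as np

toy_labels = np.array([0, 3, 9])
print(LabelBinarizer().fit_transform(toy_labels))
# Only three classes are present here, so each row has three entries:
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]]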

In [2]:
# Datapath must point to the directory containing the mldata folder.
data_path = "../dataset"
mnist = fetch_mldata('MNIST original', data_home=data_path)

X = mnist.data
y = mnist.target

# Rescale the image data to 0 ~ 1.
X = X.astype(np.float32)
X /= X.max()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
labels_train = LabelBinarizer().fit_transform(y_train).astype(np.float32)
labels_test = LabelBinarizer().fit_transform(y_test).astype(np.float32)

# Training data size.
N = len(X_train)

Define the neural network and dropout

Dropout helps prevent overfitting.
The dropout ratio is the fraction of units that are randomly dropped from a layer at each training step.
Because a different subset of units is dropped each time, the effect is similar to training an ensemble of networks.
We therefore recommend using dropout when your model is prone to overfitting.
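The sketch below shows the mechanism in plain NumPy, using the common "inverted dropout" formulation (ReNom's internal implementation may differ in its details); the activation values are made up.

import numpy as np

np.random.seed(0)
activations = np.array([0.2, 1.5, 0.7, 0.9])  # hypothetical layer outputs
dropout_ratio = 0.5

# Training time: zero each unit with probability dropout_ratio and rescale
# the survivors so the expected activation stays the same.
mask = np.random.rand(activations.size) >= dropout_ratio
print(activations * mask / (1.0 - dropout_ratio))

# Inference time: no units are dropped; the full activations are used.

Because a fresh mask is sampled at every step, each mini-batch effectively trains a different thinned sub-network.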
In [3]:
class Mnist(rm.Model):

    def __init__(self):
        super(Mnist, self).__init__()
        self._layer1 = rm.Dense(100)
        self._layer2 = rm.Dense(10)
        self._dropout1 = rm.Dropout(dropout_ratio=0.5)

    def forward(self, x):
        t1 = self._dropout1(self._layer1(x))
        out = self._layer2(t1)
        return out

Instantiation

In [4]:
network = Mnist()

Training loop

Now that the network is built, we can start to do the actual training. Rather than using vanilla “batch” gradient descent, which is computationally expensive, we’ll use mini-batch stochastic gradient descent (SGD). This method trains on a handful of examples per iteration (the “batch-size”), allowing us to make “stochastic” updates to the weights in a short time. The learning curve will appear noisier, but this method tends to converge much faster.
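Before looking at the full loop, here is a minimal NumPy sketch of the two ingredients it combines: shuffling the data into mini-batches and applying the plain SGD update w = w - lr * grad. The arrays below are stand-ins; in the real cell, ReNom's Sgd optimizer performs the update for us.

import numpy as np

# Hypothetical data and parameters (stand-ins, not the MNIST model).
X = np.random.rand(10, 4)
w = np.zeros(4)
lr, batch = 0.1, 4

perm = np.random.permutation(len(X))          # shuffle once per epoch
for j in range(len(X) // batch):
    x_batch = X[perm[j * batch:(j + 1) * batch]]
    # ... a real loop would compute the loss on x_batch and backpropagate ...
    grad_w = x_batch.mean(axis=0)             # stand-in gradient
    w -= lr * grad_w                          # SGD update: w <- w - lr * grad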

In [5]:
# Hyper parameters
batch = 64
epoch = 10

optimizer = Sgd(lr = 0.1)

learning_curve = []
test_learning_curve = []

for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch):
        train_batch = X_train[perm[j * batch:(j + 1) * batch]]
        response_batch = labels_train[perm[j * batch:(j + 1) * batch]]

        # The computational graph is only generated for this block:
        with network.train():
            l = rm.softmax_cross_entropy(network(train_batch), response_batch)
            if hasattr(network, "weight_decay"):
                l += 0.0001 * network.weight_decay()

        # Back propagation
        grad = l.grad()

        # Update
        grad.update(optimizer)

        # Converting the loss to an ndarray is recommended here.
        loss += l.as_ndarray()

    train_loss = loss / (N // batch)

    # Validation
    test_loss = rm.softmax_cross_entropy(network(X_test), labels_test).as_ndarray()
    test_learning_curve.append(test_loss)
    learning_curve.append(train_loss)
    print("epoch %03d train_loss:%f test_loss:%f"%(i, train_loss, test_loss))
epoch 000 train_loss:0.457225 test_loss:0.412944
epoch 001 train_loss:0.374044 test_loss:0.377608
epoch 002 train_loss:0.356384 test_loss:0.364094
epoch 003 train_loss:0.350757 test_loss:0.368991
epoch 004 train_loss:0.342420 test_loss:0.367725
epoch 005 train_loss:0.337317 test_loss:0.377109
epoch 006 train_loss:0.335588 test_loss:0.361930
epoch 007 train_loss:0.332697 test_loss:0.351402
epoch 008 train_loss:0.329190 test_loss:0.356135
epoch 009 train_loss:0.329743 test_loss:0.367464

Model evaluation

After training our model, we have to evaluate it. For each class (digit), we report precision, recall, F1 score, and support (the number of test samples in that class) to get a full sense of how the model performs on our test data.
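For reference, precision, recall, and F1 for a single class can be derived directly from confusion-matrix counts; a minimal sketch with hypothetical numbers:

# Hypothetical counts for one class: true positives, false positives, false negatives.
tp, fp, fn = 90, 10, 15

precision = tp / (tp + fp)   # fraction of predictions for this class that were correct
recall = tp / (tp + fn)      # fraction of actual members of this class that were found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)

classification_report computes these per class for us; support is simply the number of test samples in each class.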

In [6]:
predictions = np.argmax(network(X_test).as_ndarray(), axis=1)

# Confusion matrix and classification report.
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

# Learning curve.
plt.plot(learning_curve, linewidth=3, label="train")
plt.plot(test_learning_curve, linewidth=3, label="test")
plt.title("Learning curve")
plt.ylabel("error")
plt.xlabel("epoch")
plt.legend()
plt.grid()
plt.show()
[[675   0   4   2   2  13   3   2   3   2]
 [  0 774   6   2   1   5   3   3  14   0]
 [  6   3 649  14  18   4   6  11  18   4]
 [  4   5  22 591   2  29   1   7  16  10]
 [  3   3   2   0 633   0   7   6   3  26]
 [ 10   5  10  10  13 517   9   4  41   7]
 [  8   4   6   1  18  14 621   1   7   1]
 [  1   1   8   4   5   0   0 670   1  24]
 [  5  18  16  13   8   9   3   1 603  15]
 [  3   5   0   7  23   5   0  34   6 588]]
             precision    recall  f1-score   support

        0.0       0.94      0.96      0.95       706
        1.0       0.95      0.96      0.95       808
        2.0       0.90      0.89      0.89       733
        3.0       0.92      0.86      0.89       687
        4.0       0.88      0.93      0.90       683
        5.0       0.87      0.83      0.85       626
        6.0       0.95      0.91      0.93       681
        7.0       0.91      0.94      0.92       714
        8.0       0.85      0.87      0.86       691
        9.0       0.87      0.88      0.87       671

avg / total       0.90      0.90      0.90      7000

[Figure: learning curve showing train and test error per epoch.]