Concatenate Layer Output

Concatenating layer outputs in a fully connected neural network model applied to MNIST.

In this tutorial, we build a fully connected neural network model for classifying digit images. You will learn the following points:

  • How to concatenate the output of layers

Required libraries

In [1]:
from __future__ import division, print_function
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report

import renom as rm
from renom.optimizer import Sgd

Load data

Next, we load the raw MNIST data and shape it into training-ready arrays. To accomplish this, we’ll use the fetch_mldata function included in the scikit-learn package.

The MNIST dataset consists of 70,000 digit images. Before we do anything else, we split the data into a training set and a test set. We then apply two important pre-processing steps that make for a smoother training process: 1) rescale the image data (originally integer values 0-255) to the range 0 to 1, and 2) “binarize” the labels, mapping each digit (0-9) to a one-hot vector of 0s and 1s.

In [2]:
# Datapath must point to the directory containing the mldata folder.
data_path = "../dataset"
mnist = fetch_mldata('MNIST original', data_home=data_path)

X = mnist.data
y = mnist.target

# Rescale the image data to 0 ~ 1.
X = X.astype(np.float32)
X /= X.max()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
labels_train = LabelBinarizer().fit_transform(y_train).astype(np.float32)
labels_test = LabelBinarizer().fit_transform(y_test).astype(np.float32)

# Training data size.
N = len(X_train)
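
As a quick, illustrative check (not a cell from the original notebook), the binarized labels are one-hot rows: the column holding a 1 indicates the digit. Assuming the cell above has run, something like the following should confirm the shapes and the mapping.

# Illustrative check of the pre-processing above (exact label row depends on the random split).
print(X_train.shape, labels_train.shape)   # expected: (63000, 784) (63000, 10)
print(y_train[0], labels_train[0])         # e.g. 5.0 -> [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]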

Define the neural network and concatenate the model output

Many recent neural network architectures are complicated: they sometimes split the input data into separate branches or concatenate the outputs of intermediate layers.
You can concatenate model outputs as shown below.
In [3]:
class Mnist(rm.Model):

    def __init__(self):
        super(Mnist, self).__init__()
        self._layer1 = rm.Dense(100)
        self._layer2 = rm.Dense(20)
        self._layer3 = rm.Dense(100)
        self._layer4 = rm.Dense(20)
        self._layer5 = rm.Dense(10)

    def forward(self, x):
        t1 = self._layer2(rm.relu(self._layer1(x)))
        t2 = self._layer4(rm.relu(self._layer3(x)))
        concatenated = rm.concat(t1,t2)
        out = self._layer5(concatenated)
        return out
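
To see what rm.concat produces here, the following minimal sketch (not part of the original notebook; it assumes concatenation along the feature axis) checks the shapes of the two branches and of their concatenation before it is fed to the final layer.

# Minimal shape check (illustrative sketch, assuming feature-axis concatenation).
dummy = np.random.rand(5, 28 * 28).astype(np.float32)  # 5 fake flattened images
net = Mnist()
t1 = net._layer2(rm.relu(net._layer1(dummy)))  # expected shape: (5, 20)
t2 = net._layer4(rm.relu(net._layer3(dummy)))  # expected shape: (5, 20)
print(rm.concat(t1, t2).shape)                 # expected: (5, 40), the input size of _layer5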

Instantiation

In [4]:
network = Mnist()

Training loop

Now that the network is built, we can start to do the actual training. Rather than using vanilla “batch” gradient descent, which is computationally expensive, we’ll use mini-batch stochastic gradient descent (SGD). This method trains on a handful of examples per iteration (the “batch-size”), allowing us to make “stochastic” updates to the weights in a short time. The learning curve will appear noisier, but this method tends to converge much faster.

In [5]:
# Hyper parameters
batch = 64
epoch = 10

optimizer = Sgd(lr=0.1)

learning_curve = []
test_learning_curve = []

for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch):
        train_batch = X_train[perm[j * batch:(j + 1) * batch]]
        response_batch = labels_train[perm[j * batch:(j + 1) * batch]]

        # The computational graph is only generated for this block:
        with network.train():
            l = rm.softmax_cross_entropy(network(train_batch), response_batch)
            if hasattr(network, "weight_decay"):
                l += 0.0001 * network.weight_decay()

        # Back propagation
        grad = l.grad()

        # Update
        grad.update(optimizer)

        # Changing type to ndarray is recommended.
        loss += l.as_ndarray()

    train_loss = loss / (N // batch)

    # Validation
    test_loss = rm.softmax_cross_entropy(network(X_test), labels_test).as_ndarray()
    test_learning_curve.append(test_loss)
    learning_curve.append(train_loss)
    print("epoch %03d train_loss:%f test_loss:%f"%(i, train_loss, test_loss))
epoch 000 train_loss:0.267641 test_loss:0.141645
epoch 001 train_loss:0.113229 test_loss:0.103424
epoch 002 train_loss:0.076932 test_loss:0.086349
epoch 003 train_loss:0.057434 test_loss:0.082932
epoch 004 train_loss:0.044242 test_loss:0.090178
epoch 005 train_loss:0.034046 test_loss:0.084983
epoch 006 train_loss:0.026465 test_loss:0.081169
epoch 007 train_loss:0.021621 test_loss:0.073292
epoch 008 train_loss:0.016413 test_loss:0.073131
epoch 009 train_loss:0.012462 test_loss:0.079027
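
For reference, the mini-batch indexing used in the loop above works like this small standalone sketch (illustrative only; the sample count and batch size are toy values here):

# Toy illustration of the permutation-based mini-batching used in the training loop.
n_samples, batch_size = 10, 4                 # toy values, not the real hyperparameters
perm = np.random.permutation(n_samples)       # random ordering of the sample indices
for j in range(n_samples // batch_size):
    print(j, perm[j * batch_size:(j + 1) * batch_size])  # 2 batches of 4; the remainder is dropped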

Model evaluation

After training our model, we have to evaluate it. For each class (digit), we’ll use several scoring metrics: precision, recall, F1 score, and support, to get a full sense of how the model performs on our test data.

In [6]:
predictions = np.argmax(network(X_test).as_ndarray(), axis=1)

# Confusion matrix and classification report.
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

# Learning curve.
plt.plot(learning_curve, linewidth=3, label="train")
plt.plot(test_learning_curve, linewidth=3, label="test")
plt.title("Learning curve")
plt.ylabel("error")
plt.xlabel("epoch")
plt.legend()
plt.grid()
plt.show()
[[676   0   0   0   0   1   4   0   1   0]
 [  0 832   1   0   0   0   1   7   1   0]
 [  3   2 706   3   2   0   0   4   5   0]
 [  1   2   1 695   0   5   0   0   2   1]
 [  1   0   0   1 708   1   7   0   1   3]
 [  0   0   1   3   0 589   3   1   2   1]
 [  2   0   1   0   1   2 683   0   1   0]
 [  0   0   2   1   2   0   1 683   0   4]
 [  1   3   2   5   3   4   1   1 652   1]
 [  2   1   0   7  14   4   0   5   1 632]]
             precision    recall  f1-score   support

        0.0       0.99      0.99      0.99       682
        1.0       0.99      0.99      0.99       842
        2.0       0.99      0.97      0.98       725
        3.0       0.97      0.98      0.98       707
        4.0       0.97      0.98      0.98       722
        5.0       0.97      0.98      0.98       600
        6.0       0.98      0.99      0.98       690
        7.0       0.97      0.99      0.98       693
        8.0       0.98      0.97      0.97       673
        9.0       0.98      0.95      0.97       666

avg / total       0.98      0.98      0.98      7000

[Figure: learning curve plot showing train and test error per epoch]
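
As a sanity check on the report (illustrative, not a cell from the original notebook), per-class precision and recall can be read straight off the confusion matrix: for digit 9, recall = 632 / 666 ≈ 0.95 and precision = 632 / 642 ≈ 0.98, matching the table above. The same values can be recovered programmatically:

# Recover per-class precision and recall from the confusion matrix (illustrative check).
cm = confusion_matrix(y_test, predictions)
precision = np.diag(cm) / cm.sum(axis=0)  # correct predictions / all predictions per class
recall = np.diag(cm) / cm.sum(axis=1)     # correct predictions / true occurrences per class
print(precision.round(2))
print(recall.round(2))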