Batch Normalization

How to use batch normalization with ReNom

Batch Normalization is a popular technique that is often used in image processing.
As is often pointed out, in many cases the distribution of the training data differs from the distribution of the data seen at prediction time.
In addition, the inputs to the hidden layers of a deep neural network keep changing during training, which makes learning unstable.
Batch Normalization consists of a standardization followed by a linear transformation: the inputs of a layer are first standardized to zero mean and unit variance, and the result is then scaled and shifted by learnable parameters.

Without batch normalization, each weight update depends on the layer's inputs, but those inputs change whenever the preceding layers are updated, so the input distribution of a layer shifts at every step. This is one of the difficulties of training hidden layers. Batch Normalization makes it easier to learn the hidden layers' parameters by keeping each layer's input distribution close to zero mean and unit variance before the learned scale and shift are applied.
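
The following is a minimal NumPy sketch of the training-time computation described above, not ReNom's implementation. The names gamma, beta and eps are illustrative: gamma and beta stand in for the learnable scale and shift, eps is a small constant for numerical stability, and the running statistics used at inference time are omitted.

import numpy as np

def batch_normalize(x, gamma, beta, eps=1e-5):
    # Standardize each feature over the mini-batch (axis 0),
    # then apply the learned scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: a mini-batch of 4 samples with 3 features each.
x = np.random.randn(4, 3) * 10.0 + 5.0       # arbitrary mean and variance
y = batch_normalize(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0))                        # close to 0
print(y.std(axis=0))                         # close to 1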

Required libraries

  • matplotlib 2.0.2
  • numpy 1.12.1
  • scikit-learn 0.18.2
  • pillow 4.2.1

In [1]:
from __future__ import division, print_function
import os
import sys
import pickle

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report

import renom as rm
from renom.optimizer import Sgd, Adam
from renom.cuda.cuda import set_cuda_active

GPU-enabled Computing

If you wish to use a GPU, you’ll need to call set_cuda_active() with the single argument True . This generally allows training to run much faster than relying on the CPU. You’ll need an NVIDIA GPU installed on your machine.

In [2]:
set_cuda_active(True)

Load data

Here we just unpickle the CIFAR-10 image data, downloaded from the CIFAR website ( https://www.cs.toronto.edu/~kriz/cifar.html ). As in Tutorial 1, we scale the data to a range of 0 to 1 and binarize the labels. If you have not downloaded the dataset yet, a sketch for fetching it is shown below.
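
The archive can be fetched and extracted, for example, as in the following sketch. It assumes Python 3 and the usual file name cifar-10-python.tar.gz, which extracts to ./cifar-10-batches-py/; adjust the URL if the file has moved on the CIFAR website linked above.

import os
import tarfile
import urllib.request

# Assumed download URL for the python version of CIFAR-10.
url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
archive = "cifar-10-python.tar.gz"

if not os.path.exists("cifar-10-batches-py"):
    # Download the archive and unpack it into the current directory.
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(".")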

In [3]:
dir = "./cifar-10-batches-py/"
paths = ["data_batch_1", "data_batch_2", "data_batch_3",
         "data_batch_4", "data_batch_5"]

def unpickle(f):
    fo = open(f, 'rb')
    if sys.version_info.major == 2:
        # Python 2.7
        d = pickle.load(fo)
    elif sys.version_info.major == 3:
        # Python 3.4
        d = pickle.load(fo, encoding="latin-1")
    fo.close()
    return d

# Load train data.
data = list(map(unpickle, [os.path.join(dir, p) for p in paths]))
train_x = np.vstack([d["data"] for d in data])
train_y = np.vstack([d["labels"] for d in data])

# Load test data.
data = unpickle(os.path.join(dir, "test_batch"))
test_x = np.array(data["data"])
test_y = np.array(data["labels"])

# Reshape and rescale images.
train_x = train_x.reshape(-1, 3, 32, 32)
train_y = train_y.reshape(-1, 1)
test_x = test_x.reshape(-1, 3, 32, 32)
test_y = test_y.reshape(-1, 1)

train_x = train_x / 255.
test_x = test_x / 255.

# Binarize labels.
labels_train = LabelBinarizer().fit_transform(train_y)
labels_test = LabelBinarizer().fit_transform(test_y)

# Change types.
train_x = train_x.astype(np.float32)
test_x = test_x.astype(np.float32)
labels_train = labels_train.astype(np.float32)
labels_test = labels_test.astype(np.float32)

N = len(train_x)

Neural network definition

Set up the CNN. It is essentially similar to Tutorial 1, except that here we use several hidden layers. We also try to avoid over-fitting by using the "dropout" technique.

In [4]:
class Cifar10(rm.Model):

    def __init__(self):
        super(Cifar10, self).__init__()
        self._l1 = rm.Conv2d(channel=32)
        self._l2 = rm.Conv2d(channel=32)
        self._l3 = rm.Conv2d(channel=64)
        self._l4 = rm.Conv2d(channel=64)
        self._l5 = rm.Dense(512)
        self._l6 = rm.Dense(10)
        self._sd = rm.SpatialDropout(dropout_ratio=0.25)
        self._pool = rm.MaxPool2d(filter=2, stride=2)

    def forward(self, x):
        t1 = rm.relu(self._l1(x))
        t2 = self._sd(self._pool(rm.relu(self._l2(t1))))
        t3 = rm.relu(self._l3(t2))
        t4 = self._sd(self._pool(rm.relu(self._l4(t3))))
        t5 = rm.flatten(t4)
        t6 = rm.dropout(rm.relu(self._l5(t5)))
        t7 = self._l6(t6)
        return t7
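
Note that this class-based model does not include batch normalization. A batch-normalized variant of the same model might look like the following sketch, shown only for comparison; it reuses the layer and function names already seen in this notebook, and the sequential model defined below is the one actually trained here.

class Cifar10BN(rm.Model):

    def __init__(self):
        super(Cifar10BN, self).__init__()
        self._l1 = rm.Conv2d(channel=32)
        self._bn1 = rm.BatchNormalize()
        self._l2 = rm.Conv2d(channel=32)
        self._bn2 = rm.BatchNormalize()
        self._l3 = rm.Conv2d(channel=64)
        self._bn3 = rm.BatchNormalize()
        self._l4 = rm.Conv2d(channel=64)
        self._bn4 = rm.BatchNormalize()
        self._l5 = rm.Dense(512)
        self._l6 = rm.Dense(10)
        self._sd = rm.SpatialDropout(dropout_ratio=0.25)
        self._pool = rm.MaxPool2d(filter=2, stride=2)

    def forward(self, x):
        # Batch normalization is applied to each convolution output
        # before the ReLU activation.
        t1 = rm.relu(self._bn1(self._l1(x)))
        t2 = self._sd(self._pool(rm.relu(self._bn2(self._l2(t1)))))
        t3 = rm.relu(self._bn3(self._l3(t2)))
        t4 = self._sd(self._pool(rm.relu(self._bn4(self._l4(t3)))))
        t5 = rm.flatten(t4)
        t6 = rm.dropout(rm.relu(self._l5(t5)))
        return self._l6(t6)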

Definition of a neural network with the sequential model

In [5]:
sequential = rm.Sequential([
        rm.Conv2d(channel=32),
        rm.BatchNormalize(),
        rm.Relu(),
        rm.Conv2d(channel=32),
        rm.BatchNormalize(),
        rm.Relu(),
        rm.MaxPool2d(filter=2, stride=2),
        rm.Dropout(dropout_ratio=0.25),
        rm.Conv2d(channel=64),
        rm.BatchNormalize(),
        rm.Relu(),
        rm.Conv2d(channel=64),
        rm.BatchNormalize(),
        rm.Relu(),
        rm.MaxPool2d(filter=2, stride=2),
        rm.Dropout(dropout_ratio=0.25),
        rm.Flatten(),
        rm.Dense(512),
        rm.Relu(),
        rm.Dropout(dropout_ratio=0.5),
        rm.Dense(10),
    ])

Instantiation

In [6]:
# Choose neural network.
#network = Cifar10()
network = sequential
optimizer = Adam()

Training loop

In the training loop, we recommend running a validation step at the end of each epoch, iterating over the test set in minibatches. This allows us to check the learning process and to diagnose training problems such as overfitting by comparing the validation and training learning curves.

In [7]:
# Hyper parameters
batch = 128
epoch = 20

learning_curve = []
test_learning_curve = []

for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch):
        train_batch = train_x[perm[j * batch:(j + 1) * batch]]
        response_batch = labels_train[perm[j * batch:(j + 1) * batch]]

        # Loss function
        network.set_models(inference=False)
        with network.train():
            l = rm.softmax_cross_entropy(network(train_batch), response_batch)

        # Back propagation
        grad = l.grad()

        # Update
        grad.update(optimizer)
        loss += l.as_ndarray()

    train_loss = loss / (N // batch)

    # Validation
    test_loss = 0
    M = len(test_x)
    network.set_models(inference=True)
    for j in range(M//batch):
        test_batch = test_x[j * batch:(j + 1) * batch]
        test_label_batch = labels_test[j * batch:(j + 1) * batch]
        prediction = network(test_batch)
        test_loss += rm.softmax_cross_entropy(prediction, test_label_batch).as_ndarray()
    test_loss /= (j+1)

    test_learning_curve.append(test_loss)
    learning_curve.append(train_loss)
    print("epoch %03d train_loss:%f test_loss:%f"%(i, train_loss, test_loss))
epoch 000 train_loss:1.562733 test_loss:1.274054
epoch 001 train_loss:1.266876 test_loss:1.160762
epoch 002 train_loss:1.160666 test_loss:1.098638
epoch 003 train_loss:1.098712 test_loss:1.064242
epoch 004 train_loss:1.047783 test_loss:1.001854
epoch 005 train_loss:1.010702 test_loss:1.019032
epoch 006 train_loss:0.975439 test_loss:0.969194
epoch 007 train_loss:0.945626 test_loss:0.967939
epoch 008 train_loss:0.925219 test_loss:0.940706
epoch 009 train_loss:0.900114 test_loss:0.937168
epoch 010 train_loss:0.882823 test_loss:0.920384
epoch 011 train_loss:0.862696 test_loss:0.978997
epoch 012 train_loss:0.845808 test_loss:0.915230
epoch 013 train_loss:0.838032 test_loss:0.920868
epoch 014 train_loss:0.820599 test_loss:0.901738
epoch 015 train_loss:0.805465 test_loss:0.897759
epoch 016 train_loss:0.792367 test_loss:0.893579
epoch 017 train_loss:0.778425 test_loss:0.906781
epoch 018 train_loss:0.773982 test_loss:0.906565
epoch 019 train_loss:0.759154 test_loss:0.893892

Model evaluation

Finally, we evaluate the model's labeling performance using the same metrics as in Tutorial 1.

In [8]:
network.set_models(inference=True)
predictions = np.argmax(network(test_x).as_ndarray(), axis=1)

# Confusion matrix and classification report.
print(confusion_matrix(test_y, predictions))
print(classification_report(test_y, predictions))

# Learning curve.
plt.plot(learning_curve, linewidth=3, label="train")
plt.plot(test_learning_curve, linewidth=3, label="test")
plt.title("Learning curve")
plt.ylabel("error")
plt.xlabel("epoch")
plt.legend()
plt.grid()
plt.show()
[[728  14  46  35   9   6  23  14  85  40]
 [ 22 732  10  22   1   3  11   1  37 161]
 [ 65   2 535  98  78  72  77  47  11  15]
 [ 17   3  55 580  44 166  85  25   8  17]
 [ 20   3  87 109 538  39 113  76  10   5]
 [ 13   2  50 227  31 577  38  45   7  10]
 [  5   3  32  73  19  19 835   3   2   9]
 [ 10   3  28  67  41  68  13 750   3  17]
 [ 51  28  23  20   3   3  15   3 819  35]
 [ 37  44   5  25   6   8  13  20  30 812]]
             precision    recall  f1-score   support

          0       0.75      0.73      0.74      1000
          1       0.88      0.73      0.80      1000
          2       0.61      0.54      0.57      1000
          3       0.46      0.58      0.51      1000
          4       0.70      0.54      0.61      1000
          5       0.60      0.58      0.59      1000
          6       0.68      0.83      0.75      1000
          7       0.76      0.75      0.76      1000
          8       0.81      0.82      0.81      1000
          9       0.72      0.81      0.77      1000

avg / total       0.70      0.69      0.69     10000

[Figure: learning curve showing training and test error per epoch]
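
To make the per-class rows of the report easier to interpret, the CIFAR-10 class names can be attached to it. The sketch below assumes the usual archive layout, in which the file batches.meta in the dataset directory contains a "label_names" list; the unpickle helper and the variables defined earlier in this notebook are reused.

# Read the class names shipped with the dataset (assumed key: "label_names").
meta = unpickle(os.path.join(dir, "batches.meta"))
label_names = meta["label_names"]

# Re-print the classification report with human-readable class names.
print(classification_report(test_y, predictions, target_names=label_names))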