Trainer

An introduction to the trainer function. The trainer function simplifies the training loop.

We think it is worth knowing how to write the training loop yourself, because otherwise it can sometimes be hard to tell where an error message comes from.
However, when the model you are writing is simple, it is helpful to use the trainer.
In this tutorial we use the CIFAR-10 dataset and build a CNN model, and this time we train it with the trainer.

Requirements

This tutorial requires the modules imported in the first cell below.

The tqdm module is required to display the progress bars of the trainer function. If it is not installed, it can usually be installed with pip (pip install tqdm).

In [1]:
from __future__ import division, print_function
import os
import sys
import pickle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report

import renom as rm
from renom.utility.trainer import Trainer
from renom.utility.distributor import NdarrayDistributor
from renom.cuda.cuda import set_cuda_active
set_cuda_active(True)

Prepare Dataset

As in tutorial 2, download the CIFAR-10 dataset from the CIFAR-10 web site (https://www.cs.toronto.edu/~kriz/cifar.html) and unpickle the data.

As preprocessing, reshape the input image data to the shape (3, 32, 32) and rescale the pixel values to the range 0 to 1.

In [2]:
dir = "../dataset/cifar-10-batches-py/"
paths = ["data_batch_1", "data_batch_2", "data_batch_3",
         "data_batch_4", "data_batch_5"]

def unpickle(f):
    fo = open(f, 'rb')
    if sys.version_info.major == 2:
        # Python 2.7
        d = pickle.load(fo)
    elif sys.version_info.major == 3:
        # Python 3.4
        d = pickle.load(fo, encoding="latin-1")
    fo.close()
    return d

# Load train data.
data = list(map(unpickle, [os.path.join(dir, p) for p in paths]))
train_x = np.vstack([d["data"] for d in data])
train_y = np.vstack([d["labels"] for d in data])

# Load test data.
data = unpickle(os.path.join(dir, "test_batch"))
test_x = np.array(data["data"])
test_y = np.array(data["labels"])

# Reshape and rescale image.
train_x = train_x.reshape(-1, 3, 32, 32)
train_y = train_y.reshape(-1, 1)
test_x = test_x.reshape(-1, 3, 32, 32)
test_y = test_y.reshape(-1, 1)

train_x = train_x / 255.
test_x = test_x / 255.

# Binarize the labels.
labels_train = LabelBinarizer().fit_transform(train_y)
labels_test = LabelBinarizer().fit_transform(test_y)

# Change types.
train_x = train_x.astype(np.float32)
test_x = test_x.astype(np.float32)
labels_train = labels_train.astype(np.float32)
labels_test = labels_test.astype(np.float32)

N = len(train_x)
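
It can be useful to confirm the resulting array shapes before training. With the standard CIFAR-10 split (50,000 training images, 10,000 test images, 10 classes), the following quick check, which is not part of the original notebook, should print the shapes shown in the comments.

print(train_x.shape, labels_train.shape)  # (50000, 3, 32, 32) (50000, 10)
print(test_x.shape, labels_test.shape)    # (10000, 3, 32, 32) (10000, 10)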

Model Definition

We build a CNN model using the Sequential model.

In [3]:
sequential = rm.Sequential([
        rm.Conv2d(channel=32),
        rm.Relu(),
        rm.Conv2d(channel=32),
        rm.Relu(),
        rm.MaxPool2d(filter=2, stride=2),
        rm.Dropout(dropout_ratio=0.25),
        rm.Conv2d(channel=64),
        rm.Relu(),
        rm.Conv2d(channel=64),
        rm.Relu(),
        rm.MaxPool2d(filter=2, stride=2),
        rm.Dropout(dropout_ratio=0.25),
        rm.Flatten(),
        rm.Dense(512),
        rm.Relu(),
        rm.Dropout(dropout_ratio=0.5),
        rm.Dense(10),
    ])

Define Trainer

Here we instantiate a trainer object. Instantiating a Trainer requires a model, the number of epochs, the batch size, a loss function and an optimizer.

In [4]:
trainer = Trainer(sequential,
                  num_epoch=20,
                  batch_size=128,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.9))

Distributor

The Distributor class is responsible for yielding data. Its method 'batch()' is a generator which yields mini-batches of training data and target data.

In [5]:
dist = NdarrayDistributor(train_x, labels_train)
x, y = next(dist.batch(32))
print(x.shape, y.shape)
(32, 3, 32, 32) (32, 10)
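
The generator returned by batch() can also be iterated over directly. Below is a minimal sketch, assuming batch() makes a single pass over the whole dataset (which the per-epoch iteration counts in the next section suggest); it is not part of the original notebook.

count = 0
for x, y in dist.batch(128):
    count += 1
# With 50,000 images and batch size 128 this should be about
# ceil(50000 / 128) = 391 mini-batches.
print(count)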

Execute Trainer

To execute the training loop, we call the train method of the trainer object. The train method requires training data, which can be passed as an NdarrayDistributor object.

In [6]:
trainer.train(train_distributor=NdarrayDistributor(train_x, labels_train),
              test_distributor=NdarrayDistributor(test_x, labels_test))
epoch  0: avg loss 2.0617: avg test loss 1.7642: 391it [00:06, 60.54it/s]
epoch  1: avg loss 1.7560: avg test loss 1.5674: 391it [00:05, 70.90it/s]
epoch  2: avg loss 1.5925: avg test loss 1.4568: 391it [00:05, 70.37it/s]
epoch  3: avg loss 1.4676: avg test loss 1.3682: 391it [00:05, 70.75it/s]
epoch  4: avg loss 1.3856: avg test loss 1.2873: 391it [00:05, 70.62it/s]
epoch  5: avg loss 1.3181: avg test loss 1.2117: 391it [00:05, 70.41it/s]
epoch  6: avg loss 1.2590: avg test loss 1.1685: 391it [00:05, 74.84it/s]
epoch  7: avg loss 1.2111: avg test loss 1.1129: 391it [00:05, 70.52it/s]
epoch  8: avg loss 1.1580: avg test loss 1.0966: 391it [00:05, 70.41it/s]
epoch  9: avg loss 1.1155: avg test loss 1.0392: 391it [00:05, 70.35it/s]
epoch 10: avg loss 1.0817: avg test loss 1.0321: 391it [00:05, 73.48it/s]
epoch 11: avg loss 1.0453: avg test loss 0.9960: 391it [00:05, 68.90it/s]
epoch 12: avg loss 1.0142: avg test loss 0.9336: 391it [00:05, 70.35it/s]
epoch 13: avg loss 0.9813: avg test loss 0.9244: 391it [00:05, 70.38it/s]
epoch 14: avg loss 0.9497: avg test loss 0.9192: 391it [00:05, 74.52it/s]
epoch 15: avg loss 0.9322: avg test loss 0.9023: 391it [00:05, 70.04it/s]
epoch 16: avg loss 0.9025: avg test loss 0.8527: 391it [00:05, 70.05it/s]
epoch 17: avg loss 0.8793: avg test loss 0.8471: 391it [00:05, 69.77it/s]
epoch 18: avg loss 0.8539: avg test loss 0.8612: 391it [00:05, 69.01it/s]
epoch 19: avg loss 0.8323: avg test loss 0.8512: 391it [00:05, 70.06it/s]

Callbacks

You can register callback functions with the trainer.

The trainer object has the following attributes, which are accessible from the callback functions.

attribute        description
epoch            Number of the current epoch
nth              Number of the current batch
model            Model
optimizer        Optimizer
data             Training data
target           Target data
loss             Training loss
grads            Grad object
avg_train_loss   Average loss through one epoch
loss_func        Loss function

The following code shows examples of callback functions. To register a callback, add the corresponding decorator to the function.

In [7]:
# Called when train starts.
@trainer.events.start
def event_start(trainer):
    print("# Called start training aa")

# Called when each epoch starts.
@trainer.events.start_epoch
def event_start_epoch(trainer):
    print("============================")
    print("# Called start %dth epoch"%(trainer.epoch))

# Called before forward propagation executed.
@trainer.events.forward
def event_forward(trainer):
    if trainer.nth %100 == 0:
        print("----------------------------")
        print("# Called forward  %dth batch"%(trainer.nth))

# Called before back propagation executed.
@trainer.events.backward
def event_backward(trainer):
    if trainer.nth %100 == 0:
        print("# Called backward %dth batch"%(trainer.nth))

# Called after weight parameter update executed.
@trainer.events.updated
def event_updated(trainer):
    if trainer.nth %100 == 0:
        print("# Called updated  %dth batch"%(trainer.nth))

# Called end of each epoch.
@trainer.events.end_epoch
def event_end_epoch(trainer):
    print("----------------------------")
    print("# Called end %dth epoch"%(trainer.epoch))

Execute Trainer

Now we execute train() again and confirm that the functions registered above are called.

In [8]:
trainer.num_epoch = 2
trainer.train(train_distributor=NdarrayDistributor(train_x, labels_train),
              test_distributor=NdarrayDistributor(test_x, labels_test))
# Called start training
============================
# Called start 0th epoch
----------------------------
# Called forward  0th batch
# Called backward 0th batch
# Called updated  0th batch
----------------------------
# Called forward  100th batch
# Called backward 100th batch
# Called updated  100th batch
----------------------------
# Called forward  200th batch
# Called backward 200th batch
# Called updated  200th batch
----------------------------
# Called forward  300th batch
# Called backward 300th batch
# Called updated  300th batch
----------------------------
# Called end 0th epoch
============================
# Called start 1th epoch
----------------------------
# Called forward  0th batch
# Called backward 0th batch
# Called updated  0th batch
----------------------------
# Called forward  100th batch
# Called backward 100th batch
# Called updated  100th batch
----------------------------
# Called forward  200th batch
# Called backward 200th batch
# Called updated  200th batch
----------------------------
# Called forward  300th batch
# Called backward 300th batch
# Called updated  300th batch
----------------------------
# Called end 1th epoch
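
Since callbacks can read trainer.avg_train_loss (see the attribute table above), they can also be used to record a simple learning curve. The following is a minimal sketch, assuming several handlers can be registered for the same event (otherwise this logic could be folded into event_end_epoch above); it is not part of the original notebook.

train_curve = []

# Called at the end of each epoch; store the average training loss.
@trainer.events.end_epoch
def record_avg_loss(trainer):
    train_curve.append(float(trainer.avg_train_loss))

# After the next call to trainer.train(...), train_curve holds one value
# per epoch and can be plotted with matplotlib, which is already imported:
# plt.plot(train_curve)
# plt.xlabel("epoch")
# plt.ylabel("average train loss")
# plt.show()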

Test learned model

To test the trained model, the test() method can be used.

In [9]:
predictions = np.argmax(trainer.test(test_x), axis=1)

# Confusion matrix and classification report.
print(confusion_matrix(test_y, predictions))
print(classification_report(test_y, predictions))
[[788  27  42  16  11   1  12   9  46  48]
 [ 11 874   4   3   2   4   7   4  26  65]
 [ 67  10 614  52  88  60  67  22   9  11]
 [ 22  10  82 465  77 188  87  32  16  21]
 [ 26   6  72  54 661  39  62  69   8   3]
 [ 15   6  57 151  51 638  32  35   3  12]
 [  6   5  44  35  38  24 827   6   7   8]
 [  9   3  36  43  64  62  11 745   3  24]
 [ 79  38  16  13   6   8   6   2 807  25]
 [ 24  96   8  14   5   5  12   9  24 803]]
             precision    recall  f1-score   support

          0       0.75      0.79      0.77      1000
          1       0.81      0.87      0.84      1000
          2       0.63      0.61      0.62      1000
          3       0.55      0.47      0.50      1000
          4       0.66      0.66      0.66      1000
          5       0.62      0.64      0.63      1000
          6       0.74      0.83      0.78      1000
          7       0.80      0.74      0.77      1000
          8       0.85      0.81      0.83      1000
          9       0.79      0.80      0.80      1000

avg / total       0.72      0.72      0.72     10000
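
As a quick additional check, not shown in the original output, the overall accuracy can be computed directly from the predictions with NumPy; it should agree with the averages in the classification report above.

accuracy = float(np.mean(predictions == test_y.ravel()))
print("Test accuracy: %.3f" % accuracy)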