Trainer

An introduction to the Trainer class. The Trainer simplifies the training loop.

In this tutorial, we use the CIFAR-10 dataset and build the same CNN model as in tutorial 2, but this time we train it using the Trainer class.

Requirements

This tutorial requires the following modules.

The tqdm module is required to display the progress bars of the Trainer.

In [1]:
from __future__ import division, print_function
import os
import sys
import pickle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report

import renom as rm
from renom.utility.trainer import Trainer
from renom.utility.distributor import NdarrayDistributor
from renom.cuda.cuda import set_cuda_active
set_cuda_active(True)

Prepare Dataset

As in tutorial 2, download the CIFAR-10 dataset from https://www.cs.toronto.edu/~kriz/cifar.html and unpickle the data.

As preprocessing, reshape the input image data to the shape (3, 32, 32) and rescale the pixel values to the range 0-1.

In [2]:
data_dir = "../../dataset/cifar-10-batches-py/"
paths = ["data_batch_1", "data_batch_2", "data_batch_3",
         "data_batch_4", "data_batch_5"]

def unpickle(f):
    # Use latin-1 encoding on Python 3 to read pickles written by Python 2.
    with open(f, 'rb') as fo:
        if sys.version_info.major == 2:
            return pickle.load(fo)
        else:
            return pickle.load(fo, encoding="latin-1")

# Load train data.
data = list(map(unpickle, [os.path.join(data_dir, p) for p in paths]))
train_x = np.vstack([d["data"] for d in data])
train_y = np.vstack([d["labels"] for d in data])

# Load test data.
data = unpickle(os.path.join(data_dir, "test_batch"))
test_x = np.array(data["data"])
test_y = np.array(data["labels"])

# Reshape and rescale the images.
train_x = train_x.reshape(-1, 3, 32, 32)
train_y = train_y.reshape(-1, 1)
test_x = test_x.reshape(-1, 3, 32, 32)
test_y = test_y.reshape(-1, 1)

train_x = train_x / 255.
test_x = test_x / 255.

# Binarize the labels into one-hot vectors.
labels_train = LabelBinarizer().fit_transform(train_y)
labels_test = LabelBinarizer().fit_transform(test_y)

# Change types.
train_x = train_x.astype(np.float32)
test_x = test_x.astype(np.float32)
labels_train = labels_train.astype(np.float32)
labels_test = labels_test.astype(np.float32)

N = len(train_x)
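
As a quick sanity check, we can print the resulting array shapes. CIFAR-10 contains 50,000 training images and 10,000 test images, each a 3-channel 32x32 array, and the binarized labels are 10-dimensional one-hot vectors.

print(train_x.shape, labels_train.shape)  # (50000, 3, 32, 32) (50000, 10)
print(test_x.shape, labels_test.shape)    # (10000, 3, 32, 32) (10000, 10)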

Model Definition

We build the CNN model using the Sequential model.

In [3]:
sequential = rm.Sequential([
        rm.Conv2d(channel=32),
        rm.Relu(),
        rm.Conv2d(channel=32),
        rm.Relu(),
        rm.MaxPool2d(filter=2, stride=2),
        rm.Dropout(dropout_ratio=0.25),
        rm.Conv2d(channel=64),
        rm.Relu(),
        rm.Conv2d(channel=64),
        rm.Relu(),
        rm.MaxPool2d(filter=2, stride=2),
        rm.Dropout(dropout_ratio=0.25),
        rm.Flatten(),
        rm.Dense(512),
        rm.Relu(),
        rm.Dropout(dropout_ratio=0.5),
        rm.Dense(10),
    ])
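
As a quick shape check, a sketch like the following can be run; it assumes the Sequential model is directly callable on an ndarray and that the weights are initialized lazily on the first forward pass (the Conv2d layers above specify only the output channel count).

dummy = np.random.rand(2, 3, 32, 32).astype(np.float32)
print(sequential(dummy).shape)  # Expected: (2, 10), one score per class.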

Define Trainer

Here we instantiate a Trainer object. To instantiate a Trainer, a model, the number of epochs, the batch size, a loss function, and an optimizer are required.

In [4]:
trainer = Trainer(sequential,
                  num_epoch=20,
                  batch_size=128,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.9))

Distributor

The Distributor class is responsible for yielding data. Its batch() method is a generator which yields training data and target data.

In [5]:
dist = NdarrayDistributor(train_x, labels_train)
x, y = next(dist.batch(32))
print(x.shape, y.shape)
(32, 3, 32, 32) (32, 10)
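
Since batch() is a generator, we can also iterate over an entire epoch. As a rough sketch, counting batches of size 128 should give ceil(50000 / 128) = 391, which matches the iteration count shown in the training log below.

n_batches = sum(1 for _ in dist.batch(128))
print(n_batches)  # 391 batches per epoch at batch size 128.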

Execute Trainer

To execute the training loop, we call the train() method of the Trainer object. The train() method requires training data, which can be passed as an NdarrayDistributor object.

In [6]:
trainer.train(train_distributor=NdarrayDistributor(train_x, labels_train),
              test_distributor=NdarrayDistributor(test_x, labels_test))
epoch  0: avg loss 2.0890: avg test loss 1.8920: 391it [00:06, 60.63it/s]
epoch  1: avg loss 1.7415: avg test loss 1.5485: 391it [00:05, 37.03it/s]
epoch  2: avg loss 1.5378: avg test loss 1.4753: 391it [00:05, 70.94it/s]
epoch  3: avg loss 1.4171: avg test loss 1.2737: 391it [00:05, 66.87it/s]
epoch  4: avg loss 1.3338: avg test loss 1.1792: 391it [00:05, 66.66it/s]
epoch  5: avg loss 1.2536: avg test loss 1.1520: 391it [00:05, 66.68it/s]
epoch  6: avg loss 1.1969: avg test loss 1.1103: 391it [00:05, 67.11it/s]
epoch  7: avg loss 1.1356: avg test loss 1.0357: 391it [00:05, 66.46it/s]
epoch  8: avg loss 1.1005: avg test loss 0.9990: 391it [00:05, 37.14it/s]
epoch  9: avg loss 1.0541: avg test loss 0.9914: 391it [00:05, 66.00it/s]
epoch 10: avg loss 1.0156: avg test loss 0.9330: 391it [00:05, 66.10it/s]
epoch 11: avg loss 0.9797: avg test loss 0.9099: 391it [00:05, 66.43it/s]
epoch 12: avg loss 0.9424: avg test loss 0.8872: 391it [00:05, 65.99it/s]
epoch 13: avg loss 0.9198: avg test loss 0.8913: 391it [00:05, 70.27it/s]
epoch 14: avg loss 0.8872: avg test loss 0.8663: 391it [00:05, 66.33it/s]
epoch 15: avg loss 0.8678: avg test loss 0.8506: 391it [00:05, 70.17it/s]
epoch 16: avg loss 0.8347: avg test loss 0.8183: 391it [00:05, 66.15it/s]
epoch 17: avg loss 0.8189: avg test loss 0.8108: 391it [00:05, 70.26it/s]
epoch 18: avg loss 0.7985: avg test loss 0.8110: 391it [00:05, 66.19it/s]
epoch 19: avg loss 0.7725: avg test loss 0.8222: 391it [00:05, 66.59it/s]

Callbacks

You can register callback functions that are invoked at specific points in the training loop.

The Trainer object has the following attributes. These attributes are accessible from within callback functions.

attribute        description
epoch            Number of the current epoch
nth              Number of the current batch
model            Model
optimizer        Optimizer
data             Training data
target           Target data
loss             Training loss
grads            Grad object
avg_train_loss   Average loss through one epoch
loss_func        Loss function

The following code shows examples of callback functions. To register a callback, you apply the corresponding decorator to the function.

In [7]:
# Called when training starts.
@trainer.events.start
def event_start(trainer):
    print("# Called start training")

# Called when each epoch starts.
@trainer.events.start_epoch
def event_start_epoch(trainer):
    print("============================")
    print("# Called start %dth epoch"%(trainer.epoch))

# Called before forward propagation is executed.
@trainer.events.forward
def event_forward(trainer):
    if trainer.nth %100 == 0:
        print("----------------------------")
        print("# Called forward  %dth batch"%(trainer.nth))

# Called before back propagation is executed.
@trainer.events.backward
def event_backward(trainer):
    if trainer.nth %100 == 0:
        print("# Called backward %dth batch"%(trainer.nth))

# Called after the weight parameter update is executed.
@trainer.events.updated
def event_updated(trainer):
    if trainer.nth %100 == 0:
        print("# Called updated  %dth batch"%(trainer.nth))

# Called at the end of each epoch.
@trainer.events.end_epoch
def event_end_epoch(trainer):
    print("----------------------------")
    print("# Called end %dth epoch"%(trainer.epoch))

Execute Trainer

Now we execute train() again and confirm that the functions registered above are called. Here num_epoch is reduced to 2 to keep the output short.

In [8]:
trainer.num_epoch = 2
trainer.train(train_distributor=NdarrayDistributor(train_x, labels_train),
              test_distributor=NdarrayDistributor(test_x, labels_test))
# Called start training
============================
# Called start 0th epoch
----------------------------
# Called forward  0th batch
# Called backward 0th batch
# Called updated  0th batch
----------------------------
# Called forward  100th batch
# Called backward 100th batch
# Called updated  100th batch
----------------------------
# Called forward  200th batch
# Called backward 200th batch
# Called updated  200th batch
----------------------------
# Called forward  300th batch
# Called backward 300th batch
# Called updated  300th batch
----------------------------
# Called end 0th epoch
============================
# Called start 1th epoch
----------------------------
# Called forward  0th batch
# Called backward 0th batch
# Called updated  0th batch
----------------------------
# Called forward  100th batch
# Called backward 100th batch
# Called updated  100th batch
----------------------------
# Called forward  200th batch
# Called backward 200th batch
# Called updated  200th batch
----------------------------
# Called forward  300th batch
# Called backward 300th batch
# Called updated  300th batch
----------------------------
# Called end 1th epoch

Test the learned model

To test the learned model, the test() method can be used.

In [9]:
predictions = np.argmax(trainer.test(test_x), axis=1)

# Confusion matrix and classification report.
print(confusion_matrix(test_y, predictions))
print(classification_report(test_y, predictions))
[[752  26  30   9  16   4   9   8  94  52]
 [ 12 870   0   3   1   0   6   3  10  95]
 [ 86  10 559  54 102  43  79  24  24  19]
 [ 36  20  62 466  80 121  91  42  32  50]
 [ 19   4  54  36 683  15  60 104  16   9]
 [ 14   7  44 188  63 553  48  54  10  19]
 [  5   9  24  37  26   7 862   9   9  12]
 [ 17   5  20  28  44  35  14 806   2  29]
 [ 40  32   4   4   5   5   9   3 864  34]
 [ 25  82   6   7   3   1   6   8  20 842]]
             precision    recall  f1-score   support

          0       0.75      0.75      0.75      1000
          1       0.82      0.87      0.84      1000
          2       0.70      0.56      0.62      1000
          3       0.56      0.47      0.51      1000
          4       0.67      0.68      0.68      1000
          5       0.71      0.55      0.62      1000
          6       0.73      0.86      0.79      1000
          7       0.76      0.81      0.78      1000
          8       0.80      0.86      0.83      1000
          9       0.73      0.84      0.78      1000

avg / total       0.72      0.73      0.72     10000
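
As a final check, the overall accuracy can be computed directly from the predictions. Note the ravel() call: test_y has shape (10000, 1), so comparing without flattening would broadcast incorrectly. The diagonal of the confusion matrix above sums to 7257 out of 10000 samples, consistent with the result.

accuracy = np.mean(predictions == test_y.ravel())
print("Test accuracy: %.4f" % accuracy)  # Roughly 0.73 for this run.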