Stochastic Gradient Descent (SGD) Settings

Effects of stochastic gradient descent settings

SGD is a technique for optimizing the network weights, and it has two main parameters: the learning rate and the momentum.
These settings are important for minimizing the loss function.
First, we show below the relationship between the loss function and the optimizer.
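
Before the experiments, here is a minimal Python sketch of the update rule that SGD with momentum applies to each weight. It is illustrative only (sgd_step is a hypothetical helper, not part of ReNom).

def sgd_step(w, g, v, lr=0.01, momentum=0.0):
    # v accumulates an exponentially decaying sum of past gradients,
    # and the weight w is moved along this velocity.
    v = momentum * v - lr * g
    return w + v, v

# One step for a single weight with a hypothetical gradient value.
w, v = 1.0, 0.0
w, v = sgd_step(w, g=0.5, v=v, lr=0.01, momentum=0.4)
print(w, v)   # 0.995 -0.005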

The reference for the dataset is given below.

ISOLET Data Set, Ron Cole and Mark Fanty. Department of Computer Science and Engineering,
Oregon Graduate Institute, Beaverton, OR 97006.
In [1]:
from glob import glob
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import renom as rm
from renom.utility.trainer import Trainer, default_event_end_epoch
from renom.utility.distributor import NdarrayDistributor
from renom.cuda.cuda import set_cuda_active
set_cuda_active(False)

Make label data

In [2]:
filename = "./isolet1+2+3+4.data"
labels = []
X = []
y = []

def make_label_idx(filename):
    # Collect the class label (the last column of each line) and deduplicate.
    labels = []
    for line in open(filename, "r"):
        line = line.rstrip()
        label = line.split(",")[-1]
        labels.append(label)
    return list(set(labels))

labels = make_label_idx(filename)
labels = sorted(labels, key=lambda d:int(d.replace(".","").replace(" ","")))

Load data from the file for training and prediction

In [3]:
for line in open(filename,"r"):
    line = line.rstrip()
    label = labels.index(line.split(",")[-1])
    features = list(map(float,line.split(",")[:-1]))
    X.append(features)
    y.append(label)

X = np.array(X)
y = np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("X_train:{}, y_train:{}, X_test:{}, y_test:{}".format(X_train.shape, y_train.shape,
                                                            X_test.shape, y_test.shape))
lb = LabelBinarizer().fit(y)
labels_train = lb.transform(y_train)
labels_test = lb.transform(y_test)
print("labels_train:{}, labels_test:{}".format(labels_train.shape, labels_test.shape))
X_train:(4990, 617), y_train:(4990,), X_test:(1248, 617), y_test:(1248,)
labels_train:(4990, 26), labels_test:(1248, 26)

Network definition and parameter initialization

In [4]:
output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

Setting the learning rate

First, we demonstrate the effect of the learning rate.
The setup is as follows.

In the source code, the optimizer settings are written in the following part.

optimizer=rm.Sgd(lr=0.001, momentum=0.0)

This time, we'll look at the case where the learning rate is small.
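
Before running the network, the toy example below makes the effect of the learning rate concrete (this is not the ISOLET network; descend is a hypothetical helper). It runs gradient descent on f(w) = w**2, whose gradient is 2*w: a small learning rate converges slowly but steadily, while a learning rate that is too large overshoots the minimum and diverges.

def descend(lr, steps=5, w=1.0):
    # Plain gradient descent on f(w) = w**2, whose gradient is 2*w.
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(descend(lr=0.01))   # ~0.904: moves toward the minimum, but slowly
print(descend(lr=1.2))    # ~-5.38: |w| grows every step, so training diverges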

In [8]:
trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.0))

trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))
epoch  0: avg loss 2.5705: avg test loss 2.4563: 39it [00:00, 71.51it/s]
epoch  1: avg loss 2.2928: avg test loss 2.1781: 39it [00:00, 127.40it/s]
epoch  2: avg loss 2.0119: avg test loss 1.9027: 39it [00:00, 189.11it/s]
epoch  3: avg loss 1.7486: avg test loss 1.6588: 39it [00:00, 176.87it/s]
epoch  4: avg loss 1.5210: avg test loss 1.4545: 39it [00:00, 189.89it/s]
epoch  5: avg loss 1.3337: avg test loss 1.2922: 39it [00:00, 161.71it/s]
epoch  6: avg loss 1.1825: avg test loss 1.1519: 39it [00:00, 185.84it/s]
epoch  7: avg loss 1.0625: avg test loss 1.0475: 39it [00:00, 194.21it/s]
epoch  8: avg loss 0.9655: avg test loss 0.9641: 39it [00:00, 185.89it/s]
epoch  9: avg loss 0.8858: avg test loss 0.8908: 39it [00:00, 186.09it/s]
epoch 10: avg loss 0.8192: avg test loss 0.8254: 39it [00:00, 188.29it/s]
epoch 11: avg loss 0.7643: avg test loss 0.7735: 39it [00:00, 181.22it/s]
epoch 12: avg loss 0.7149: avg test loss 0.7287: 39it [00:00, 180.42it/s]
epoch 13: avg loss 0.6735: avg test loss 0.6974: 39it [00:00, 186.97it/s]
epoch 14: avg loss 0.6359: avg test loss 0.6554: 39it [00:00, 188.58it/s]
epoch 15: avg loss 0.6039: avg test loss 0.6230: 39it [00:00, 177.28it/s]
epoch 16: avg loss 0.5742: avg test loss 0.5990: 39it [00:00, 133.24it/s]
epoch 17: avg loss 0.5481: avg test loss 0.5693: 39it [00:00, 158.45it/s]
epoch 18: avg loss 0.5239: avg test loss 0.5461: 39it [00:00, 161.07it/s]
epoch 19: avg loss 0.5033: avg test loss 0.5245: 39it [00:00, 163.58it/s]

Network definition and parameter initialization

Next, we'll look at the case where the learning rate is too large. As the results below show, training can become unstable: the loss drops quickly at first but blows up in the final epoch, because the large steps overshoot the minimum.

In [6]:
output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.5, momentum=0.0))
trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))
epoch  0: avg loss 2.4769: avg test loss 5.1215: 39it [00:00, 190.08it/s]
epoch  1: avg loss 2.3430: avg test loss 1.5031: 39it [00:00, 193.89it/s]
epoch  2: avg loss 1.1215: avg test loss 0.7088: 39it [00:00, 189.75it/s]
epoch  3: avg loss 0.6896: avg test loss 0.6993: 39it [00:00, 195.12it/s]
epoch  4: avg loss 0.6912: avg test loss 0.5989: 39it [00:00, 193.02it/s]
epoch  5: avg loss 0.6340: avg test loss 0.4674: 39it [00:00, 118.82it/s]
epoch  6: avg loss 0.4619: avg test loss 0.3128: 39it [00:00, 135.87it/s]
epoch  7: avg loss 0.3572: avg test loss 0.2458: 39it [00:00, 126.80it/s]
epoch  8: avg loss 0.2623: avg test loss 0.2436: 39it [00:00, 178.96it/s]
epoch  9: avg loss 0.2080: avg test loss 0.2934: 39it [00:00, 190.40it/s]
epoch 10: avg loss 0.2877: avg test loss 0.2199: 39it [00:00, 190.40it/s]
epoch 11: avg loss 0.1507: avg test loss 0.2148: 39it [00:00, 188.13it/s]
epoch 12: avg loss 0.1376: avg test loss 0.1707: 39it [00:00, 181.45it/s]
epoch 13: avg loss 0.1435: avg test loss 0.1547: 39it [00:00, 181.11it/s]
epoch 14: avg loss 0.1901: avg test loss 0.1513: 39it [00:00, 183.26it/s]
epoch 15: avg loss 0.1051: avg test loss 0.1819: 39it [00:00, 189.83it/s]
epoch 16: avg loss 0.0704: avg test loss 0.1878: 39it [00:00, 185.68it/s]
epoch 17: avg loss 0.1579: avg test loss 0.1611: 39it [00:00, 196.28it/s]
epoch 18: avg loss 0.1460: avg test loss 0.1463: 39it [00:00, 120.99it/s]
epoch 19: avg loss 5.1569: avg test loss 3.2725: 39it [00:00, 153.28it/s]

Setting the momentum

Next, we'll look at the momentum setting.
The effect of momentum is as follows.
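
As a rough intuition, with a constant gradient g the velocity used by momentum SGD approaches -lr * g / (1 - momentum), so momentum=0.4 effectively enlarges the step size by a factor of about 1/(1 - 0.4) ≈ 1.67 while also smoothing out noisy gradients. The toy sketch below illustrates this (velocity_after is a hypothetical helper, not part of ReNom).

def velocity_after(steps, lr=0.01, momentum=0.4, g=1.0):
    # Repeatedly apply the momentum update with a constant gradient g.
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * g
    return v

print(velocity_after(1))    # -0.01: identical to a plain SGD step
print(velocity_after(50))   # ~-0.0167: about 1.67x the plain step size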
In [9]:
output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.4))
trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))
epoch  0: avg loss 3.0649: avg test loss 2.8258: 39it [00:00, 102.20it/s]
epoch  1: avg loss 2.5867: avg test loss 2.3742: 39it [00:00, 139.06it/s]
epoch  2: avg loss 2.1485: avg test loss 1.9562: 39it [00:00, 143.92it/s]
epoch  3: avg loss 1.7408: avg test loss 1.5746: 39it [00:00, 103.48it/s]
epoch  4: avg loss 1.4175: avg test loss 1.3063: 39it [00:00, 118.26it/s]
epoch  5: avg loss 1.1841: avg test loss 1.1168: 39it [00:00, 109.36it/s]
epoch  6: avg loss 1.0179: avg test loss 0.9786: 39it [00:00, 101.90it/s]
epoch  7: avg loss 0.8955: avg test loss 0.8810: 39it [00:00, 148.43it/s]
epoch  8: avg loss 0.8024: avg test loss 0.7913: 39it [00:00, 132.78it/s]
epoch  9: avg loss 0.7292: avg test loss 0.7294: 39it [00:00, 186.64it/s]
epoch 10: avg loss 0.6703: avg test loss 0.6728: 39it [00:00, 63.74it/s]
epoch 11: avg loss 0.6203: avg test loss 0.6303: 39it [00:00, 73.07it/s]
epoch 12: avg loss 0.5799: avg test loss 0.5898: 39it [00:00, 79.56it/s]
epoch 13: avg loss 0.5427: avg test loss 0.5506: 39it [00:00, 63.39it/s]
epoch 14: avg loss 0.5101: avg test loss 0.5205: 39it [00:00, 67.52it/s]
epoch 15: avg loss 0.4833: avg test loss 0.4998: 39it [00:00, 96.84it/s]
epoch 16: avg loss 0.4586: avg test loss 0.4751: 39it [00:00, 59.55it/s]
epoch 17: avg loss 0.4363: avg test loss 0.4569: 39it [00:00, 208.80it/s]
epoch 18: avg loss 0.4170: avg test loss 0.4417: 39it [00:00, 148.35it/s]
epoch 19: avg loss 0.3993: avg test loss 0.4202: 39it [00:00, 140.92it/s]
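
Finally, the sklearn metrics imported at the top can be used to check the last trained model on the test set. The sketch below assumes that calling the sequential model on X_test returns the network output as a NumPy-compatible array (please check the ReNom documentation for the exact prediction API).

# Hedged sketch: predicted class = index of the largest output unit.
predictions = np.argmax(np.array(sequential(X_test)), axis=1)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
print("accuracy: {:.3f}".format(accuracy_score(y_test, predictions)))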