# Stochastic Gradient Descent (SGD) Settings

Effects of stochastic gradient descent settings

SGD is a technique for optimizing network weights, and it has two main parameters: the learning rate and the momentum.
These settings are important for minimizing the loss function.
So first, we demonstrate below the relationship between the loss function and the optimizer settings.
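The basic SGD update can be sketched in a few lines of NumPy. This is a minimal illustration of the update rule only, not code from this notebook; the function and variable names are ours:

```python
import numpy as np

# One vanilla SGD step (momentum = 0): w <- w - lr * grad.
# A minimal sketch; the names here are illustrative only.
def sgd_step(weights, gradients, lr=0.001):
    return weights - lr * gradients

w = np.array([0.5, -0.3])
g = np.array([1.0, -2.0])
w_new = sgd_step(w, g, lr=0.1)  # -> array([ 0.4, -0.1])
```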

The reference for the dataset is as follows.

ISOLET Data Set, Ron Cole and Mark Fanty. Department of Computer Science and Engineering,
Oregon Graduate Institute, Beaverton, OR 97006.

## Required Libraries

- matplotlib 2.0.2
- numpy 1.12.1
- scikit-learn 0.18.2
- glob2 0.6

In [1]:

from glob import glob
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import renom as rm
from renom.utility.trainer import Trainer, default_event_end_epoch
from renom.utility.distributor import NdarrayDistributor
from renom.cuda.cuda import set_cuda_active
set_cuda_active(False)


## Make label data

In [2]:

filename = "./isolet1+2+3+4.data"
labels = []
X = []
y = []

def make_label_idx(filename):
    labels = []
    for line in open(filename, "r"):
        line = line.rstrip()
        label = line.split(",")[-1]
        labels.append(label)
    return list(set(labels))

labels = make_label_idx(filename)
labels = sorted(labels, key=lambda d: int(d.replace(".", "").replace(" ", "")))
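The `key` function above strips the trailing period and any spaces so that labels sort numerically rather than lexicographically. A small standalone demonstration, using made-up label strings in the same `"N."` style:

```python
# Hypothetical label strings in the ISOLET "N." style (not from the dataset).
labels = ["10.", "2.", "1."]

# A plain lexicographic sort would give ["1.", "10.", "2."];
# converting each label to int sorts them numerically instead.
ordered = sorted(labels, key=lambda d: int(d.replace(".", "").replace(" ", "")))
print(ordered)  # -> ['1.', '2.', '10.']
```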


## Load data from the file for training and prediction

In [3]:

for line in open(filename, "r"):
    line = line.rstrip()
    label = labels.index(line.split(",")[-1])
    features = list(map(float, line.split(",")[:-1]))
    X.append(features)
    y.append(label)

X = np.array(X)
y = np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("X_train:{}, y_train:{}, X_test:{}, y_test:{}".format(X_train.shape, y_train.shape,
                                                            X_test.shape, y_test.shape))
lb = LabelBinarizer().fit(y)
labels_train = lb.transform(y_train)
labels_test = lb.transform(y_test)
print("labels_train:{}, labels_test:{}".format(labels_train.shape, labels_test.shape))

X_train:(4990, 617), y_train:(4990,), X_test:(1248, 617), y_test:(1248,)
labels_train:(4990, 26), labels_test:(1248, 26)
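`LabelBinarizer` converts each integer class index into a one-hot row vector, which is why the label arrays have 26 columns (one per letter). A tiny example on three classes:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Fit on three classes, then transform two samples into one-hot rows.
lb = LabelBinarizer().fit([0, 1, 2])
onehot = lb.transform([2, 0])
print(onehot)
# -> [[0 0 1]
#     [1 0 0]]
```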


## Network definition and parameter initialization

In [4]:

output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])


## Setting the learning rate

First, we demonstrate the effect of the learning rate.

In the source code, the optimizer settings are written in the following part:

optimizer=rm.Sgd(lr=0.001, momentum=0.0)


Here, we'll look at the case where the learning rate is too small.
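Why a small learning rate converges slowly can be seen on a toy problem: plain gradient descent on f(w) = w², which is our own stand-in for the loss surface, not the notebook's network. With a small step each update shrinks w by only a few percent, so many steps are needed to approach the minimum:

```python
# Plain gradient descent on f(w) = w**2, whose gradient is 2*w.
# A toy stand-in for the loss surface, not the notebook's model.
def descend(lr, steps=20, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # SGD update: w <- w - lr * grad
    return w

w_small = descend(lr=0.01)   # ~0.67: barely moved toward the minimum at 0
w_larger = descend(lr=0.1)   # ~0.012: much closer after the same 20 steps
```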

In [8]:

trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.0))

trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))

epoch  0: avg loss 2.5705: avg test loss 2.4563: 39it [00:00, 71.51it/s]
epoch  1: avg loss 2.2928: avg test loss 2.1781: 39it [00:00, 127.40it/s]
epoch  2: avg loss 2.0119: avg test loss 1.9027: 39it [00:00, 189.11it/s]
epoch  3: avg loss 1.7486: avg test loss 1.6588: 39it [00:00, 176.87it/s]
epoch  4: avg loss 1.5210: avg test loss 1.4545: 39it [00:00, 189.89it/s]
epoch  5: avg loss 1.3337: avg test loss 1.2922: 39it [00:00, 161.71it/s]
epoch  6: avg loss 1.1825: avg test loss 1.1519: 39it [00:00, 185.84it/s]
epoch  7: avg loss 1.0625: avg test loss 1.0475: 39it [00:00, 194.21it/s]
epoch  8: avg loss 0.9655: avg test loss 0.9641: 39it [00:00, 185.89it/s]
epoch  9: avg loss 0.8858: avg test loss 0.8908: 39it [00:00, 186.09it/s]
epoch 10: avg loss 0.8192: avg test loss 0.8254: 39it [00:00, 188.29it/s]
epoch 11: avg loss 0.7643: avg test loss 0.7735: 39it [00:00, 181.22it/s]
epoch 12: avg loss 0.7149: avg test loss 0.7287: 39it [00:00, 180.42it/s]
epoch 13: avg loss 0.6735: avg test loss 0.6974: 39it [00:00, 186.97it/s]
epoch 14: avg loss 0.6359: avg test loss 0.6554: 39it [00:00, 188.58it/s]
epoch 15: avg loss 0.6039: avg test loss 0.6230: 39it [00:00, 177.28it/s]
epoch 16: avg loss 0.5742: avg test loss 0.5990: 39it [00:00, 133.24it/s]
epoch 17: avg loss 0.5481: avg test loss 0.5693: 39it [00:00, 158.45it/s]
epoch 18: avg loss 0.5239: avg test loss 0.5461: 39it [00:00, 161.07it/s]
epoch 19: avg loss 0.5033: avg test loss 0.5245: 39it [00:00, 163.58it/s]


## Network definition and parameter initialization

Next, we'll look at the case where the learning rate is too large.

In [6]:

output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.5, momentum=0.0))
trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))

epoch  0: avg loss 2.4769: avg test loss 5.1215: 39it [00:00, 190.08it/s]
epoch  1: avg loss 2.3430: avg test loss 1.5031: 39it [00:00, 193.89it/s]
epoch  2: avg loss 1.1215: avg test loss 0.7088: 39it [00:00, 189.75it/s]
epoch  3: avg loss 0.6896: avg test loss 0.6993: 39it [00:00, 195.12it/s]
epoch  4: avg loss 0.6912: avg test loss 0.5989: 39it [00:00, 193.02it/s]
epoch  5: avg loss 0.6340: avg test loss 0.4674: 39it [00:00, 118.82it/s]
epoch  6: avg loss 0.4619: avg test loss 0.3128: 39it [00:00, 135.87it/s]
epoch  7: avg loss 0.3572: avg test loss 0.2458: 39it [00:00, 126.80it/s]
epoch  8: avg loss 0.2623: avg test loss 0.2436: 39it [00:00, 178.96it/s]
epoch  9: avg loss 0.2080: avg test loss 0.2934: 39it [00:00, 190.40it/s]
epoch 10: avg loss 0.2877: avg test loss 0.2199: 39it [00:00, 190.40it/s]
epoch 11: avg loss 0.1507: avg test loss 0.2148: 39it [00:00, 188.13it/s]
epoch 12: avg loss 0.1376: avg test loss 0.1707: 39it [00:00, 181.45it/s]
epoch 13: avg loss 0.1435: avg test loss 0.1547: 39it [00:00, 181.11it/s]
epoch 14: avg loss 0.1901: avg test loss 0.1513: 39it [00:00, 183.26it/s]
epoch 15: avg loss 0.1051: avg test loss 0.1819: 39it [00:00, 189.83it/s]
epoch 16: avg loss 0.0704: avg test loss 0.1878: 39it [00:00, 185.68it/s]
epoch 17: avg loss 0.1579: avg test loss 0.1611: 39it [00:00, 196.28it/s]
epoch 18: avg loss 0.1460: avg test loss 0.1463: 39it [00:00, 120.99it/s]
epoch 19: avg loss 5.1569: avg test loss 3.2725: 39it [00:00, 153.28it/s]
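The spike at epoch 19, with the loss jumping from around 0.15 back above 3, is the classic symptom of a learning rate that is too large: one step overshoots the minimum and subsequent updates grow instead of shrink. The same blow-up can be reproduced on a toy quadratic (our own illustration, unrelated to the notebook's model):

```python
# Gradient descent on f(w) = w**2 with an oversized learning rate.
# Each step multiplies w by (1 - 2*lr); once |1 - 2*lr| > 1,
# the iterates diverge instead of converging.
def descend(lr, steps=20, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

w_diverged = descend(lr=1.5)  # |w| grows as 2**steps: roughly 1e6 here
```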


## Setting the momentum

Next, we'll look at the momentum setting.
The effect of momentum is shown below.
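Classic momentum keeps a running velocity built from past gradients, which damps oscillations and speeds up progress along shallow directions of the loss surface. A minimal sketch of the update rule on the same toy quadratic idea (our own illustration; ReNom's internal implementation may differ in details):

```python
# Classic momentum SGD:
#   v <- momentum * v - lr * grad(w)
#   w <- w + v
# A sketch of the update rule only; ReNom's internals may differ.
def sgd_momentum(grad_fn, w0, lr=0.01, momentum=0.4, steps=100):
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(w)
        w = w + v
    return w

# Minimize f(w) = w**2 (gradient 2*w); the iterates approach 0.
w_final = sgd_momentum(lambda w: 2 * w, w0=1.0)
```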
In [9]:

output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.4))
trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))

epoch  0: avg loss 3.0649: avg test loss 2.8258: 39it [00:00, 102.20it/s]
epoch  1: avg loss 2.5867: avg test loss 2.3742: 39it [00:00, 139.06it/s]
epoch  2: avg loss 2.1485: avg test loss 1.9562: 39it [00:00, 143.92it/s]
epoch  3: avg loss 1.7408: avg test loss 1.5746: 39it [00:00, 103.48it/s]
epoch  4: avg loss 1.4175: avg test loss 1.3063: 39it [00:00, 118.26it/s]
epoch  5: avg loss 1.1841: avg test loss 1.1168: 39it [00:00, 109.36it/s]
epoch  6: avg loss 1.0179: avg test loss 0.9786: 39it [00:00, 101.90it/s]
epoch  7: avg loss 0.8955: avg test loss 0.8810: 39it [00:00, 148.43it/s]
epoch  8: avg loss 0.8024: avg test loss 0.7913: 39it [00:00, 132.78it/s]
epoch  9: avg loss 0.7292: avg test loss 0.7294: 39it [00:00, 186.64it/s]
epoch 10: avg loss 0.6703: avg test loss 0.6728: 39it [00:00, 63.74it/s]
epoch 11: avg loss 0.6203: avg test loss 0.6303: 39it [00:00, 73.07it/s]
epoch 12: avg loss 0.5799: avg test loss 0.5898: 39it [00:00, 79.56it/s]
epoch 13: avg loss 0.5427: avg test loss 0.5506: 39it [00:00, 63.39it/s]
epoch 14: avg loss 0.5101: avg test loss 0.5205: 39it [00:00, 67.52it/s]
epoch 15: avg loss 0.4833: avg test loss 0.4998: 39it [00:00, 96.84it/s]
epoch 16: avg loss 0.4586: avg test loss 0.4751: 39it [00:00, 59.55it/s]
epoch 17: avg loss 0.4363: avg test loss 0.4569: 39it [00:00, 208.80it/s]
epoch 18: avg loss 0.4170: avg test loss 0.4417: 39it [00:00, 148.35it/s]
epoch 19: avg loss 0.3993: avg test loss 0.4202: 39it [00:00, 140.92it/s]