Configuring Stochastic Gradient Descent (SGD)

The Effect of SGD Settings

SGD is a method for optimizing the weights of a network, and it has two main parameters: the learning rate and the momentum.
These settings are important for minimizing the loss function.
First, the relationship between the loss function and the optimization is shown below.
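As a reference point, plain gradient descent updates each weight w by stepping against the gradient of the loss L, where lr denotes the learning rate:

w ← w - lr * ∂L/∂w

The learning rate scales the size of each step; the momentum term, covered later in this tutorial, adds a fraction of the previous update on top of this step.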

The reference for the dataset used in this tutorial is shown below.

ISOLET Data Set, Ron Cole and Mark Fanty. Department of Computer Science and Engineering,
Oregon Graduate Institute, Beaverton, OR 97006.

Required Libraries

  • matplotlib 2.0.2
  • numpy 1.12.1
  • scikit-learn 0.18.2
  • glob2 0.6
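If any of these are missing, they can be installed with pip, pinning the versions listed above (ReNom itself is assumed to be installed separately, following its own documentation):

pip install matplotlib==2.0.2 numpy==1.12.1 scikit-learn==0.18.2 glob2==0.6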
In [1]:
from glob import glob
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import renom as rm
from renom.utility.trainer import Trainer, default_event_end_epoch
from renom.utility.distributor import NdarrayDistributor
from renom.cuda.cuda import set_cuda_active
set_cuda_active(False)  # run on the CPU; pass True to enable GPU (CUDA) execution

Creating the label data

In [2]:
filename = "./isolet1+2+3+4.data"
labels = []
X = []
y = []

def make_label_idx(filename):
    labels = []
    for line in open(filename, "r"):
        line = line.rstrip()
        label = line.split(",")[-1]
        labels.append(label)
    # Deduplicate the collected labels before returning them.
    return list(set(labels))

labels = make_label_idx(filename)
# Sort the labels numerically; each raw label is a numeric string that may
# contain a trailing dot and spaces, so strip those before converting to int.
labels = sorted(labels, key=lambda d: int(d.replace(".", "").replace(" ", "")))

Creating the training and test data from the file

In [3]:
for line in open(filename,"r"):
    line = line.rstrip()
    label = labels.index(line.split(",")[-1])
    features = list(map(float,line.split(",")[:-1]))
    X.append(features)
    y.append(label)

X = np.array(X)
y = np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("X_train:{}, y_train:{}, X_test:{}, y_test:{}".format(X_train.shape, y_train.shape,
                                                            X_test.shape, y_test.shape))
lb = LabelBinarizer().fit(y)
labels_train = lb.transform(y_train)
labels_test = lb.transform(y_test)
print("labels_train:{}, labels_test:{}".format(labels_train.shape, labels_test.shape))
X_train:(4990, 617), y_train:(4990,), X_test:(1248, 617), y_test:(1248,)
labels_train:(4990, 26), labels_test:(1248, 26)
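LabelBinarizer converts the integer class indices into one-hot rows, which is the target format used with rm.softmax_cross_entropy in this tutorial. A minimal illustration with three classes:

from sklearn.preprocessing import LabelBinarizer
lb_demo = LabelBinarizer().fit([0, 1, 2])  # learn the three class indices
print(lb_demo.transform([1]))              # [[0 1 0]]: one-hot row for class 1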

Defining the network and initializing its parameters

In [4]:
output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

Configuring the learning rate

First, let's look at the effect of the learning rate. If the learning rate is too small, the loss decreases only slowly; if it is too large, the updates can overshoot, and the loss can oscillate or even diverge. Both cases are demonstrated below.
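As a toy illustration independent of ReNom (the quadratic f(w) = w**2 and the step counts here are assumptions made just for this sketch), plain gradient descent behaves very differently depending on the step size:

def gradient_descent(lr, steps=20, w0=1.0):
    # Minimize f(w) = w**2, whose gradient is 2*w, starting from w0.
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # the SGD update: w <- w - lr * dL/dw
    return w

print(gradient_descent(lr=0.001))  # too small: w has barely moved toward the minimum at 0
print(gradient_descent(lr=0.1))    # reasonable: w is close to 0
print(gradient_descent(lr=1.1))    # too large: the updates overshoot and w diverges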

The optimizer settings appear in the following part of the source code:

optimizer=rm.Sgd(lr=0.01, momentum=0.0)

First, let's look at the case where the learning rate is too small.

In [8]:
trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.0))

trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))
epoch  0: avg loss 2.5705: avg test loss 2.4563: 39it [00:00, 71.51it/s]
epoch  1: avg loss 2.2928: avg test loss 2.1781: 39it [00:00, 127.40it/s]
epoch  2: avg loss 2.0119: avg test loss 1.9027: 39it [00:00, 189.11it/s]
epoch  3: avg loss 1.7486: avg test loss 1.6588: 39it [00:00, 176.87it/s]
epoch  4: avg loss 1.5210: avg test loss 1.4545: 39it [00:00, 189.89it/s]
epoch  5: avg loss 1.3337: avg test loss 1.2922: 39it [00:00, 161.71it/s]
epoch  6: avg loss 1.1825: avg test loss 1.1519: 39it [00:00, 185.84it/s]
epoch  7: avg loss 1.0625: avg test loss 1.0475: 39it [00:00, 194.21it/s]
epoch  8: avg loss 0.9655: avg test loss 0.9641: 39it [00:00, 185.89it/s]
epoch  9: avg loss 0.8858: avg test loss 0.8908: 39it [00:00, 186.09it/s]
epoch 10: avg loss 0.8192: avg test loss 0.8254: 39it [00:00, 188.29it/s]
epoch 11: avg loss 0.7643: avg test loss 0.7735: 39it [00:00, 181.22it/s]
epoch 12: avg loss 0.7149: avg test loss 0.7287: 39it [00:00, 180.42it/s]
epoch 13: avg loss 0.6735: avg test loss 0.6974: 39it [00:00, 186.97it/s]
epoch 14: avg loss 0.6359: avg test loss 0.6554: 39it [00:00, 188.58it/s]
epoch 15: avg loss 0.6039: avg test loss 0.6230: 39it [00:00, 177.28it/s]
epoch 16: avg loss 0.5742: avg test loss 0.5990: 39it [00:00, 133.24it/s]
epoch 17: avg loss 0.5481: avg test loss 0.5693: 39it [00:00, 158.45it/s]
epoch 18: avg loss 0.5239: avg test loss 0.5461: 39it [00:00, 161.07it/s]
epoch 19: avg loss 0.5033: avg test loss 0.5245: 39it [00:00, 163.58it/s]

Redefining the network and reinitializing its parameters

With lr=0.01 the loss decreased steadily but slowly, remaining around 0.5 even after 20 epochs. Next, let's look at the case where the learning rate is too large. The network is rebuilt first so that training restarts from fresh parameters.

In [6]:
output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.5, momentum=0.0))
trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))
epoch  0: avg loss 2.4769: avg test loss 5.1215: 39it [00:00, 190.08it/s]
epoch  1: avg loss 2.3430: avg test loss 1.5031: 39it [00:00, 193.89it/s]
epoch  2: avg loss 1.1215: avg test loss 0.7088: 39it [00:00, 189.75it/s]
epoch  3: avg loss 0.6896: avg test loss 0.6993: 39it [00:00, 195.12it/s]
epoch  4: avg loss 0.6912: avg test loss 0.5989: 39it [00:00, 193.02it/s]
epoch  5: avg loss 0.6340: avg test loss 0.4674: 39it [00:00, 118.82it/s]
epoch  6: avg loss 0.4619: avg test loss 0.3128: 39it [00:00, 135.87it/s]
epoch  7: avg loss 0.3572: avg test loss 0.2458: 39it [00:00, 126.80it/s]
epoch  8: avg loss 0.2623: avg test loss 0.2436: 39it [00:00, 178.96it/s]
epoch  9: avg loss 0.2080: avg test loss 0.2934: 39it [00:00, 190.40it/s]
epoch 10: avg loss 0.2877: avg test loss 0.2199: 39it [00:00, 190.40it/s]
epoch 11: avg loss 0.1507: avg test loss 0.2148: 39it [00:00, 188.13it/s]
epoch 12: avg loss 0.1376: avg test loss 0.1707: 39it [00:00, 181.45it/s]
epoch 13: avg loss 0.1435: avg test loss 0.1547: 39it [00:00, 181.11it/s]
epoch 14: avg loss 0.1901: avg test loss 0.1513: 39it [00:00, 183.26it/s]
epoch 15: avg loss 0.1051: avg test loss 0.1819: 39it [00:00, 189.83it/s]
epoch 16: avg loss 0.0704: avg test loss 0.1878: 39it [00:00, 185.68it/s]
epoch 17: avg loss 0.1579: avg test loss 0.1611: 39it [00:00, 196.28it/s]
epoch 18: avg loss 0.1460: avg test loss 0.1463: 39it [00:00, 120.99it/s]
epoch 19: avg loss 5.1569: avg test loss 3.2725: 39it [00:00, 153.28it/s]

Configuring the momentum

Next, let's look at the momentum setting. In the run above with lr=0.5, the loss dropped quickly at first but fluctuated from epoch to epoch and finally blew up at epoch 19, which is typical of an overly large learning rate. The momentum term adds a fraction of the previous weight update to the current update, smoothing such oscillations and speeding up progress along consistent gradient directions, as sketched below.
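A common formulation of SGD with momentum keeps a velocity v that accumulates past gradients (this is the textbook form matching the lr and momentum arguments; the exact update used inside rm.Sgd is an assumption here):

v ← momentum * v - lr * ∂L/∂w
w ← w + v

With momentum=0 this reduces to plain SGD. Continuing the toy quadratic sketch from the learning-rate section:

def momentum_descent(lr, momentum, steps=20, w0=1.0):
    # Minimize f(w) = w**2 with momentum; the gradient is 2*w.
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * 2 * w  # accumulate a velocity from past gradients
        w += v
    return w

print(momentum_descent(lr=0.01, momentum=0.0))  # plain SGD
print(momentum_descent(lr=0.01, momentum=0.4))  # momentum: noticeably closer to 0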
In [9]:
output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

trainer = Trainer(model=sequential,
                  num_epoch=20,
                  batch_size=128,
                  shuffle=True,
                  loss_func=rm.softmax_cross_entropy,
                  optimizer=rm.Sgd(lr=0.01, momentum=0.4))
trainer.train(train_distributor=NdarrayDistributor(X_train, labels_train),
              test_distributor=NdarrayDistributor(X_test, labels_test))
epoch  0: avg loss 3.0649: avg test loss 2.8258: 39it [00:00, 102.20it/s]
epoch  1: avg loss 2.5867: avg test loss 2.3742: 39it [00:00, 139.06it/s]
epoch  2: avg loss 2.1485: avg test loss 1.9562: 39it [00:00, 143.92it/s]
epoch  3: avg loss 1.7408: avg test loss 1.5746: 39it [00:00, 103.48it/s]
epoch  4: avg loss 1.4175: avg test loss 1.3063: 39it [00:00, 118.26it/s]
epoch  5: avg loss 1.1841: avg test loss 1.1168: 39it [00:00, 109.36it/s]
epoch  6: avg loss 1.0179: avg test loss 0.9786: 39it [00:00, 101.90it/s]
epoch  7: avg loss 0.8955: avg test loss 0.8810: 39it [00:00, 148.43it/s]
epoch  8: avg loss 0.8024: avg test loss 0.7913: 39it [00:00, 132.78it/s]
epoch  9: avg loss 0.7292: avg test loss 0.7294: 39it [00:00, 186.64it/s]
epoch 10: avg loss 0.6703: avg test loss 0.6728: 39it [00:00, 63.74it/s]
epoch 11: avg loss 0.6203: avg test loss 0.6303: 39it [00:00, 73.07it/s]
epoch 12: avg loss 0.5799: avg test loss 0.5898: 39it [00:00, 79.56it/s]
epoch 13: avg loss 0.5427: avg test loss 0.5506: 39it [00:00, 63.39it/s]
epoch 14: avg loss 0.5101: avg test loss 0.5205: 39it [00:00, 67.52it/s]
epoch 15: avg loss 0.4833: avg test loss 0.4998: 39it [00:00, 96.84it/s]
epoch 16: avg loss 0.4586: avg test loss 0.4751: 39it [00:00, 59.55it/s]
epoch 17: avg loss 0.4363: avg test loss 0.4569: 39it [00:00, 208.80it/s]
epoch 18: avg loss 0.4170: avg test loss 0.4417: 39it [00:00, 148.35it/s]
epoch 19: avg loss 0.3993: avg test loss 0.4202: 39it [00:00, 140.92it/s]
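To compare the three runs at a glance, the average training losses logged above can be plotted with matplotlib (imported at the start as plt). The lists below are transcribed from the logs; the plotting code is a sketch, since the Trainer's loss history is not exposed directly in this tutorial:

# avg train loss per epoch, copied from the three training logs above
loss_small_lr = [2.5705, 2.2928, 2.0119, 1.7486, 1.5210, 1.3337, 1.1825, 1.0625,
                 0.9655, 0.8858, 0.8192, 0.7643, 0.7149, 0.6735, 0.6359, 0.6039,
                 0.5742, 0.5481, 0.5239, 0.5033]
loss_large_lr = [2.4769, 2.3430, 1.1215, 0.6896, 0.6912, 0.6340, 0.4619, 0.3572,
                 0.2623, 0.2080, 0.2877, 0.1507, 0.1376, 0.1435, 0.1901, 0.1051,
                 0.0704, 0.1579, 0.1460, 5.1569]
loss_momentum = [3.0649, 2.5867, 2.1485, 1.7408, 1.4175, 1.1841, 1.0179, 0.8955,
                 0.8024, 0.7292, 0.6703, 0.6203, 0.5799, 0.5427, 0.5101, 0.4833,
                 0.4586, 0.4363, 0.4170, 0.3993]

plt.plot(loss_small_lr, label="lr=0.01, momentum=0.0")
plt.plot(loss_large_lr, label="lr=0.5, momentum=0.0")
plt.plot(loss_momentum, label="lr=0.01, momentum=0.4")
plt.xlabel("epoch")
plt.ylabel("avg train loss")
plt.legend()
plt.show()

The lr=0.5 run makes the fastest early progress but blows up at epoch 19, while momentum=0.4 ends at a lower loss (0.3993) than plain SGD with the same lr=0.01 (0.5033) over the same 20 epochs.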