Adagrad Optimization

Introduction to Adagrad Optimization

Adagrad is a popular optimization technique; SGD (Stochastic Gradient Descent) is probably the most famous one.
A neural network has to optimize its parameters to improve prediction accuracy.
The parameter update rule includes a learning rate, which controls how large a step is taken in proportion to the gradient, so the choice of update rule matters for efficient learning.
SGD updates every parameter with the same learning rate, but there are cases where we want a large change for some parameters and only a small change for others.
For example, a parameter that is already close to its optimum does not need a big update, while a parameter that is far from its optimum does. In such cases, Adagrad, which adapts the learning rate of each parameter individually, is a useful technique.
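
For reference, Adagrad keeps a per-parameter accumulator of squared gradients and divides the learning rate by its square root, so parameters with a history of large gradients receive smaller steps. The snippet below is a minimal NumPy sketch of that update rule, not ReNom's internal implementation; the function name and the lr and eps values are illustrative.

import numpy as np

def adagrad_update(param, grad, h, lr=0.01, eps=1e-8):
    # h accumulates the elementwise squared gradients across steps.
    h += grad * grad
    # Each parameter's step is scaled by 1 / sqrt(h), adapting its learning rate.
    param -= lr * grad / (np.sqrt(h) + eps)
    return param, h

# Toy usage: one update step on a small parameter vector.
param = np.array([1.0, -2.0, 0.5])
h = np.zeros_like(param)
grad = np.array([0.1, -0.4, 0.05])   # gradient of the loss w.r.t. param
param, h = adagrad_update(param, grad, h)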

Required Libraries

  • numpy 1.21.1
  • scikit-learn 0.18.1
In [1]:
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
import renom as rm
from renom.cuda.cuda import set_cuda_active
set_cuda_active(False)

Make label data

The reference for the dataset is given below.

ISOLET Data Set, Ron Cole and Mark Fanty. Department of Computer Science and Engineering,
Oregon Graduate Institute, Beaverton, OR 97006.
In [2]:
filename = "./isolet1+2+3+4.data"
labels = []
X = []
y = []

def make_label_idx(filename):
    # Collect the class label (the last comma-separated field) from every line,
    # then return the unique labels.
    labels = []
    for line in open(filename, "r"):
        line = line.rstrip()
        label = line.split(",")[-1]
        labels.append(label)
    return list(set(labels))

labels = make_label_idx(filename)
labels = sorted(labels, key=lambda d:int(d.replace(".","").replace(" ","")))

Load data from the file for training and prediction

In [3]:
for line in open(filename,"r"):
    line = line.rstrip()
    label = labels.index(line.split(",")[-1])
    features = list(map(float,line.split(",")[:-1]))
    X.append(features)
    y.append(label)

X = np.array(X)
y = np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print("X_train:{}, y_train:{}, X_test:{}, y_test:{}".format(X_train.shape, y_train.shape,
                                                            X_test.shape, y_test.shape))
lb = LabelBinarizer().fit(y)
labels_train = lb.transform(y_train)
labels_test = lb.transform(y_test)
print("labels_train:{}, labels_test:{}".format(labels_train.shape, labels_test.shape))
X_train:(4990, 617), y_train:(4990,), X_test:(1248, 617), y_test:(1248,)
labels_train:(4990, 26), labels_test:(1248, 26)
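
LabelBinarizer converts the integer class indices into one-hot target vectors, which is the format rm.softmax_cross_entropy expects in the learning loop below. As an illustrative check only (not part of the original notebook), each one-hot row maps back to its class index with argmax:

# Illustrative: labels_train[i] encodes the same class as y_train[i].
assert np.argmax(labels_train[0]) == y_train[0]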

Network definition and parameter initialization

In [4]:
output_size = len(labels)
sequential = rm.Sequential([
    rm.Dense(100),
    rm.Relu(),
    rm.Dense(50),
    rm.Relu(),
    rm.Dense(output_size)
])

Learning loop

In [5]:
epoch = 20
batch_size = 128
N = len(X_train)
optimizer = rm.Adagrad(lr=0.01)  # Adagrad adapts the step size per parameter
for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N//batch_size):
        train_batch = X_train[perm[j*batch_size : (j+1)*batch_size]]
        response_batch = labels_train[perm[j*batch_size : (j+1)*batch_size]]
        # Forward pass inside train() so the computational graph is kept for backprop.
        with sequential.train():
            l = rm.softmax_cross_entropy(sequential(train_batch), response_batch)
        grad = l.grad()          # backpropagation
        grad.update(optimizer)   # apply the Adagrad update to all parameters
        loss += l.as_ndarray()
    train_loss = loss / (N//batch_size)
    test_loss = rm.softmax_cross_entropy(sequential(X_test), labels_test).as_ndarray()
    print("epoch:{:03d}, train_loss:{:.4f}, test_loss:{:.4f}".format(i, float(train_loss), float(test_loss)))
epoch:000, train_loss:1.7550, test_loss:0.8006
epoch:001, train_loss:0.6106, test_loss:0.5314
epoch:002, train_loss:0.4123, test_loss:0.3825
epoch:003, train_loss:0.3136, test_loss:0.3244
epoch:004, train_loss:0.2602, test_loss:0.3071
epoch:005, train_loss:0.2278, test_loss:0.2655
epoch:006, train_loss:0.1989, test_loss:0.2614
epoch:007, train_loss:0.1815, test_loss:0.2311
epoch:008, train_loss:0.1644, test_loss:0.2115
epoch:009, train_loss:0.1493, test_loss:0.2066
epoch:010, train_loss:0.1370, test_loss:0.2012
epoch:011, train_loss:0.1279, test_loss:0.2083
epoch:012, train_loss:0.1197, test_loss:0.1837
epoch:013, train_loss:0.1111, test_loss:0.1837
epoch:014, train_loss:0.1044, test_loss:0.1827
epoch:015, train_loss:0.0984, test_loss:0.1723
epoch:016, train_loss:0.0945, test_loss:0.1710
epoch:017, train_loss:0.0885, test_loss:0.1696
epoch:018, train_loss:0.0849, test_loss:0.1619
epoch:019, train_loss:0.0810, test_loss:0.1600
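
After training, the same model can be used for prediction on the held-out data. The snippet below is a minimal sketch rather than part of the original notebook; it assumes the forward-pass output supports as_ndarray(), as already used for the loss values above, and reports classification accuracy by comparing the argmax of the network output with the one-hot targets.

# Illustrative sketch: test-set accuracy of the trained model.
predicted = np.argmax(sequential(X_test).as_ndarray(), axis=1)
actual = np.argmax(labels_test, axis=1)
print("test accuracy: {:.4f}".format(float(np.mean(predicted == actual))))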