Neural Network parameters

About neural network parameters: weights, biases, and the matrix operations that connect them.

Problems

There are many methods for classification and regression problems in practice.
Random forest and SVM are well-known examples, and a neural network is another such method; depending on how the network is defined, it can also handle novel problems such as generative modeling.
A neural network is composed of elements such as units, layers, activation functions, and loss functions.
Basically, it calculates an output from an input, and it is mainly used to solve classification and regression problems.
The basic network structure is as below.
The input units represent the input variables, and each input value is multiplied by the corresponding weight of the network.
A neural network has weights and biases, which are learned from the input data and labels.
We will look into the shape of these parameters in practice.
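Before going into the framework code, the core computation of one dense layer can be sketched with plain NumPy. This sketch is only an illustration added here; the array names are arbitrary.

import numpy as np

x = np.random.randn(1, 4)            # one sample with 4 input variables
w = np.random.randn(4, 10)           # weights connecting 4 inputs to 10 units
b = np.zeros((1, 10))                # one bias value per unit

h = np.maximum(np.dot(x, w) + b, 0)  # weighted sum plus bias, then ReLU activation
print(h.shape)                       # (1, 10)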

Required Libraries

  • matplotlib 2.0.2
  • numpy 1.12.1
  • scikit-learn 0.18.2
  • glob2 0.6
In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import renom as rm
from renom.optimizer import Sgd, Adam

Learning from input data and labels (Sequential Model)

In [2]:
class Model(rm.Model):
    def __init__(self):
        self.layer1 = rm.Dense(10)   # hidden layer with 10 units
        self.layer2 = rm.Dense(3)    # output layer with 3 units, one per iris class
    def forward(self, x, epoch, batch):
        t1 = rm.relu(self.layer1(x))
        out = self.layer2(t1)
        # show layer1's weight and bias for the first few batches of the first two epochs
        if epoch is not None and epoch < 2 and batch < 3:
            print("epoch:{}  batch:{} weight shape:{} bias shape:{}".format(epoch, batch, self.layer1.params.w.shape, self.layer1.params.b.shape))
            print("weight:{}".format(self.layer1.params.w))
            print("bias:{}".format(self.layer1.params.b))
            print()
        return out

iris = load_iris()
data = iris.data
label = iris.target

model = Model()

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.3)
y_train = y_train.reshape(len(X_train), -1)
y_test = y_test.reshape(len(X_test), -1)
batch_size = 8
epoch = 10
N = len(X_train)
optimizer = Sgd(lr=0.001)

for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch_size):
        train_batch = X_train[perm[j*batch_size : (j+1)*batch_size]]
        response_batch = y_train[perm[j*batch_size : (j+1)*batch_size]]

        with model.train():
            l = rm.softmax_cross_entropy(model(train_batch, i, j), response_batch)
        grad = l.grad()
        grad.update(optimizer)
        loss += l.as_ndarray()
    train_loss = loss / (N // batch_size)

    test_loss = rm.softmax_cross_entropy(model(X_test, None, None), y_test).as_ndarray()
epoch:0  batch:0 weight shape:(4, 10) bias shape:(1, 10)
weight:[[-0.22148813 -0.06455304  0.24886461 -0.22360604 -0.2212799   0.14054264
   0.06318101  0.12782612 -0.37893802 -0.69344664]
 [ 0.13048458 -0.30900329 -0.13451999 -0.00094005  0.21173206  0.07846587
   0.15815271  0.16371909 -0.03767572 -0.8459813 ]
 [-0.55866587  0.02095983  0.25410637  0.04105124 -0.63527399  0.10226102
   0.30327985  0.01344239 -0.34383473  0.20845887]
 [ 0.00146433  0.14428516 -0.37800437 -0.66030794 -0.1535928  -0.41640413
   0.05833655 -0.28710318 -0.11259635  0.58286715]]
bias:[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

epoch:0  batch:1 weight shape:(4, 10) bias shape:(1, 10)
weight:[[-0.22148813 -0.06455304  0.25000107 -0.22360604 -0.2212799   0.14019379
   0.0611538   0.12805027 -0.37893802 -0.69344664]
 [ 0.13048458 -0.30900329 -0.13419022 -0.00094005  0.21173206  0.0782304
   0.1572374   0.16374263 -0.03767572 -0.8459813 ]
 [-0.55866587  0.02095983  0.25530201  0.04105124 -0.63527399  0.10216144
   0.30181283  0.01376484 -0.34383473  0.20845887]
 [ 0.00146433  0.14428516 -0.37757117 -0.66030794 -0.1535928  -0.41641617
   0.05786775 -0.28697777 -0.11259635  0.58286715]]
bias:[[  0.00000000e+00   0.00000000e+00   1.79188093e-04   0.00000000e+00
    0.00000000e+00  -6.85928389e-05  -3.49776412e-04   3.19690043e-05
    0.00000000e+00   0.00000000e+00]]

epoch:0  batch:2 weight shape:(4, 10) bias shape:(1, 10)
weight:[[-0.22148813 -0.06455304  0.25033292 -0.22360604 -0.2212799   0.13935776
   0.05878816  0.12789027 -0.37893802 -0.69344664]
 [ 0.13048458 -0.30900329 -0.13454546 -0.00094005  0.21173206  0.07756997
   0.15600587  0.16343664 -0.03767572 -0.8459813 ]
 [-0.55866587  0.02095983  0.25636706  0.04105124 -0.63527399  0.10199867
   0.30032635  0.01402663 -0.34383473  0.20845887]
 [ 0.00146433  0.14428516 -0.37709758 -0.66030794 -0.1535928  -0.41641265
   0.05739774 -0.28683606 -0.11259635  0.58286715]]
bias:[[  0.00000000e+00   0.00000000e+00   1.40944816e-04   0.00000000e+00
    0.00000000e+00  -2.51324120e-04  -7.51175801e-04  -3.48089925e-05
    0.00000000e+00   0.00000000e+00]]

epoch:1  batch:0 weight shape:(4, 10) bias shape:(1, 10)
weight:[[-0.22148813 -0.06455304  0.27561915 -0.22360604 -0.2212799   0.13690807
   0.02486142  0.13276328 -0.37893802 -0.69344664]
 [ 0.13048458 -0.30900329 -0.12640031 -0.00094005  0.21173206  0.07457455
   0.13912132  0.16403335 -0.03767572 -0.8459813 ]
 [-0.55866587  0.02095983  0.28480351  0.04105124 -0.63527399  0.10428211
   0.2761648   0.02186429 -0.34383473  0.20845887]
 [ 0.00146433  0.14428516 -0.36625826 -0.66030794 -0.1535928  -0.41518098
   0.04929309 -0.28363782 -0.11259635  0.58286715]]
bias:[[ 0.          0.          0.00338294  0.          0.         -0.00102422
  -0.0063385   0.00038907  0.          0.        ]]

epoch:1  batch:1 weight shape:(4, 10) bias shape:(1, 10)
weight:[[-0.22148813 -0.06455304  0.2782279  -0.22360604 -0.2212799   0.13704985
   0.02220462  0.13325311 -0.37893802 -0.69344664]
 [ 0.13048458 -0.30900329 -0.12551248 -0.00094005  0.21173206  0.07450633
   0.13782531  0.16411592 -0.03767572 -0.8459813 ]
 [-0.55866587  0.02095983  0.28760439  0.04105124 -0.63527399  0.1047022
   0.27423552  0.02259091 -0.34383473  0.20845887]
 [ 0.00146433  0.14428516 -0.36530989 -0.66030794 -0.1535928  -0.41502473
   0.0486936  -0.28337926 -0.11259635  0.58286715]]
bias:[[ 0.          0.          0.00377256  0.          0.         -0.00102172
  -0.00679465  0.00044973  0.          0.        ]]

epoch:1  batch:2 weight shape:(4, 10) bias shape:(1, 10)
weight:[[-0.22148813 -0.06455304  0.28150427 -0.22360604 -0.2212799   0.1374025
   0.01957459  0.13400918 -0.37893802 -0.69344664]
 [ 0.13048458 -0.30900329 -0.12427821 -0.00094005  0.21173206  0.07453582
   0.13649586  0.16432714 -0.03767572 -0.8459813 ]
 [-0.55866587  0.02095983  0.29110238  0.04105124 -0.63527399  0.10535019
   0.27233651  0.02359707 -0.34383473  0.20845887]
 [ 0.00146433  0.14428516 -0.36394855 -0.66030794 -0.1535928  -0.41474748
   0.04804279 -0.28296828 -0.11259635  0.58286715]]
bias:[[ 0.          0.          0.00423871  0.          0.         -0.00099465
  -0.00724287  0.00054148  0.          0.        ]]

As the output above shows, the weights and biases are updated at every batch.
We used the iris dataset, which contains 150 samples with 4 dimensions (features).
Batch size and the number of epochs are parameters that we have to choose ourselves.
The number of epochs is how many times we iterate the SGD optimization over the training data. The result varies depending on the number of epochs, so we have to search for a good value.
Batch size matters for memory efficiency and processing speed.
When we deal with large image data, we sometimes cannot load all the data at once, so we have to divide the input into mini batches. For example, when the number of samples is 1600 and we set the batch size to 16, the data is processed 16 samples at a time. Since the iris data has 4 features and the first layer has 10 units, the weight shape is 4x10 and the bias shape is 1x10, regardless of the batch size.
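The following quick NumPy sketch (added for illustration; the numbers mirror the iris setup above) shows that changing the batch size only changes how many rows go through the network at once, while the parameter shapes stay fixed by the number of features and units.

import numpy as np

n_samples, n_features, n_units = 150, 4, 10
batch_size = 8

X = np.random.randn(n_samples, n_features)   # stand-in for the iris data
w = np.random.randn(n_features, n_units)     # same shape as layer1.params.w: (4, 10)
b = np.zeros((1, n_units))                   # same shape as layer1.params.b: (1, 10)

perm = np.random.permutation(n_samples)      # shuffle the sample order every epoch
for j in range(n_samples // batch_size):
    batch = X[perm[j*batch_size : (j+1)*batch_size]]
    out = np.dot(batch, w) + b               # (8, 4) x (4, 10) + (1, 10) -> (8, 10)
    assert out.shape == (batch_size, n_units)

print(w.shape, b.shape)                      # (4, 10) (1, 10), independent of batch_size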

By the way, the meaning of the weights is easy to interpret, but the meaning of the bias is less obvious.

Without a bias term, a layer's output is forced to pass through the origin: we cannot place the separation line freely for classification problems, and we cannot fit a regression line that does not cross the origin.
Next, we compare the case that uses a bias with the case that does not, and this time we define the neural network as a functional model.
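Before that, here is a small illustration of why the bias matters (added here, not from the original notebook), using ordinary least squares on data with a large offset: without an intercept term, the fitted line must pass through the origin and cannot model the offset.

import numpy as np

# data generated as y = 2*x + 100 plus noise, similar in spirit to make_regression(bias=100.0)
x = np.random.randn(200, 1)
y = 2.0 * x + 100.0 + np.random.randn(200, 1)

# without an intercept: solve y = x * w, so the line is forced through the origin
w_no_bias, *_ = np.linalg.lstsq(x, y, rcond=None)

# with an intercept: append a column of ones, which plays the role of the bias
X = np.hstack([x, np.ones_like(x)])
w_bias, *_ = np.linalg.lstsq(X, y, rcond=None)

print("fit without bias:", w_no_bias.ravel())
print("fit with bias:", w_bias.ravel())       # close to the true slope 2 and intercept 100
print("MSE without bias:", float(np.mean((x.dot(w_no_bias) - y) ** 2)))   # huge, offset is missed
print("MSE with bias:", float(np.mean((X.dot(w_bias) - y) ** 2)))         # close to the noise variance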

Regression Problem

The make_regression function in scikit-learn can generate a dataset with data and labels for regression experiments. We use this function to generate the dataset and look at the result.

In [3]:
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt

X, y = make_regression(n_samples=500, n_features=1, random_state=0, noise=4.0, bias=100.0)
X = - X
plt.scatter(X, y)
plt.show()
[Output figure: scatter plot of the generated one-dimensional regression data]

The case above is one-dimensional, which is too easy to estimate, so we will use a 5-dimensional case instead.

The case without bias (Functional model)

In [4]:
class Model(rm.Model):
    def __init__(self, input_size, hidden_size, output_size):
        # weight matrices only; this model has no bias terms
        self.w1 = rm.Variable(np.random.randn(input_size, hidden_size)*0.01)
        self.w2 = rm.Variable(np.random.randn(hidden_size, output_size)*0.01)

    def forward(self, x):
        t1 = rm.dot(x, self.w1)      # linear transform without bias
        t2 = rm.relu(t1)
        out = rm.dot(t2, self.w2)
        return out

data, label = make_regression(n_samples=500, n_features=5, random_state=0, noise=4.0, bias=100.0)
data = - data

model = Model(input_size=data.shape[1], hidden_size=10, output_size=1)

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.3)
y_train = y_train.reshape(len(X_train), -1)
y_test = y_test.reshape(len(X_test), -1)
batch_size = 8
epoch = 10
N = len(X_train)
optimizer = Sgd(lr=0.001)

for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch_size):
        train_batch = X_train[perm[j*batch_size : (j+1)*batch_size]]
        response_batch = y_train[perm[j*batch_size : (j+1)*batch_size]]

        with model.train():
            l = rm.mean_squared_error(model(train_batch), response_batch)
        grad = l.grad()
        grad.update(optimizer)
        loss += l.as_ndarray()
    train_loss = loss / (N // batch_size)

    test_loss = rm.mean_squared_error(model(X_test), y_test).as_ndarray()
    print("epoch:{:03d}, train_loss:{:.4f}, test_loss:{:.4f}".format(i, float(train_loss), float(test_loss)))
epoch:000, train_loss:9117.4320, test_loss:10460.8447
epoch:001, train_loss:4329.0102, test_loss:1750.3320
epoch:002, train_loss:1493.0594, test_loss:1058.3417
epoch:003, train_loss:1275.2536, test_loss:844.1409
epoch:004, train_loss:1120.4195, test_loss:873.4360
epoch:005, train_loss:893.6425, test_loss:749.1420
epoch:006, train_loss:659.9649, test_loss:720.2554
epoch:007, train_loss:599.5877, test_loss:631.9391
epoch:008, train_loss:547.1918, test_loss:715.0874
epoch:009, train_loss:574.4669, test_loss:630.0540

The case with bias (Functional model)

In [5]:
class Model(rm.Model):
    def __init__(self, input_size, hidden_size, output_size):
        self.w1 = rm.Variable(np.random.randn(input_size, hidden_size)*0.01)
        self.b1 = rm.Variable(np.zeros((1, hidden_size)))   # bias of the hidden layer
        self.w2 = rm.Variable(np.random.randn(hidden_size, output_size)*0.01)
        self.b2 = rm.Variable(np.zeros((1, output_size)))   # bias of the output layer

    def forward(self, x):
        t1 = rm.dot(x, self.w1) + self.b1   # linear transform plus bias
        t2 = rm.relu(t1)
        out = rm.dot(t2, self.w2) + self.b2
        return out

data, label = make_regression(n_samples=500, n_features=5, random_state=0, noise=4.0, bias=100.0)
data = - data

model = Model(input_size=data.shape[1], hidden_size=10, output_size=1)

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.3)
y_train = y_train.reshape(len(X_train), -1)
y_test = y_test.reshape(len(X_test), -1)
batch_size = 8
epoch = 10
N = len(X_train)
optimizer = Sgd(lr=0.001)

for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch_size):
        train_batch = X_train[perm[j*batch_size : (j+1)*batch_size]]
        response_batch = y_train[perm[j*batch_size : (j+1)*batch_size]]

        with model.train():
            l = rm.mean_squared_error(model(train_batch), response_batch)
        grad = l.grad()
        grad.update(optimizer)
        loss += l.as_ndarray()
    train_loss = loss / (N // batch_size)

    test_loss = rm.mean_squared_error(model(X_test), y_test).as_ndarray()
    print("epoch:{:03d}, train_loss:{:.4f}, test_loss:{:.4f}".format(i, float(train_loss), float(test_loss)))
epoch:000, train_loss:7894.1248, test_loss:1777.0327
epoch:001, train_loss:575.9983, test_loss:482.7759
epoch:002, train_loss:502.4154, test_loss:473.9365
epoch:003, train_loss:487.0127, test_loss:449.0021
epoch:004, train_loss:408.5537, test_loss:294.7806
epoch:005, train_loss:202.7415, test_loss:117.5818
epoch:006, train_loss:100.6396, test_loss:98.6491
epoch:007, train_loss:81.2815, test_loss:99.0892
epoch:008, train_loss:68.5562, test_loss:110.5399
epoch:009, train_loss:57.0712, test_loss:95.6728
As the results above show, the outcome is quite different between the case with bias and the case without bias.
When we don't use a bias, the model struggles to decrease the loss; sometimes the loss even goes up and then stays high. A neural network has many hyperparameters: batch size, number of epochs, learning rate, unit size, layer types, and so on.
But the weights and biases play the fundamental role in a neural network, before you consider any of the other parameters.
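As one more way to look at the trained model, we could plot its predictions against the true labels on the test set. This is a sketch added here, assuming model is the bias-using functional model trained in the last cell and that its output node supports as_ndarray() in the same way the loss values above do.

pred = model(X_test).as_ndarray()    # predictions of the trained model on the test set

plt.scatter(y_test, pred)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()])  # ideal prediction line
plt.xlabel("true value")
plt.ylabel("predicted value")
plt.show()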