# Weight Initialization

Weight initialization is one of the main sources of randomness in deep learning. This tutorial compares four common initialization methods:

- Uniform initialization draws each initial weight from a uniform distribution on [-1, 1].
- Gaussian initialization draws each initial weight from a Gaussian distribution with mean 0 and standard deviation 1.
- Glorot uniform initialization draws from a uniform distribution whose bounds are set per layer, depending on the number of input and output units.
- Glorot normal initialization draws from a Gaussian distribution whose standard deviation is set per layer, depending on the number of input and output units.

Generally, it is said that weights should be initialized according to the number of input and output units. Below, we compare these typical initialization methods and plot the resulting learning curves.
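As a reference for the Glorot methods above, their scaling rules can be sketched in plain numpy (the helper names here are ours for illustration, not ReNom's API; the formulas follow Glorot & Bengio, 2010):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out):
    # Weights ~ U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_normal(fan_in, fan_out):
    # Weights ~ N(0, std**2) with std = sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

W = glorot_uniform(8, 8)
print(W.shape, float(np.abs(W).max()))  # max magnitude stays below sqrt(6/16) ~ 0.61
```

The wider a layer is, the smaller its initial weights become, which keeps the variance of activations roughly constant from layer to layer.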

## Required Libraries

- numpy 1.21.1
- pandas 0.20.3
- matplotlib 2.0.2
- scikit-learn 0.18.1
- ReNom 2.5.2

```
In [1]:
```

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
import renom as rm
from renom.utility.initializer import Uniform, Gaussian, GlorotUniform, GlorotNormal
from renom.cuda.cuda import set_cuda_active
set_cuda_active(False)
```

## Load and standardize the data

In this section, we'll construct a simple fully-connected neural network for building energy efficiency analysis. We predict the heating load of each building from features such as wall area or glazing area. The heating/cooling load is defined as how much energy the air conditioning needs to maintain the indoor temperature (unit: kWh); the harder the indoor temperature is to maintain, the bigger the heating/cooling load becomes. For example, a large room, or building materials that conduct heat easily (so the building readily exchanges heat with the outdoors), can lead to a bigger load. Please download the data (free) from the UCI website in advance ( https://archive.ics.uci.edu/ml/datasets/Energy+efficiency ).

```
In [2]:
```

```
columns = ["RelativeCompactness", "SurfaceArea", "WallArea", "RoofArea", "OverallArea",
           "Orientation", "GlazingArea", "GlazingAreaDistribution", "HeatingLoad", "CoolingLoad"]
df = pd.read_excel("./ENB2012_data.xlsx", names=columns)
df.head()
df_s = df.copy()
for col in df.columns:
    v_std = df[col].std()
    v_mean = df[col].mean()
    df_s[col] = (df_s[col] - v_mean) / v_std
df_s.head()
```

```
Out[2]:
```

| | RelativeCompactness | SurfaceArea | WallArea | RoofArea | OverallArea | Orientation | GlazingArea | GlazingAreaDistribution | HeatingLoad | CoolingLoad |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.040447 | -1.784712 | -0.561586 | -1.469119 | 0.999349 | -1.340767 | -1.7593 | -1.813393 | -0.669679 | -0.342443 |
| 1 | 2.040447 | -1.784712 | -0.561586 | -1.469119 | 0.999349 | -0.446922 | -1.7593 | -1.813393 | -0.669679 | -0.342443 |
| 2 | 2.040447 | -1.784712 | -0.561586 | -1.469119 | 0.999349 | 0.446922 | -1.7593 | -1.813393 | -0.669679 | -0.342443 |
| 3 | 2.040447 | -1.784712 | -0.561586 | -1.469119 | 0.999349 | 1.340767 | -1.7593 | -1.813393 | -0.669679 | -0.342443 |
| 4 | 1.284142 | -1.228438 | 0.000000 | -1.197897 | 0.999349 | -1.340767 | -1.7593 | -1.813393 | -0.145408 | 0.388113 |
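The z-score standardization used in the cell above can be sanity-checked on synthetic data, with no dataset file required (the column names here are made up for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.random.uniform(10, 20, 100),
                   "b": np.random.normal(5, 3, 100)})
df_s = df.copy()
for col in df.columns:
    # Same transform as above: subtract the column mean, divide by the column std
    df_s[col] = (df_s[col] - df[col].mean()) / df[col].std()

# After standardization every column has mean ~0 and (sample) std ~1.
print(df_s.mean().round(6).tolist(), df_s.std().round(6).tolist())
```

Standardizing both features and targets this way puts all columns on a comparable scale, which also makes the MSE learning curves of the four networks directly comparable.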

```
In [3]:
```

```
X, y = np.array(df_s.iloc[:, :8]), np.array(df_s.iloc[:, 8:])
X_train, X_test, labels_train, labels_test = train_test_split(X, y, test_size=0.1, random_state=42)
```

## Training Loop

```
In [4]:
```

```
def train_loop(epoch, N, batch_size, sequential, X_train, labels_train, X_test, labels_test, optimizer):
    learning_curve = []
    test_learning_curve = []
    for i in range(epoch):
        perm = np.random.permutation(N)  # reshuffle the training set every epoch
        loss = 0
        for j in range(0, N//batch_size):
            train_batch = X_train[perm[j*batch_size : (j+1)*batch_size]]
            response_batch = labels_train[perm[j*batch_size : (j+1)*batch_size]]
            with sequential.train():
                l = rm.mse(sequential(train_batch), response_batch)
            grad = l.grad()
            grad.update(optimizer)
            loss += l.as_ndarray()
        train_loss = loss / (N//batch_size)  # mean mini-batch loss for this epoch
        test_loss = rm.mse(sequential(X_test), labels_test).as_ndarray()
        test_learning_curve.append(test_loss)
        learning_curve.append(train_loss)
    return learning_curve, test_learning_curve
```
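The batching pattern inside `train_loop` — shuffle once per epoch, then slice fixed-size mini-batches — can be illustrated independently of ReNom (691 is roughly the training-set size after a 10% test split of the 768-row dataset):

```python
import numpy as np

N, batch_size = 691, 128
perm = np.random.permutation(N)   # a new random order each epoch
batches = [perm[j*batch_size:(j+1)*batch_size] for j in range(N // batch_size)]

# Only N // batch_size full batches are used; the remaining
# N % batch_size samples are dropped for this epoch.
print(len(batches), len(batches[0]), N % batch_size)
```

Because the permutation changes every epoch, different samples end up in the dropped remainder each time, so all samples still contribute over the course of training.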

## Network definition and parameter initialization

```
In [5]:
```

```
output_size = 1
epoch = 500
batch_size = 128
N = len(X_train)
optimizer = rm.Adam()
# Four identical architectures, differing only in how the weights are initialized.
network1 = rm.Sequential([
    rm.Dense(8, initializer=Uniform()),
    rm.Relu(),
    rm.Dense(8, initializer=Uniform()),
    rm.Relu(),
    rm.Dense(6, initializer=Uniform()),
    rm.Relu(),
    rm.Dense(1, initializer=Uniform())
])
network2 = rm.Sequential([
    rm.Dense(8, initializer=Gaussian()),
    rm.Relu(),
    rm.Dense(8, initializer=Gaussian()),
    rm.Relu(),
    rm.Dense(6, initializer=Gaussian()),
    rm.Relu(),
    rm.Dense(1, initializer=Gaussian())
])
network3 = rm.Sequential([
    rm.Dense(8, initializer=GlorotUniform()),
    rm.Relu(),
    rm.Dense(8, initializer=GlorotUniform()),
    rm.Relu(),
    rm.Dense(6, initializer=GlorotUniform()),
    rm.Relu(),
    rm.Dense(1, initializer=GlorotUniform())
])
network4 = rm.Sequential([
    rm.Dense(8, initializer=GlorotNormal()),
    rm.Relu(),
    rm.Dense(8, initializer=GlorotNormal()),
    rm.Relu(),
    rm.Dense(6, initializer=GlorotNormal()),
    rm.Relu(),
    rm.Dense(1, initializer=GlorotNormal())
])
learning_curve, test_learning_curve_Uniform = train_loop(epoch=epoch, N=N, batch_size=batch_size, sequential=network1, X_train=X_train, labels_train=labels_train, X_test=X_test, labels_test=labels_test, optimizer=optimizer)
learning_curve, test_learning_curve_Gaussian = train_loop(epoch=epoch, N=N, batch_size=batch_size, sequential=network2, X_train=X_train, labels_train=labels_train, X_test=X_test, labels_test=labels_test, optimizer=optimizer)
learning_curve, test_learning_curve_GlorotUniform = train_loop(epoch=epoch, N=N, batch_size=batch_size, sequential=network3, X_train=X_train, labels_train=labels_train, X_test=X_test, labels_test=labels_test, optimizer=optimizer)
learning_curve, test_learning_curve_GlorotNormal = train_loop(epoch=epoch, N=N, batch_size=batch_size, sequential=network4, X_train=X_train, labels_train=labels_train, X_test=X_test, labels_test=labels_test, optimizer=optimizer)
plt.clf()
plt.plot(test_learning_curve_Gaussian, linewidth=1, label="Gaussian")
plt.plot(test_learning_curve_Uniform, linewidth=1, label="Uniform")
plt.plot(test_learning_curve_GlorotUniform, linewidth=1, label="GlorotUniform")
plt.plot(test_learning_curve_GlorotNormal, linewidth=1, label="GlorotNormal")
plt.title("learning_curve")
plt.ylabel("error")
plt.xlabel("epoch")
plt.ylim(0, 0.5)
plt.legend()
plt.grid()
plt.show()
```
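Why fan-based scaling matters can be seen by pushing unit-variance data through a stack of purely linear layers: with std-1 Gaussian weights the activation variance is multiplied by roughly `fan_in` at every layer, while Glorot scaling keeps it near constant. This is a minimal numpy sketch, not ReNom code:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = fan_out = 256
x = rng.normal(size=(1000, fan_in))  # unit-variance input

def propagate(x, weight_std, layers=5):
    h = x
    for _ in range(layers):
        W = rng.normal(0.0, weight_std, size=(fan_in, fan_out))
        h = h @ W  # linear layer: Var(out) ~ fan_in * weight_std**2 * Var(in)
    return h.var()

naive = propagate(x, 1.0)                                 # variance explodes (~256**5)
glorot = propagate(x, np.sqrt(2.0 / (fan_in + fan_out)))  # variance stays near 1
print(f"std-1 Gaussian: {naive:.3e}, Glorot: {glorot:.3f}")
```

Exploding (or vanishing) activation variance translates into exploding or vanishing gradients, which is why the Glorot curves in the plot above typically converge faster and more stably than plain Uniform or Gaussian initialization.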