建物の空調エネルギー効率の予測

全結合ニューラルネットワークを用いたエネルギー効率予測

簡単な全結合ニューラルネットワークを構築して建物のエネルギー効率を解析します。暖房/冷房負荷は空調が室内の気温を一定に保つためにどの程度のエネルギーを必要とするかを表す指標(単位はkWh)で、室内気温を保ちにくいほど負荷は大きくなります。例えば、部屋の容積が大きかったり、壁が熱を通しやすい(つまり外と熱を交換しやすい)ほど空調の負荷は大きくなります。ここでは建物の壁や窓の面積などから、暖房の負荷を回帰予測します。データはUCIで無料で公開されているデータセット( https://archive.ics.uci.edu/ml/datasets/Energy+efficiency )を用います。

必要なライブラリ

  • matplotlib 2.0.2
  • numpy 1.12.1
  • scikit-learn 0.18.2
  • pandas 0.20.3
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import renom as rm
from renom import Sequential
from renom import Dense, Tanh, Relu
from renom import Adam

データの読み込みと前処理

In [2]:
columns = ["RelativeCompactness", "SurfaceArea", "WallArea", "RoofArea", "OverallArea",
           "Orientation", "GlazingArea", "GlazingAreaDistribution", "HeatingLoad", "CoolingLoad"]
df = pd.read_excel("../dataset/ENB2012_data.xlsx", names=columns)
df.head()
Out[2]:
RelativeCompactness SurfaceArea WallArea RoofArea OverallArea Orientation GlazingArea GlazingAreaDistribution HeatingLoad CoolingLoad
0 0.98 514.5 294.0 110.25 7.0 2 0.0 0 15.55 21.33
1 0.98 514.5 294.0 110.25 7.0 3 0.0 0 15.55 21.33
2 0.98 514.5 294.0 110.25 7.0 4 0.0 0 15.55 21.33
3 0.98 514.5 294.0 110.25 7.0 5 0.0 0 15.55 21.33
4 0.90 563.5 318.5 122.50 7.0 2 0.0 0 20.84 28.28

データをカラムごとに標準化し、numpy配列に変換します。

In [3]:
df_s = df.copy()

for col in df.columns:
    v_std = df[col].std()
    v_mean = df[col].mean()
    df_s[col] = (df_s[col] - v_mean) / v_std
In [4]:
df_s.head()
Out[4]:
RelativeCompactness SurfaceArea WallArea RoofArea OverallArea Orientation GlazingArea GlazingAreaDistribution HeatingLoad CoolingLoad
0 2.040447 -1.784712 -0.561586 -1.469119 0.999349 -1.340767 -1.7593 -1.813393 -0.669679 -0.342443
1 2.040447 -1.784712 -0.561586 -1.469119 0.999349 -0.446922 -1.7593 -1.813393 -0.669679 -0.342443
2 2.040447 -1.784712 -0.561586 -1.469119 0.999349 0.446922 -1.7593 -1.813393 -0.669679 -0.342443
3 2.040447 -1.784712 -0.561586 -1.469119 0.999349 1.340767 -1.7593 -1.813393 -0.669679 -0.342443
4 1.284142 -1.228438 0.000000 -1.197897 0.999349 -1.340767 -1.7593 -1.813393 -0.145408 0.388113
In [5]:
X, y = np.array(df_s.iloc[:, :8]), np.array(df_s.iloc[:, 8:])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

Sequentialモデルを用いたニューラルネットワークの定義

In [6]:
model = Sequential([
    Dense(8),
    Relu(),
    Dense(8),
    Relu(),
    Dense(6),
    Relu(),
    Dense(1)
])

最適化関数の定義

In [7]:
optimizer = rm.Adam()

学習ループの構築

Forward propagationが終了したら実際の暖房負荷と予測の暖房負荷の間の平均二乗誤差(MSE)を計算します。MSEを下げるようニューラルネットワークを学習させます。

In [8]:
# parameters
EPOCH = 3000 # Number of epochs
BATCH =128 # Mini-batch size

# Learning curves
learning_curve = []
test_curve = []

# Training loop
for i in range(1, 1+EPOCH):

    N = X_train.shape[0] # Number of records in training data
    perm = np.random.permutation(N)
    train_loss = 0

    for j in range(N//BATCH):
        # Make mini-batch
        index = perm[j*BATCH:(j+1)*BATCH]
        train_batch_x = X_train[index]
        train_batch_y = y_train[index]

        # Forward propagation
        with model.train():
            z = model(train_batch_x)
            loss = rm.mean_squared_error(z, train_batch_y)

        # Backpropagation
        grad = loss.grad()

        # Update
        grad.update(optimizer)

        train_loss += loss.as_ndarray()

    # calculate mean squared error for training data
    train_loss = train_loss / (N // BATCH)
    learning_curve.append(train_loss)

    # calculate mean squared error for testidation data
    y_test_pred = model(X_test)
    test_loss = rm.mean_squared_error(y_test_pred, y_test).as_ndarray()
    test_curve.append(test_loss)

    # print training progress
    if i % 100 == 0:
        print("Epoch %d - loss: %f - test_loss: %f" % (i, train_loss, test_loss))

print('Finished!')
Epoch 100 - loss: 0.113990 - test_loss: 0.130233
Epoch 200 - loss: 0.086719 - test_loss: 0.101008
Epoch 300 - loss: 0.080777 - test_loss: 0.097986
Epoch 400 - loss: 0.074606 - test_loss: 0.092612
Epoch 500 - loss: 0.071284 - test_loss: 0.085902
Epoch 600 - loss: 0.066842 - test_loss: 0.081911
Epoch 700 - loss: 0.065506 - test_loss: 0.080158
Epoch 800 - loss: 0.062914 - test_loss: 0.077236
Epoch 900 - loss: 0.059501 - test_loss: 0.074397
Epoch 1000 - loss: 0.041378 - test_loss: 0.046624
Epoch 1100 - loss: 0.030633 - test_loss: 0.032956
Epoch 1200 - loss: 0.025489 - test_loss: 0.028105
Epoch 1300 - loss: 0.022370 - test_loss: 0.024761
Epoch 1400 - loss: 0.019271 - test_loss: 0.021640
Epoch 1500 - loss: 0.018819 - test_loss: 0.020465
Epoch 1600 - loss: 0.018496 - test_loss: 0.019925
Epoch 1700 - loss: 0.018561 - test_loss: 0.019719
Epoch 1800 - loss: 0.018252 - test_loss: 0.019707
Epoch 1900 - loss: 0.018131 - test_loss: 0.019602
Epoch 2000 - loss: 0.017597 - test_loss: 0.019875
Epoch 2100 - loss: 0.017724 - test_loss: 0.019799
Epoch 2200 - loss: 0.017274 - test_loss: 0.019823
Epoch 2300 - loss: 0.017453 - test_loss: 0.019761
Epoch 2400 - loss: 0.017629 - test_loss: 0.019617
Epoch 2500 - loss: 0.017617 - test_loss: 0.019620
Epoch 2600 - loss: 0.018240 - test_loss: 0.019802
Epoch 2700 - loss: 0.018037 - test_loss: 0.019727
Epoch 2800 - loss: 0.018023 - test_loss: 0.019688
Epoch 2900 - loss: 0.017270 - test_loss: 0.019696
Epoch 3000 - loss: 0.017458 - test_loss: 0.019748
Finished!

モデルの評価

学習させたモデルを評価してみましょう。

学習曲線のプロット

In [9]:
plt.figure(figsize=(10, 4))
plt.plot(learning_curve, label='loss')
plt.plot(test_curve, label='test_loss', alpha=0.6)
plt.title('Learning curve')
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.ylim(0, 0.2)
plt.legend()
plt.grid()
../../../_images/notebooks_regression_energyefficiency_notebook_18_0.png

実際の暖房負荷と予測した暖房負荷の比較

In [12]:
# predict test value
y_pred = model(X_test)

# convert standardized heating-load to its original unit
heating_load_true = y_test[:, 0].reshape(-1, 1) * v_std + v_mean
heating_load_pred = y_pred * v_std + v_mean

plt.figure(figsize=(8, 8))
plt.plot([5, 50], [5, 50], c='k', alpha=0.6) # diagonal line
plt.scatter(heating_load_true, heating_load_pred)
plt.xlim(5, 50)
plt.ylim(5, 50)
plt.xlabel('Actual Heating Load (kWh)', fontsize=16)
plt.ylabel('Predicted Heating Load (kWh)', fontsize=16)
plt.grid()
../../../_images/notebooks_regression_energyefficiency_notebook_20_0.png

図中の対角線は予測した暖房負荷の正確さを表しています。予測と実際の暖房負荷が近いほど、プロットした点は対角線上に乗ります。

In [11]:
print(max(abs(heating_load_pred - heating_load_true)))
[ 1.57366776]

暖房負荷を誤差 +/- 1.6kWh の範囲内で正確に予測することができました。