Building energy efficiency prediction

Energy efficiency prediction model using fully conected neural network.

In this section, we’ll construct simple fully-connected neural networks to building energy efficiency analysis. Here we predict heating-load of from each building features, such as wall-area or glazing-area. Heating/Cooling load is defined by how much energy our air conditioners need to maintain indoor temperature (unit: kWh). The more difficult it is to keep indoor temperature, the bigger Heating/CoolingLoad become. To give an example, the size of room or building material pervious to heat (it means the building easily exchange heat with outdoor) can lead to bigger load. Please download the free data from UCI website in advance ( https://archive.ics.uci.edu/ml/datasets/Energy+efficiency ).

Required libraries

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import renom as rm
from renom import Sequential
from renom import Dense, Tanh, Relu
from renom import Adam

Load & preprocess the data

In [2]:
columns = ["RelativeCompactness", "SurfaceArea", "WallArea", "RoofArea", "OverallArea",
           "Orientation", "GlazingArea", "GlazingAreaDistribution", "HeatingLoad", "CoolingLoad"]
df = pd.read_excel("../dataset/ENB2012_data.xlsx", names=columns)
df.head()
Out[2]:
RelativeCompactness SurfaceArea WallArea RoofArea OverallArea Orientation GlazingArea GlazingAreaDistribution HeatingLoad CoolingLoad
0 0.98 514.5 294.0 110.25 7.0 2 0.0 0 15.55 21.33
1 0.98 514.5 294.0 110.25 7.0 3 0.0 0 15.55 21.33
2 0.98 514.5 294.0 110.25 7.0 4 0.0 0 15.55 21.33
3 0.98 514.5 294.0 110.25 7.0 5 0.0 0 15.55 21.33
4 0.90 563.5 318.5 122.50 7.0 2 0.0 0 20.84 28.28

Now, we standardize the data in each column and convert it to a numpy array.

In [3]:
df_s = df.copy()

for col in df.columns:
    v_std = df[col].std()
    v_mean = df[col].mean()
    df_s[col] = (df_s[col] - v_mean) / v_std
In [4]:
df_s.head()
Out[4]:
RelativeCompactness SurfaceArea WallArea RoofArea OverallArea Orientation GlazingArea GlazingAreaDistribution HeatingLoad CoolingLoad
0 2.040447 -1.784712 -0.561586 -1.469119 0.999349 -1.340767 -1.7593 -1.813393 -0.669679 -0.342443
1 2.040447 -1.784712 -0.561586 -1.469119 0.999349 -0.446922 -1.7593 -1.813393 -0.669679 -0.342443
2 2.040447 -1.784712 -0.561586 -1.469119 0.999349 0.446922 -1.7593 -1.813393 -0.669679 -0.342443
3 2.040447 -1.784712 -0.561586 -1.469119 0.999349 1.340767 -1.7593 -1.813393 -0.669679 -0.342443
4 1.284142 -1.228438 0.000000 -1.197897 0.999349 -1.340767 -1.7593 -1.813393 -0.145408 0.388113
In [5]:
X, y = np.array(df_s.iloc[:, :8]), np.array(df_s.iloc[:, 8:])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

Definition of a nural network with sequential model

In [6]:
model = Sequential([
    Dense(8),
    Relu(),
    Dense(8),
    Relu(),
    Dense(6),
    Relu(),
    Dense(1)
])

Define an optimizer function

In [7]:
optimizer = rm.Adam()

Training loop for heating-load regression

In the training loop, we recommend to watch not only train loss but also test loss to prevent overfitting. After forward propagation, we calculate mean squared error (MSE) loss between actual and predicted heating load. Our aim is to fit the NN model by reducing the MSE loss.

In [8]:
# parameters
EPOCH = 3000 # Number of epochs
BATCH =128 # Mini-batch size

# Learning curves
learning_curve = []
test_curve = []

# Training loop
for i in range(1, 1+EPOCH):

    N = X_train.shape[0] # Number of records in training data
    perm = np.random.permutation(N)
    train_loss = 0

    for j in range(N//BATCH):
        # Make mini-batch
        index = perm[j*BATCH:(j+1)*BATCH]
        train_batch_x = X_train[index]
        train_batch_y = y_train[index]

        # Forward propagation
        with model.train():
            z = model(train_batch_x)
            loss = rm.mean_squared_error(z, train_batch_y)

        # Backpropagation
        grad = loss.grad()

        # Update
        grad.update(optimizer)

        train_loss += loss.as_ndarray()

    # calculate mean squared error for training data
    train_loss = train_loss / (N // BATCH)
    learning_curve.append(train_loss)

    # calculate mean squared error for testidation data
    y_test_pred = model(X_test)
    test_loss = rm.mean_squared_error(y_test_pred, y_test).as_ndarray()
    test_curve.append(test_loss)

    # print training progress
    if i % 100 == 0:
        print("Epoch %d - loss: %f - test_loss: %f" % (i, train_loss, test_loss))

print('Finished!')
Epoch 100 - loss: 0.113990 - test_loss: 0.130233
Epoch 200 - loss: 0.086719 - test_loss: 0.101008
Epoch 300 - loss: 0.080777 - test_loss: 0.097986
Epoch 400 - loss: 0.074606 - test_loss: 0.092612
Epoch 500 - loss: 0.071284 - test_loss: 0.085902
Epoch 600 - loss: 0.066842 - test_loss: 0.081911
Epoch 700 - loss: 0.065506 - test_loss: 0.080158
Epoch 800 - loss: 0.062914 - test_loss: 0.077236
Epoch 900 - loss: 0.059501 - test_loss: 0.074397
Epoch 1000 - loss: 0.041378 - test_loss: 0.046624
Epoch 1100 - loss: 0.030633 - test_loss: 0.032956
Epoch 1200 - loss: 0.025489 - test_loss: 0.028105
Epoch 1300 - loss: 0.022370 - test_loss: 0.024761
Epoch 1400 - loss: 0.019271 - test_loss: 0.021640
Epoch 1500 - loss: 0.018819 - test_loss: 0.020465
Epoch 1600 - loss: 0.018496 - test_loss: 0.019925
Epoch 1700 - loss: 0.018561 - test_loss: 0.019719
Epoch 1800 - loss: 0.018252 - test_loss: 0.019707
Epoch 1900 - loss: 0.018131 - test_loss: 0.019602
Epoch 2000 - loss: 0.017597 - test_loss: 0.019875
Epoch 2100 - loss: 0.017724 - test_loss: 0.019799
Epoch 2200 - loss: 0.017274 - test_loss: 0.019823
Epoch 2300 - loss: 0.017453 - test_loss: 0.019761
Epoch 2400 - loss: 0.017629 - test_loss: 0.019617
Epoch 2500 - loss: 0.017617 - test_loss: 0.019620
Epoch 2600 - loss: 0.018240 - test_loss: 0.019802
Epoch 2700 - loss: 0.018037 - test_loss: 0.019727
Epoch 2800 - loss: 0.018023 - test_loss: 0.019688
Epoch 2900 - loss: 0.017270 - test_loss: 0.019696
Epoch 3000 - loss: 0.017458 - test_loss: 0.019748
Finished!

Model evaluation

Let’s evaluate the fitted model!

Plot learning curve

In [9]:
plt.figure(figsize=(10, 4))
plt.plot(learning_curve, label='loss')
plt.plot(test_curve, label='test_loss', alpha=0.6)
plt.title('Learning curve')
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.ylim(0, 0.2)
plt.legend()
plt.grid()
../../../_images/notebooks_regression_energyefficiency_notebook_18_0.png

Compare the actual and predicted heating-load

In [12]:
# predict test value
y_pred = model(X_test)

# convert standardized heating-load to its original unit
heating_load_true = y_test[:, 0].reshape(-1, 1) * v_std + v_mean
heating_load_pred = y_pred * v_std + v_mean

plt.figure(figsize=(8, 8))
plt.plot([5, 50], [5, 50], c='k', alpha=0.6) # diagonal line
plt.scatter(heating_load_true, heating_load_pred)
plt.xlim(5, 50)
plt.ylim(5, 50)
plt.xlabel('Actual Heating Load (kWh)', fontsize=16)
plt.ylabel('Predicted Heating Load (kWh)', fontsize=16)
plt.grid()
../../../_images/notebooks_regression_energyefficiency_notebook_20_0.png

Black diagonal line indicates the accuracy of heating load prediction. When the difference between actual and predicted load is small, the scatter points are positioned near the line.

In [11]:
print(max(abs(heating_load_pred - heating_load_true)))
[ 1.57366776]

We could predict heating load accurately within the range of +/- 1.6 kWh.