# 家庭の消費電力予測 ¶

LSTMを用いた家庭の消費電力予測

LSTMには多くの使い方の種類が存在します．

データセットの詳細は以下になります．

Georges HÃ©brail (georges.hebrail '@' edf.fr), Senior Researcher, EDF R&D, Clamart, France
Alice BÃ©rard, TELECOM ParisTech Master of Engineering Internship at EDF R&D, Clamart, France

## Required Libaries ¶

• matplotlib 2.0.2
• numpy 1.12.1
• scikit-learn 0.18.2
• pandas 0.20.3
In [1]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import mean_squared_error
import renom as rm
from renom.cuda import set_cuda_active
set_cuda_active(False)


## 訓練用と予測用のデータセット作成 ¶

In [2]:

def create_dataset(data, look_back, period, blank):
X, y = [], []
for i in range(len(data)-look_back-period-blank):
X.append(data[i : i+look_back, :])
watsum = sum(list(map(float,data[i+blank+look_back : i+look_back+blank+period][0])))
y.append(watsum)
n_features = np.array(X).shape[2]
X = np.reshape(np.array(X), [-1, look_back, n_features])
y = np.reshape(np.array(y), [-1, 1])
return X, y


## データの分割 ¶

In [3]:

def split_data(X, y, test_size=0.1):
pos = int(round(len(X) * (1-test_size)))
X_train, y_train = X[:pos], y[:pos]
X_test, y_test = X[pos:], y[pos:]
return X_train, y_train, X_test, y_test


## txtファイルからのデータ読み込みと欠損値の削除 ¶

In [4]:

filename = "household_power_consumption.txt"
print("the number of {} records:{}\n".format(filename, len(df.index)))
df = df.applymap(lambda d: np.nan if d=="?" else d)
print("missing value info:\n{}\n".format(df.isnull().sum(axis=0)))
df = df.dropna(axis=0)
print("the number of {} records after trimming:{}\n".format(filename, len(df.index)))

ds = df.values.astype("float32")

the number of household_power_consumption.txt records:2075259

missing value info:
Global_active_power      25979
Global_reactive_power    25979
Voltage                  25979
Global_intensity         25979
Sub_metering_1           25979
Sub_metering_2           25979
Sub_metering_3           25979
dtype: int64

the number of household_power_consumption.txt records after trimming:2049280



## 前処理とモデルの定義 ¶

まず，データを最小値が0で最大値が1に正規化をします．

minmaxScalerは正規化のための関数でundoScalerは元にスケールを戻すための関数です．
look_backはどれくらい過去の情報を使って予測を行うかということで，periodはどれくらいの期間の値をラベルデータとするかになります．

また，データサイズが大きく処理に時間がかかるために100000までサイズを落としています．
In [5]:

def minmaxScaler(data, maxlist, minlist):
for i in range(data.shape[-1]):
if maxlist[i] - minlist[i] == 0:
data[..., i] = 1
else:
data[..., i] = (data[..., i] - minlist[i]) / (maxlist[i] - minlist[i])
return data

def undoScaler(data, maxlist, minlist):
for i in range(data.shape[-1]):
if maxlist[i] - minlist[i] == 0:
data[..., i] = maxlist[i] * 1
else:
data[..., i] = data[..., i] * (maxlist[i] - minlist[i]) + minlist[i]
return data

look_back = 30
blank = 60
period = 30
X, y = create_dataset(ds, look_back, period, blank)
X, y = X[:100000, :, :], y[:100000, :]
maxlist_data = np.max(X.reshape(X.shape[0]*X.shape[1], X.shape[2]), axis=0)
minlist_data = np.min(X.reshape(X.shape[0]*X.shape[1], X.shape[2]), axis=0)
maxlist_label = np.max(y).reshape(-1,1)
minlist_label = np.min(y).reshape(-1,1)
plt.plot(y)
plt.title("electric power consumption for 30 minutes")
plt.show()
X = minmaxScaler(X, maxlist_data, minlist_data)
y = minmaxScaler(y, maxlist_label, minlist_label)
X_train, y_train, X_test, y_test = split_data(X, y, 0.33)
print("X_train:{},y_train:{},X_test:{},y_test:{}".format(X_train.shape, y_train.shape, X_test.shape, y_test.shape))

sequential = rm.Sequential([
rm.Lstm(20),
rm.Dense(1)
])

batch_size = 2048
epoch = 30
N = len(X_train)
T = X_train.shape[1]

X_train:(67000, 30, 7),y_train:(67000, 1),X_test:(33000, 30, 7),y_test:(33000, 1)


## 学習ループ ¶

In [6]:

learning_curve = []
test_learning_curve = []
for i in range(epoch):
train_loss = 0
test_loss = 0
for j in range(N//batch_size):
train_batch = X_train[j*batch_size : (j+1)*batch_size]
response_batch = y_train[j*batch_size : (j+1)*batch_size]
l = 0
z = 0
with sequential.train():
for t in range(T):
z = sequential(train_batch[:, t, :])
l += rm.mse(z, response_batch)
l /= T
sequential.truncate()
train_loss += l.as_ndarray()
train_loss = train_loss / (N // batch_size)
l_test = 0
z = 0
for t in range(T):
z = sequential(X_test[:, t, :])
l_test += rm.mse(z, y_test)
l_test /= T
test_loss = l_test.as_ndarray()
sequential.truncate()
print("epoch:{} train loss:{} test loss:{}".format(i, train_loss, test_loss))
learning_curve.append(train_loss)
test_learning_curve.append(test_loss)

epoch:0 train loss:0.009572499962814618 test loss:0.0067715514451265335
epoch:1 train loss:0.005639389630232472 test loss:0.005541081074625254
epoch:2 train loss:0.0051836720303981565 test loss:0.005303853657096624
epoch:3 train loss:0.005126343821757473 test loss:0.005252862349152565
epoch:4 train loss:0.0051105242455378175 test loss:0.005215560086071491
epoch:5 train loss:0.0051049498724751174 test loss:0.005193015094846487
epoch:6 train loss:0.0050972960161743686 test loss:0.005174010992050171
epoch:7 train loss:0.00508945070032496 test loss:0.005157759413123131
epoch:8 train loss:0.005082122712337878 test loss:0.005143859423696995
epoch:9 train loss:0.005075385881355032 test loss:0.005131811834871769
epoch:10 train loss:0.005069269987870939 test loss:0.005121266003698111
epoch:11 train loss:0.005063754346338101 test loss:0.005111947655677795
epoch:12 train loss:0.005058795439254027 test loss:0.0051036374643445015
epoch:13 train loss:0.005054342138464563 test loss:0.005096168722957373
epoch:14 train loss:0.005050342413596809 test loss:0.005089408252388239
epoch:15 train loss:0.005046747217420489 test loss:0.0050832550041377544
epoch:16 train loss:0.0050435115190339275 test loss:0.0050776260904967785
epoch:17 train loss:0.005040595249738544 test loss:0.005072456784546375
epoch:18 train loss:0.005037962750066072 test loss:0.0050676921382546425
epoch:19 train loss:0.005035582260461524 test loss:0.0050632888451218605
epoch:20 train loss:0.005033426670706831 test loss:0.005059210117906332
epoch:21 train loss:0.005031472093833145 test loss:0.00505542429164052
epoch:22 train loss:0.0050296964109293185 test loss:0.005051903426647186
epoch:23 train loss:0.005028080631745979 test loss:0.005048623774200678
epoch:24 train loss:0.005026608763728291 test loss:0.005045564845204353
epoch:25 train loss:0.005025265425501857 test loss:0.00504270801320672
epoch:26 train loss:0.005024037745897658 test loss:0.005040035117417574
epoch:27 train loss:0.005022913828724995 test loss:0.0050375331193208694
epoch:28 train loss:0.0050218828255310655 test loss:0.005035187117755413
epoch:29 train loss:0.0050209358960273676 test loss:0.0050329845398664474


## 予測と結果を表示 ¶

Root mean squared errorと結果の図を示します.
Root mean squared errorはどれくらい結果が間違っていたかの指標として非常に便利です．
In [7]:

for t in range(T):
test_predict = sequential(X_test[:, t, :])
sequential.truncate()
test_predict = np.array(test_predict)

y_test_raw = undoScaler(y_test.reshape(-1,1), maxlist_label, minlist_label)
test_predict_raw = undoScaler(test_predict.reshape(-1,1), maxlist_label, minlist_label)

print("Root mean squared error:{}".format(np.sqrt(mean_squared_error(y_test_raw, test_predict_raw))))

plt.figure(figsize=(8,8))
plt.title("predictions")
plt.plot(y_test_raw, label ="original")
plt.plot(test_predict_raw, label="test_predict")
plt.legend()
plt.show()

Root mean squared error:16.82746141634212