家庭の消費電力予測

LSTMを用いた家庭の消費電力予測

LSTMには多くの使い方の種類が存在します.
以下に使い方のうち3つの例を示します.しかし,他にも多くの使い方が存在します.
今回はそのうちのmany to oneの場合を用います.
データセットの詳細は以下になります.
家庭の消費電力予測
Georges Hébrail (georges.hebrail '@' edf.fr), Senior Researcher, EDF R&D, Clamart, France
Alice Bérard, TELECOM ParisTech Master of Engineering Internship at EDF R&D, Clamart, France

Required Libaries

  • matplotlib 2.0.2
  • numpy 1.12.1
  • scikit-learn 0.18.2
  • pandas 0.20.3
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import mean_squared_error
import renom as rm
from renom.optimizer import Adam
from renom.cuda import set_cuda_active
set_cuda_active(False)

訓練用と予測用のデータセット作成

我々は30分間のデータを用いて60分後から30分間の消費電力の予測を行います.

In [2]:
def create_dataset(data, look_back, period, blank):
    X, y = [], []
    for i in range(len(data)-look_back-period-blank):
        X.append(data[i : i+look_back, :])
        watsum = sum(list(map(float,data[i+blank+look_back : i+look_back+blank+period][0])))
        y.append(watsum)
    n_features = np.array(X).shape[2]
    X = np.reshape(np.array(X), [-1, look_back, n_features])
    y = np.reshape(np.array(y), [-1, 1])
    return X, y

データの分割

In [3]:
def split_data(X, y, test_size=0.1):
    pos = int(round(len(X) * (1-test_size)))
    X_train, y_train = X[:pos], y[:pos]
    X_test, y_test = X[pos:], y[pos:]
    return X_train, y_train, X_test, y_test

txtファイルからのデータ読み込みと欠損値の削除

In [4]:
filename = "household_power_consumption.txt"
df = pd.read_csv(filename,sep=";", usecols=[2,3,4,5,6,7,8], low_memory=False)
print("the number of {} records:{}\n".format(filename, len(df.index)))
df = df.applymap(lambda d: np.nan if d=="?" else d)
print("missing value info:\n{}\n".format(df.isnull().sum(axis=0)))
df = df.dropna(axis=0)
print("the number of {} records after trimming:{}\n".format(filename, len(df.index)))

ds = df.values.astype("float32")
the number of household_power_consumption.txt records:2075259

missing value info:
Global_active_power      25979
Global_reactive_power    25979
Voltage                  25979
Global_intensity         25979
Sub_metering_1           25979
Sub_metering_2           25979
Sub_metering_3           25979
dtype: int64

the number of household_power_consumption.txt records after trimming:2049280

前処理とモデルの定義

まず,データを最小値が0で最大値が1に正規化をします.
通常はscikit-learnのminmaxscalerを使うことが多いかもしれませんが,もう一度元の数値に戻す必要があるので,自身で定義を行いました.
minmaxScalerは正規化のための関数でundoScalerは元にスケールを戻すための関数です.
look_backはどれくらい過去の情報を使って予測を行うかということで,periodはどれくらいの期間の値をラベルデータとするかになります.
我々は30分間のデータを用いて60分後から30分間の消費電力の予測を行います.
今回は30分のデータを用いて30分間の消費電力を予測するので,どちらも30と設定しています.
また,データサイズが大きく処理に時間がかかるために100000までサイズを落としています.
In [5]:
def minmaxScaler(data, maxlist, minlist):
    for i in range(data.shape[-1]):
        if maxlist[i] - minlist[i] == 0:
            data[..., i] = 1
        else:
            data[..., i] = (data[..., i] - minlist[i]) / (maxlist[i] - minlist[i])
    return data

def undoScaler(data, maxlist, minlist):
    for i in range(data.shape[-1]):
        if maxlist[i] - minlist[i] == 0:
            data[..., i] = maxlist[i] * 1
        else:
            data[..., i] = data[..., i] * (maxlist[i] - minlist[i]) + minlist[i]
    return data

look_back = 30
blank = 60
period = 30
X, y = create_dataset(ds, look_back, period, blank)
X, y = X[:100000, :, :], y[:100000, :]
maxlist_data = np.max(X.reshape(X.shape[0]*X.shape[1], X.shape[2]), axis=0)
minlist_data = np.min(X.reshape(X.shape[0]*X.shape[1], X.shape[2]), axis=0)
maxlist_label = np.max(y).reshape(-1,1)
minlist_label = np.min(y).reshape(-1,1)
plt.plot(y)
plt.title("electric power consumption for 30 minutes")
plt.show()
X = minmaxScaler(X, maxlist_data, minlist_data)
y = minmaxScaler(y, maxlist_label, minlist_label)
X_train, y_train, X_test, y_test = split_data(X, y, 0.33)
print("X_train:{},y_train:{},X_test:{},y_test:{}".format(X_train.shape, y_train.shape, X_test.shape, y_test.shape))

sequential = rm.Sequential([
    rm.Lstm(20),
    rm.Dense(1)
])

batch_size = 2048
epoch = 30
N = len(X_train)
T = X_train.shape[1]
../../../_images/notebooks_time_series_household-electric-power-consumption_notebook_10_0.png
X_train:(67000, 30, 7),y_train:(67000, 1),X_test:(33000, 30, 7),y_test:(33000, 1)

学習ループ

In [6]:
learning_curve = []
test_learning_curve = []
optimizer = Adam(lr=0.01)
for i in range(epoch):
    train_loss = 0
    test_loss = 0
    for j in range(N//batch_size):
        train_batch = X_train[j*batch_size : (j+1)*batch_size]
        response_batch = y_train[j*batch_size : (j+1)*batch_size]
        l = 0
        z = 0
        with sequential.train():
            for t in range(T):
                z = sequential(train_batch[:, t, :])
                l += rm.mse(z, response_batch)
            l /= T
            sequential.truncate()
        l.grad().update(optimizer)
        train_loss += l.as_ndarray()
    train_loss = train_loss / (N // batch_size)
    l_test = 0
    z = 0
    for t in range(T):
        z = sequential(X_test[:, t, :])
        l_test += rm.mse(z, y_test)
    l_test /= T
    test_loss = l_test.as_ndarray()
    sequential.truncate()
    print("epoch:{} train loss:{} test loss:{}".format(i, train_loss, test_loss))
    learning_curve.append(train_loss)
    test_learning_curve.append(test_loss)
epoch:0 train loss:0.009572499962814618 test loss:0.0067715514451265335
epoch:1 train loss:0.005639389630232472 test loss:0.005541081074625254
epoch:2 train loss:0.0051836720303981565 test loss:0.005303853657096624
epoch:3 train loss:0.005126343821757473 test loss:0.005252862349152565
epoch:4 train loss:0.0051105242455378175 test loss:0.005215560086071491
epoch:5 train loss:0.0051049498724751174 test loss:0.005193015094846487
epoch:6 train loss:0.0050972960161743686 test loss:0.005174010992050171
epoch:7 train loss:0.00508945070032496 test loss:0.005157759413123131
epoch:8 train loss:0.005082122712337878 test loss:0.005143859423696995
epoch:9 train loss:0.005075385881355032 test loss:0.005131811834871769
epoch:10 train loss:0.005069269987870939 test loss:0.005121266003698111
epoch:11 train loss:0.005063754346338101 test loss:0.005111947655677795
epoch:12 train loss:0.005058795439254027 test loss:0.0051036374643445015
epoch:13 train loss:0.005054342138464563 test loss:0.005096168722957373
epoch:14 train loss:0.005050342413596809 test loss:0.005089408252388239
epoch:15 train loss:0.005046747217420489 test loss:0.0050832550041377544
epoch:16 train loss:0.0050435115190339275 test loss:0.0050776260904967785
epoch:17 train loss:0.005040595249738544 test loss:0.005072456784546375
epoch:18 train loss:0.005037962750066072 test loss:0.0050676921382546425
epoch:19 train loss:0.005035582260461524 test loss:0.0050632888451218605
epoch:20 train loss:0.005033426670706831 test loss:0.005059210117906332
epoch:21 train loss:0.005031472093833145 test loss:0.00505542429164052
epoch:22 train loss:0.0050296964109293185 test loss:0.005051903426647186
epoch:23 train loss:0.005028080631745979 test loss:0.005048623774200678
epoch:24 train loss:0.005026608763728291 test loss:0.005045564845204353
epoch:25 train loss:0.005025265425501857 test loss:0.00504270801320672
epoch:26 train loss:0.005024037745897658 test loss:0.005040035117417574
epoch:27 train loss:0.005022913828724995 test loss:0.0050375331193208694
epoch:28 train loss:0.0050218828255310655 test loss:0.005035187117755413
epoch:29 train loss:0.0050209358960273676 test loss:0.0050329845398664474

予測と結果を表示

Root mean squared errorと結果の図を示します.
Root mean squared errorはどれくらい結果が間違っていたかの指標として非常に便利です.
In [7]:
for t in range(T):
    test_predict = sequential(X_test[:, t, :])
sequential.truncate()
test_predict = np.array(test_predict)

y_test_raw = undoScaler(y_test.reshape(-1,1), maxlist_label, minlist_label)
test_predict_raw = undoScaler(test_predict.reshape(-1,1), maxlist_label, minlist_label)

print("Root mean squared error:{}".format(np.sqrt(mean_squared_error(y_test_raw, test_predict_raw))))

plt.figure(figsize=(8,8))
plt.title("predictions")
plt.plot(y_test_raw, label ="original")
plt.plot(test_predict_raw, label="test_predict")
plt.legend()
plt.show()
Root mean squared error:16.82746141634212
../../../_images/notebooks_time_series_household-electric-power-consumption_notebook_14_1.png