Room Occupancy Detection

Room occupancy detection using the Occupancy Detection Data Set.

The dataset is composed of 7 attributes: datetime (every minute), temperature (Celsius), relative humidity (%), light (Lux), CO2 (ppm), humidity ratio (kg water vapor / kg air), and ground-truth occupancy (0 or 1). In this tutorial, we construct a neural network model and detect room occupancy from the temperature, relative humidity, light, CO2 and humidity ratio.
Occupancy detection can be applied to smart buildings, and detecting occupancy can lead to energy savings in a building's control system [1].

The reference for the dataset is given below.

Accurate occupancy detection of an office room from light, temperature, relative humidity and CO2 measurements using statistical learning models. Luis M. Candanedo, Véronique Feldheim. Energy and Buildings. Volume 112, 15 January 2016, Pages 28-39.

Required Libraries

  • matplotlib 2.0.2
  • numpy 1.13.1
  • scikit-learn 0.18.2
In [1]:
from __future__ import division, print_function
import datetime
import numpy as np
import pandas as pd

import renom as rm
from renom.optimizer import Adam

from sklearn.metrics import confusion_matrix, classification_report, roc_curve, accuracy_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler

import matplotlib.pyplot as plt

from renom.cuda import set_cuda_active

# If you would like to use a GPU, set this to True; otherwise set it to False.
set_cuda_active(False)

Load Data

Download the dataset from https://github.com/LuisM78/Occupancy-detection-data and put it in the working directory.
Load 'datatraining.txt' as training data and 'datatest2.txt' as test data.
In [2]:
df_train = pd.read_csv('./datatraining.txt')
df_test = pd.read_csv('./datatest2.txt')
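As a quick sanity check (not part of the original notebook), you can inspect the shapes, column names and first rows of the loaded data; the column names printed here should match the ones referenced in the following cells.

# Quick look at the loaded data; the expected columns are assumed from
# how they are used later in this notebook (date, Temperature, Humidity,
# Light, CO2, HumidityRatio, Occupancy).
print(df_train.shape, df_test.shape)
print(list(df_train.columns))
print(df_train.head())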

Preprocessing Data for Visualization

Preprocess the data for visualization.
In order to show all sensor readings in the same graph, normalize each attribute to the range 0 to 1.
Transform the string date and time data into the datetime type. In addition, find the time ranges during which the status is 'Occupied'.
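For reference, the MinMaxScaler used in the next cell performs ordinary min-max scaling. A minimal equivalent sketch for a single attribute (the values below are illustrative, not taken from the dataset):

x = np.array([20.0, 21.5, 23.0, 25.0])        # e.g. raw temperature readings
x_norm = (x - x.min()) / (x.max() - x.min())  # scaled into [0, 1]
print(x_norm)                                  # [0.  0.3 0.6 1. ]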
In [3]:
temperature = np.array(df_train['Temperature'])
humidity = np.array(df_train['Humidity'])
light = np.array(df_train['Light'])
CO2 = np.array(df_train['CO2'])
humidity_ratio = np.array(df_train['HumidityRatio'])
occupancy = np.array(df_train['Occupancy'])

mms = MinMaxScaler()

# Normalize data.
temperature_norm = mms.fit_transform(temperature.reshape(-1, 1))
humidity_norm = mms.fit_transform(humidity.reshape(-1, 1))
light_norm = mms.fit_transform(light.reshape(-1, 1))
CO2_norm = mms.fit_transform(CO2.reshape(-1, 1))
humidity_ratio_norm = mms.fit_transform(humidity_ratio.reshape(-1, 1))

# Survey the range that status is 'Occupied'.
date = []
isOccupied = False
occupied_start = []
occupied_end = []

for i, d in enumerate(df_train['date']):
    dt = datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    date.append(dt)
    if isOccupied == False and occupancy[i] == 1:
        occupied_start.append(dt)
        isOccupied = True
    elif isOccupied == True and occupancy[i] == 0:
        occupied_end.append(date[i-1])
        isOccupied = False

if isOccupied == True:
    occupied_end.append(date[-1])
    isOccupied = False

Data Visualization

Show a graph where the x-axis is the datetime and the y-axis is the normalized value of each attribute.
The gray background marks the ranges where the status is 'Occupied'.
The graph shows that every attribute increases when the status is 'Occupied'.
In some ranges, relative humidity, temperature and other attributes increase even though the status is not 'Occupied'.
In [4]:
plt.figure(figsize=(20, 10))
for s, e in zip(occupied_start, occupied_end):
    plt.axvspan(s, e, facecolor='black', alpha=0.2)
plt.plot(date, temperature_norm, label='Temperature')
plt.plot(date, humidity_norm, label='Relative Humidity')
plt.plot(date, light_norm, label='Light')
plt.plot(date, CO2_norm, label='CO2')
plt.plot(date, humidity_ratio_norm, label='Humidity Ratio')
plt.xlim(min(date), max(date))
plt.xlabel('Date')
plt.legend()
plt.show()
[Figure: normalized sensor readings over time; occupied ranges shaded in gray]

Preprocessing Data for Training

Convert all columns except 'date' into ndarrays and standardize them. The StandardScaler is fitted on the training data only and then applied to both the training and test data.

In [5]:
x_train, y_train = np.array(df_train.iloc[:, 1:6]), np.array(df_train.iloc[:, 6:])
x_test, y_test = np.array(df_test.iloc[:, 1:6]), np.array(df_test.iloc[:, 6:])

sc = StandardScaler()

sc.fit(x_train)
x_train_std = sc.transform(x_train)
x_test_std = sc.transform(x_test)
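As a small sanity check (not in the original notebook), you can verify that the standardized training features have roughly zero mean and unit variance:

# Each column of x_train_std should have mean ~0 and standard deviation ~1.
print(x_train_std.mean(axis=0).round(3))
print(x_train_std.std(axis=0).round(3))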

Model Definition

In order to keep the output value between 0 and 1, the last layer uses the sigmoid function as its activation function.
By choosing a threshold, the output value is mapped to 0 or 1, and this mapping decides whether the status is 'Occupied' or not.
In [6]:
model = rm.Sequential([
    rm.Dense(6),
    rm.Relu(),
    rm.Dense(1),
    rm.Sigmoid()
])
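Because of the sigmoid output layer, every prediction lies between 0 and 1. A minimal sketch of a forward pass on a few standardized rows (this is only a check, assuming the preprocessing cells above have been run; the untrained model will produce arbitrary values, but all of them should fall in that range):

# Forward pass on the first five training rows; values should lie in (0, 1).
sample_out = model(x_train_std[:5])
print(np.asarray(sample_out))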

Define optimizer

Choose Adam as the optimization algorithm and set the learning rate to 0.001.

In [7]:
optimizer = Adam(lr=0.001)

Training loop

Create a random index permutation with numpy.random.permutation and use it to construct mini-batches. In this tutorial, the batch size is set to 128 and the number of epochs to 100. Mean squared error is used as the loss function.

In [8]:
# parameters
EPOCH = 100 # Number of epochs
BATCH = 128 # Mini-batch size

# Learning curves
learning_curve = []
test_curve = []

# Training loop
for i in range(1, 1+EPOCH):

    N = x_train_std.shape[0] # Number of records in training data
    perm = np.random.permutation(N)
    train_loss = 0

    for j in range(N//BATCH):
        # Make mini-batch
        index = perm[j*BATCH:(j+1)*BATCH]
        train_batch_x = x_train_std[index]
        train_batch_y = y_train[index]

        # Forward propagation
        with model.train():
            z = model(train_batch_x)
            loss = rm.mean_squared_error(z, train_batch_y)

        # Backpropagation
        grad = loss.grad()

        # Update
        grad.update(optimizer)

        train_loss += loss.as_ndarray()

    # calculate mean squared error for training data
    train_loss = train_loss / (N // BATCH)
    learning_curve.append(train_loss)

    # calculate mean squared error for test data
    y_test_pred = model(x_test_std)
    test_loss = rm.mean_squared_error(y_test_pred, y_test).as_ndarray()
    test_curve.append(test_loss)

    # print training progress
    if i % 10 == 0:
        print("Epoch %d - loss: %f - test_loss: %f" % (i, train_loss, test_loss))

print('Finished!')
Epoch 10 - loss: 0.011556 - test_loss: 0.029600
Epoch 20 - loss: 0.007123 - test_loss: 0.020418
Epoch 30 - loss: 0.006330 - test_loss: 0.018179
Epoch 40 - loss: 0.006049 - test_loss: 0.016938
Epoch 50 - loss: 0.005893 - test_loss: 0.015705
Epoch 60 - loss: 0.005739 - test_loss: 0.014743
Epoch 70 - loss: 0.005738 - test_loss: 0.013876
Epoch 80 - loss: 0.005751 - test_loss: 0.013107
Epoch 90 - loss: 0.005648 - test_loss: 0.012373
Epoch 100 - loss: 0.005553 - test_loss: 0.011171
Finished!

Learning Curve

In [9]:
plt.figure(figsize=(10, 4))
plt.plot(learning_curve, label='loss')
plt.plot(test_curve, label='test_loss', alpha=0.6)
plt.title('Learning curve')
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.legend()
plt.grid()
[Figure: learning curve showing training and test MSE per epoch]

Evaluation

Map output values greater than or equal to a threshold to 1, and the rest to 0. In this tutorial, the threshold is set to 0.5. Calculate the accuracy, confusion matrix, precision, recall and F1-score using scikit-learn.
The confusion matrix is laid out as follows:

[[TN FP]
 [FN TP]]
In [10]:
y_pred = model(x_test_std)
y_pred = np.asarray(y_pred)

threshold = 0.5
y_pred_binary = (y_pred >= threshold).astype(np.int64)  # outputs >= threshold become 1, otherwise 0

print("accuracy : {}".format(accuracy_score(y_test, y_pred_binary)))
print(confusion_matrix(y_test, y_pred_binary))
print(classification_report(y_test, y_pred_binary))
accuracy : 0.9707752255947498
[[7426  277]
 [   8 2041]]
             precision    recall  f1-score   support

          0       1.00      0.96      0.98      7703
          1       0.88      1.00      0.93      2049

avg / total       0.97      0.97      0.97      9752
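For reference, precision and recall for the 'Occupied' class (label 1) can be recomputed directly from the confusion matrix printed above; a minimal sketch using those values:

TN, FP, FN, TP = 7426, 277, 8, 2041   # values from the confusion matrix above
precision = TP / (TP + FP)            # ~0.88
recall = TP / (TP + FN)               # ~1.00
f1 = 2 * precision * recall / (precision + recall)   # ~0.93
print(precision, recall, f1)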

ROC Curve

The ROC (receiver operating characteristic) curve is one way to examine the performance of a classifier.
It is drawn by plotting the true positive rate (TPR) against the false positive rate (FPR) while varying the threshold.
(0, 0) is the point where the classifier predicts only 0,
and (1, 1) is the point where the classifier predicts only 1.
By looking at this graph, we can choose a threshold while considering the trade-off between the true positive rate and the false positive rate.
In [11]:
fpr, tpr, thresholds = roc_curve(y_test, y_pred)

plt.figure(figsize=(10, 4))
plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.grid()
[Figure: ROC curve]
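If you want to pick an operating threshold from the ROC curve, one common heuristic (not part of the original tutorial) is Youden's J statistic, which maximizes TPR minus FPR. A minimal sketch using the arrays returned by roc_curve in the previous cell:

# Threshold that maximizes (TPR - FPR); one possible way to choose an operating point.
best_idx = np.argmax(tpr - fpr)
print("threshold: %.3f, TPR: %.3f, FPR: %.3f"
      % (thresholds[best_idx], tpr[best_idx], fpr[best_idx]))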

References

[1]
Kemal Tutuncu, Ozcan Cataltas, Murat Koklu. Occupancy Detection through Light, Temperature, Humidity and CO2 Sensors Using ANN. Proceedings of ISER 45th International Conference, 2016.