# Room Occupancy Detection

Room occupancy detection using the Occupancy Detection Data Set.

The dataset is composed of 7 attributes: datetime (one record per minute), temperature (Celsius), relative humidity (%), light (lux), CO2 (ppm), humidity ratio (kg water-vapor / kg air), and ground-truth occupancy (0 or 1). In this tutorial, we construct a neural network model and detect room occupancy from temperature, relative humidity, light, CO2, and humidity ratio.
Occupancy detection can be applied to smart buildings, and detecting occupancy can lead to energy savings in a building's control system [1].

The reference for the dataset is given below.

Accurate occupancy detection of an office room from light, temperature, relative humidity and CO2 measurements using statistical learning models. Luis M. Candanedo, Véronique Feldheim. Energy and Buildings, Volume 112, 15 January 2016, Pages 28-39.

## Required Libraries

• matplotlib 2.0.2
• numpy 1.13.1
• scikit-learn 0.18.2
In [1]:
from __future__ import division, print_function
import datetime
import numpy as np
import pandas as pd

import renom as rm

from sklearn.metrics import confusion_matrix, classification_report, roc_curve, accuracy_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler

import matplotlib.pyplot as plt

from renom.cuda import set_cuda_active

# To use the GPU, set this to True; otherwise set it to False.
set_cuda_active(False)

Load 'datatraining.txt' as training data and 'datatest2.txt' as test data.
In [2]:
df_train = pd.read_csv('datatraining.txt')
df_test = pd.read_csv('datatest2.txt')

### Preprocessing Data for Visualization

Preprocess the data for visualization.
In order to show all the sensor data in the same graph, normalize the data to the range 0 to 1.
Transform the date and time strings into datetime objects. In addition, find the ranges where the status is 'Occupied'.
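MinMaxScaler implements exactly this normalization, x' = (x - min) / (max - min) per column; a minimal sketch with made-up temperature values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical temperature readings in Celsius
temps = np.array([20.0, 21.5, 23.0, 22.0]).reshape(-1, 1)

mms = MinMaxScaler()
temps_norm = mms.fit_transform(temps)  # (x - min) / (max - min), column-wise

print(temps_norm.ravel())  # min maps to 0.0, max maps to 1.0
```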
In [3]:
temperature = np.array(df_train['Temperature'])
humidity = np.array(df_train['Humidity'])
light = np.array(df_train['Light'])
CO2 = np.array(df_train['CO2'])
humidity_ratio = np.array(df_train['HumidityRatio'])
occupancy = np.array(df_train['Occupancy'])

mms = MinMaxScaler()

# Normalize data.
temperature_norm = mms.fit_transform(temperature.reshape(-1, 1))
humidity_norm = mms.fit_transform(humidity.reshape(-1, 1))
light_norm = mms.fit_transform(light.reshape(-1, 1))
CO2_norm = mms.fit_transform(CO2.reshape(-1, 1))
humidity_ratio_norm = mms.fit_transform(humidity_ratio.reshape(-1, 1))

# Find the ranges where the status is 'Occupied'.
date = []
isOccupied = False
occupied_start = []
occupied_end = []

for i, d in enumerate(df_train['date']):
    dt = datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    date.append(dt)
    if not isOccupied and occupancy[i] == 1:
        occupied_start.append(dt)
        isOccupied = True
    elif isOccupied and occupancy[i] == 0:
        occupied_end.append(date[i-1])
        isOccupied = False

if isOccupied:
    occupied_end.append(date[-1])
    isOccupied = False
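For reference, the same 'Occupied' intervals can also be found in vectorized form by detecting 0→1 and 1→0 transitions in the occupancy signal; a sketch on a toy array (the names here are illustrative, not part of the tutorial code):

```python
import numpy as np

occupancy = np.array([0, 1, 1, 0, 0, 1, 1, 1, 0])

# Pad with zeros so runs of 1s touching either end are still closed
padded = np.concatenate([[0], occupancy, [0]])
diff = np.diff(padded)
starts = np.where(diff == 1)[0]      # index where a run of 1s begins
ends = np.where(diff == -1)[0] - 1   # index where a run of 1s ends

print(list(zip(starts, ends)))  # [(1, 2), (5, 7)]
```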

### Data Visualization

Plot the data with datetime on the x-axis and the normalized value of each parameter on the y-axis.
The gray background marks ranges where the status is 'Occupied'.
The graph shows that every parameter increases when the status is 'Occupied'.
In some ranges, however, relative humidity, temperature, and other parameters increase even though the status is not 'Occupied'.
In [4]:
plt.figure(figsize=(20, 10))
for s, e in zip(occupied_start, occupied_end):
    plt.axvspan(s, e, facecolor='black', alpha=0.2)
plt.plot(date, temperature_norm, label='Temperature')
plt.plot(date, humidity_norm, label='Relative Humidity')
plt.plot(date, light_norm, label='Light')
plt.plot(date, CO2_norm, label='CO2')
plt.plot(date, humidity_ratio_norm, label='Humidity Ratio')
plt.xlim(min(date), max(date))
plt.xlabel('Date')
plt.legend()
plt.show()

### Preprocessing Data for Training

Transform all columns except 'date' into ndarrays, and standardize the features.

In [5]:
x_train, y_train = np.array(df_train.iloc[:, 1:6]), np.array(df_train.iloc[:, 6:])
x_test, y_test = np.array(df_test.iloc[:, 1:6]), np.array(df_test.iloc[:, 6:])

sc = StandardScaler()

sc.fit(x_train)
x_train_std = sc.transform(x_train)
x_test_std = sc.transform(x_test)
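StandardScaler standardizes each column to zero mean and unit variance using statistics estimated from the training split only — hence `fit` on `x_train`, then `transform` on both splits. A small sketch with toy numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[2.0], [4.0]])

sc = StandardScaler()
sc.fit(x_train)                     # mean = 2.0, std = sqrt(2/3) (population std)
x_train_std = sc.transform(x_train)
x_test_std = sc.transform(x_test)   # uses the training mean/std, not its own

print(x_train_std.ravel())
print(x_test_std.ravel())
```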

## Model Definition

To keep the output value in the range 0 to 1, insert a sigmoid function as the activation of the last layer.
By choosing a threshold and mapping the output value to 0 or 1, we decide whether the status is 'Occupied' or not.
In [6]:
model = rm.Sequential([
    rm.Dense(6),
    rm.Relu(),
    rm.Dense(1),
    rm.Sigmoid()
])
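For illustration only, here is the same forward computation written in plain NumPy with randomly initialized weights (the layer shapes and the `forward` helper are assumptions made for this sketch, not ReNom internals):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 6))   # 5 input features -> 6 hidden units
b1 = np.zeros(6)
W2 = rng.normal(size=(6, 1))   # 6 hidden units -> 1 output
b2 = np.zeros(1)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)              # Dense(6) + Relu
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # Dense(1) + Sigmoid

x = rng.normal(size=(4, 5))    # batch of 4 samples, 5 features
out = forward(x)
print(out.shape)               # (4, 1); the sigmoid keeps every value in (0, 1)
```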

## Define optimizer

Choose Adam as the optimization algorithm and set the learning rate to 0.001.

In [7]:
optimizer = rm.Adam(lr=0.001)

## Training loop

Make a random index array using the function `numpy.random.permutation`, and construct mini-batches from it. In this tutorial, set the batch size to 128 and the number of epochs to 100. Use mean squared error as the loss function.
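The permutation-based batching can be sketched in isolation (toy sizes here, not the tutorial's 128-sample batches):

```python
import numpy as np

N, BATCH = 10, 4
perm = np.random.permutation(N)   # shuffled copy of arange(N), each index once

batches = [perm[j*BATCH:(j+1)*BATCH] for j in range(N // BATCH)]
# With N=10 and BATCH=4 this yields 2 full, disjoint batches; the last
# 2 samples are dropped this epoch (a common simplification).
print([len(b) for b in batches])  # [4, 4]
```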

In [8]:
# Parameters
EPOCH = 100  # Number of epochs
BATCH = 128  # Mini-batch size

# Learning curves
learning_curve = []
test_curve = []

# Training loop
for i in range(1, 1+EPOCH):

    N = x_train_std.shape[0]  # Number of records in the training data
    perm = np.random.permutation(N)
    train_loss = 0

    for j in range(N//BATCH):
        # Make mini-batch
        index = perm[j*BATCH:(j+1)*BATCH]
        train_batch_x = x_train_std[index]
        train_batch_y = y_train[index]

        # Forward propagation
        with model.train():
            z = model(train_batch_x)
            loss = rm.mean_squared_error(z, train_batch_y)

        # Backpropagation
        grad = loss.grad()

        # Update
        grad.update(optimizer)

        train_loss += loss.as_ndarray()

    # Calculate mean squared error for the training data
    train_loss = train_loss / (N // BATCH)
    learning_curve.append(train_loss)

    # Calculate mean squared error for the test data
    y_test_pred = model(x_test_std)
    test_loss = rm.mean_squared_error(y_test_pred, y_test).as_ndarray()
    test_curve.append(test_loss)

    # Print training progress
    if i % 10 == 0:
        print("Epoch %d - loss: %f - test_loss: %f" % (i, train_loss, test_loss))

print('Finished!')
Epoch 10 - loss: 0.011556 - test_loss: 0.029600
Epoch 20 - loss: 0.007123 - test_loss: 0.020418
Epoch 30 - loss: 0.006330 - test_loss: 0.018179
Epoch 40 - loss: 0.006049 - test_loss: 0.016938
Epoch 50 - loss: 0.005893 - test_loss: 0.015705
Epoch 60 - loss: 0.005739 - test_loss: 0.014743
Epoch 70 - loss: 0.005738 - test_loss: 0.013876
Epoch 80 - loss: 0.005751 - test_loss: 0.013107
Epoch 90 - loss: 0.005648 - test_loss: 0.012373
Epoch 100 - loss: 0.005553 - test_loss: 0.011171
Finished!

### Learning Curve

In [9]:
plt.figure(figsize=(10, 4))
plt.plot(learning_curve, label='loss')
plt.plot(test_curve, label='test_loss', alpha=0.6)
plt.title('Learning curve')
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.legend()
plt.grid()

### Evaluation

Map output values greater than or equal to a threshold to 1, and otherwise to 0. In this tutorial, set the threshold to 0.5. Calculate the accuracy, confusion matrix, precision, recall, and F1-score using scikit-learn.
The confusion matrix is laid out as follows:

    [TN FP]
    [FN TP]
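scikit-learn's `confusion_matrix` uses exactly this layout (rows are true labels, columns are predicted labels), which a tiny example confirms:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]    TN=2, FP=1
#  [1 2]]   FN=1, TP=2
```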
In [10]:
y_pred = model(x_test_std)
y_pred = np.asarray(y_pred)

threshold = 0.5
y_pred_binary = (y_pred >= threshold).astype(np.int64)

print("accuracy : {}".format(accuracy_score(y_test, y_pred_binary)))
print(confusion_matrix(y_test, y_pred_binary))
print(classification_report(y_test, y_pred_binary))
accuracy : 0.9707752255947498
[[7426  277]
[   8 2041]]
             precision    recall  f1-score   support

          0       1.00      0.96      0.98      7703
          1       0.88      1.00      0.93      2049

avg / total       0.97      0.97      0.97      9752

### ROC Curve

The ROC (receiver operating characteristic) curve is one way to examine the performance of a classifier.
It is drawn by plotting the true positive rate against the false positive rate while varying the threshold.
(0, 0) is the point where the classifier predicts only 0,
and (1, 1) is the point where the classifier predicts only 1.
By looking at this graph, we can choose a threshold while considering the tradeoff between true positives and false positives.
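Note that `roc_curve` expects the continuous scores (here `y_pred`), not the thresholded 0/1 predictions; a minimal example with toy labels and scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) pair per candidate threshold, from strictest to loosest
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(fpr)  # starts at 0.0 (predict nothing positive), ends at 1.0
print(tpr)
```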
In [11]:
fpr, tpr, thresholds = roc_curve(y_test, y_pred)

plt.figure(figsize=(10, 4))
plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.grid()

### References

[1] Kemal Tutuncu, Ozcan Cataltas, Murat Koklu. Occupancy detection through light, temperature, humidity and CO2 sensors using ANN. Proceedings of ISER 45th International Conference, 2016.