Bike Share prediction

A bike share prediction model using a fully connected neural network with multiple units in the output layer.

In this section, we’ll construct a fully-connected neural network to predict the daily number of two kinds of bike share users (casual and registered) from features such as season and weather. Please download the free data from the UCI website in advance ( https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset ).

Required libraries

In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

import renom as rm
from renom import Sequential
from renom import Dense, Relu
from renom import Adam

Load & preprocess the data

First of all, we’ll load the data. Although the downloaded folder contains two files, we’ll use only day.csv, because in this tutorial we predict the daily number of casual and registered users.

In [3]:
df = pd.read_csv("../day.csv")
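
A quick sanity check on the shape: day.csv should contain 731 daily records (years 2011 and 2012) and 16 columns.

print(df.shape)  # expected: (731, 16)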

We drop unnecessary columns (‘instant’, ‘dteday’, ‘cnt’) from the dataframe. ‘dteday’ is dropped because its information is already represented by other columns (season, yr, mnth, holiday, weekday, workingday) and because dropping it simplifies preprocessing; including it might nevertheless improve prediction accuracy. ‘cnt’ is dropped because it is simply the sum of ‘casual’ and ‘registered’, the two targets we want to predict.

In [4]:
df1 = df.drop(['instant', 'dteday', 'cnt'], axis=1)
df1.head()
Out[4]:
season yr mnth holiday weekday workingday weathersit temp atemp hum windspeed casual registered
0 1 0 1 0 6 0 2 0.344167 0.363625 0.805833 0.160446 331 654
1 1 0 1 0 0 0 2 0.363478 0.353739 0.696087 0.248539 131 670
2 1 0 1 0 1 1 1 0.196364 0.189405 0.437273 0.248309 120 1229
3 1 0 1 0 2 1 1 0.200000 0.212122 0.590435 0.160296 108 1454
4 1 0 1 0 3 1 1 0.226957 0.229270 0.436957 0.186900 82 1518
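
As a quick check of the relationship mentioned above (per the UCI column descriptions, ‘cnt’ is the total of both user types):

# 'cnt' should equal 'casual' + 'registered' on every row
print((df['casual'] + df['registered'] == df['cnt']).all())  # expected: True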

Now we standardize the data in each column, keeping each column’s standard deviation and mean so that predictions can be converted back to user counts later. The conversion to a numpy array happens in the next step.

In [5]:
df_s = df1.copy()

# Keep each column's standard deviation and mean so that
# standardized predictions can be converted back to user counts later
col_std = []
col_mean = []
for col in df1.columns:
    v_std = df1[col].std()
    v_mean = df1[col].mean()
    col_std.append(v_std)
    col_mean.append(v_mean)
    df_s[col] = (df_s[col] - v_mean) / v_std

df_s.head()
Out[5]:
season yr mnth holiday weekday workingday weathersit temp atemp hum windspeed casual registered
0 -1.347291 -1.000684 -1.599066 -0.171863 1.497783 -1.470218 1.109667 -0.826097 -0.679481 1.249316 -0.387626 -0.753218 -1.924153
1 -1.347291 -1.000684 -1.599066 -0.171863 -1.495054 -1.470218 1.109667 -0.720601 -0.740146 0.478785 0.749089 -1.044499 -1.913899
2 -1.347291 -1.000684 -1.599066 -0.171863 -0.996248 0.679241 -0.725551 -1.633538 -1.748570 -1.338358 0.746121 -1.060519 -1.555624
3 -1.347291 -1.000684 -1.599066 -0.171863 -0.497441 0.679241 -0.725551 -1.613675 -1.609168 -0.263001 -0.389562 -1.077996 -1.411417
4 -1.347291 -1.000684 -1.599066 -0.171863 0.001365 0.679241 -0.725551 -1.466410 -1.503941 -1.340576 -0.046275 -1.115863 -1.370398
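
Since we will need to convert standardized predictions back to user counts later, note that the inverse transformation uses the saved statistics. A minimal helper sketch (the name unstandardize is ours, not part of the tutorial; the comparison section below applies the same arithmetic inline):

def unstandardize(values, col_index):
    # Invert the standardization using the saved per-column statistics
    return values * col_std[col_index] + col_mean[col_index]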

Split data

We split the dataset into training and test sets.

In [6]:
X, y = np.array(df_s.iloc[:, :11]), np.array(df_s.iloc[:, 11:13])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
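
A quick check of the resulting shapes: splitting 731 rows 9:1 should give 657 training and 74 test records, each with 11 features.

print(X_train.shape, X_test.shape)  # expected: (657, 11) (74, 11)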

Definition of a neural network with sequential model

We construct a fully connected neural network because every input parameter seems to have an effect on the number of casual and registered users. The output layer has two units because we predict two values. The numbers of units in the other layers are hyperparameters; the Hyperparameter search tutorial is helpful for deciding them.

In [7]:
sequential = Sequential([
    Dense(10),
    Relu(),
    Dense(8),
    Relu(),
    Dense(6),
    Relu(),
    Dense(2)
])
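
As a quick sanity check, a forward pass on a few rows should yield two predicted values per row:

print(sequential(X_train[:3]).shape)  # expected: (3, 2)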

Training loop for user count regression

In [8]:
# parameters
BATCH = 10
EPOCH = 100
optimizer = Adam(lr=0.01)

# Learning curves
learning_curve = []
test_curve = []

# Training loop
for i in range(1, 1+EPOCH):
    N = X_train.shape[0] # Number of records in training data
    perm = np.random.permutation(N)
    train_loss = 0

    for j in range(N//BATCH):
        # Make mini-batch
        index = perm[j*BATCH:(j+1)*BATCH]
        train_batch_x = X_train[index]
        train_batch_y = y_train[index]

        # Forward propagation
        with sequential.train():
            z = sequential(train_batch_x)
            loss = rm.mean_squared_error(z, train_batch_y)

        # Backpropagation
        grad = loss.grad()

        # Update
        grad.update(optimizer)

        train_loss += loss.as_ndarray()

    # calculate mean squared error for training data
    train_loss = train_loss / (N // BATCH)
    learning_curve.append(train_loss)

    # calculate mean squared error for test data
    y_test_pred = sequential(X_test)
    test_loss = rm.mean_squared_error(y_test_pred, y_test).as_ndarray()
    test_curve.append(test_loss)

    # print training progress
    if i % 10 == 0:
        print("Epoch %d - loss: %f - test_loss: %f" % (i, train_loss, test_loss))

print('Finished!')
Epoch 10 - loss: 0.135388 - test_loss: 0.131142
Epoch 20 - loss: 0.119636 - test_loss: 0.117048
Epoch 30 - loss: 0.108519 - test_loss: 0.118423
Epoch 40 - loss: 0.105684 - test_loss: 0.140581
Epoch 50 - loss: 0.099742 - test_loss: 0.130865
Epoch 60 - loss: 0.099561 - test_loss: 0.127347
Epoch 70 - loss: 0.098927 - test_loss: 0.122917
Epoch 80 - loss: 0.095182 - test_loss: 0.140498
Epoch 90 - loss: 0.100924 - test_loss: 0.156761
Epoch 100 - loss: 0.093405 - test_loss: 0.150348
Finished!

Model evaluation

Plot learning curve

First, let’s plot the learning curve to confirm that the model has learned properly.

In [9]:
plt.figure(figsize=(10, 4))
plt.plot(learning_curve, label='train_loss')
plt.plot(test_curve, label='test_loss', alpha=0.6)
plt.title('Learning curve')
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.ylim(0, 1)
plt.legend()
plt.grid()
[Figure: learning curve plotting train_loss and test_loss (MSE) against epoch]

From the figure above, we can see that the test loss curve starts to deviate from the training loss curve. This means the model begins to overfit the training data, so training should be stopped before the test loss drifts too far from the training loss. Such overfitting is especially likely when the dataset is small, as it is here.
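
One common remedy is early stopping: track the best test loss seen so far and stop once it has not improved for a fixed number of epochs. Below is a minimal sketch that replays the recorded test_curve; the patience value is our assumption, not part of the original tutorial. In a real run, the same check would sit inside the training loop so that training actually stops.

patience = 10  # epochs to wait without improvement (assumed value)
best_loss = float('inf')
wait = 0
for epoch, loss_value in enumerate(test_curve, 1):
    if loss_value < best_loss:
        best_loss, wait = loss_value, 0
    else:
        wait += 1
    if wait >= patience:
        print('Would have stopped at epoch %d (best test loss: %f)' % (epoch, best_loss))
        break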

Compare the actual and predicted count of users

Next, let’s compare the actual values with the predicted values.

In [10]:
# predict test value
y_pred = sequential(X_test)

casual_true = y_test[:,:1].reshape(-1, 1) * col_std[11] + col_mean[11]
casual_pred = y_pred[:,:1] * col_std[11] + col_mean[11]
registered_true = y_test[:,1:2].reshape(-1, 1) * col_std[12] + col_mean[12]
registered_pred = y_pred[:,1:2] * col_std[12] + col_mean[12]

plt.figure(figsize=(8, 8))
plt.plot([5, 8000], [5, 8000], c='k', alpha=0.6, label = 'diagonal line') # diagonal line
plt.scatter(casual_true, casual_pred,label='casual')
plt.scatter(registered_true, registered_pred,label='registered')
plt.xlim(0, 8000)
plt.ylim(0, 8000)
plt.xlabel('actual count of users', fontsize=16)
plt.ylabel('predicted count of users', fontsize=16)
plt.legend()
plt.grid()
[Figure: scatter plot of actual vs. predicted counts for casual and registered users, with the diagonal line y = x]

The graph’s x-axis is the actual count of users and its y-axis is the predicted count. The black line is the diagonal y = x; the closer the points lie to it, the better the prediction. From the graph, we can see that counts in the range of roughly 1,000 to 3,000 users were not predicted well. To predict more accurately, it would be necessary to separate the well-predicted data from the poorly predicted data and investigate the cause.
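
As a first step in that investigation, we could rank the test days by prediction error and inspect the worst cases. A minimal sketch using the arrays computed above (np.array() is applied defensively in case the predictions are framework-specific array types):

# Rank test-set days by absolute error on registered users
errors = np.abs(np.array(registered_true).flatten() - np.array(registered_pred).flatten())
worst = np.argsort(errors)[::-1][:5]
print('Rows of the five largest errors:', worst)
print('Corresponding errors:', errors[worst].round(0))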

Root mean squared error

The root mean squared error (RMSE) represents the typical deviation between the actual and predicted values, in the original units. In this case, our model is off by about 208 casual users per day and about 377 registered users per day on average.

In [11]:
print("Root mean squared error:{}".format(np.sqrt(rm.mse(casual_true, casual_pred))))
print("Root mean squared error:{}".format(np.sqrt(rm.mse(registered_true, registered_pred))))
Root mean squared error:208.06089782714844
Root mean squared error:377.4617004394531
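
For reference, the same quantities can be cross-checked in plain NumPy. This assumes rm.mse computes the ordinary mean of squared differences; some frameworks include an extra factor of 1/2 in their training-loss variants, so a cross-check like this is a useful sanity test.

rmse_casual = np.sqrt(np.mean((np.array(casual_true) - np.array(casual_pred)) ** 2))
rmse_registered = np.sqrt(np.mean((np.array(registered_true) - np.array(registered_pred)) ** 2))
print('RMSE (casual): {}'.format(rmse_casual))
print('RMSE (registered): {}'.format(rmse_registered))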