Introduction to the Loss Function ¶
An introduction to two basic loss functions and their related activation functions
Content ¶
- Introduction
- Problem classification and basic loss function
- Combinations of activation and loss functions
- How to apply a basic combination with ReNom
1. Introduction
A loss function calculates the error between a set of outputs and
their labels.
Selecting an appropriate loss function is extremely important, because neural
networks differentiate the loss function in order to learn their parameters.
In this notebook, we introduce basic loss functions and their related
activation functions.
2. Problem classification and basic loss functions
There are many commonly used loss functions, and sometimes we even
define novel loss functions to solve a specific problem.
We’ll now introduce two basic loss functions, cross entropy and mean
squared error, together with two related activation functions, the sigmoid
function and the softmax function.
To choose a loss function, or to find a suitable loss-activation
combination, we first have to classify the problem: is it binary
classification, multiclass classification, or regression?
Typically, each of these problem types calls for a different combination:
sigmoid with cross entropy for binary classification, softmax with cross
entropy for multiclass classification, and a linear output with mean
squared error for regression.
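For a quick sense of how the two losses behave, here is a minimal NumPy sketch (the values are illustrative only and are not part of the dataset or ReNom code used later) that evaluates binary cross entropy and mean squared error on a few predicted probabilities:
import numpy as np

# Predicted probabilities (e.g. from a sigmoid output) and their binary labels.
# These values are made up purely for illustration.
p = np.array([0.9, 0.2, 0.7])
y = np.array([1.0, 0.0, 1.0])

# Binary cross entropy: -mean( y*log(p) + (1-y)*log(1-p) )
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Mean squared error: mean( (p - y)^2 )
mse = np.mean((p - y) ** 2)

print("cross entropy: {:.4f}, mean squared error: {:.4f}".format(cross_entropy, mse))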


3. Combinations of activation and loss functions
Depending on the problem type above, there are several reasons why we might
prefer one combination over another.
- As described above, cross entropy is usually used for probabilistic output.
- Mean squared error is usually used for regression.
- Using mean squared error together with the softmax (or sigmoid) function
is not recommended, as this may lead to very slow learning; a short
numerical sketch of this effect follows below.
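To see why this combination can be slow, compare the gradients with respect to the pre-activation z of a single sigmoid unit: with cross entropy the gradient is sigmoid(z) - y, while with mean squared error it is proportional to (sigmoid(z) - y) * sigmoid'(z), and sigmoid'(z) is nearly zero when the unit saturates. The following NumPy sketch (illustrative values only, not ReNom code) shows the effect for a badly misclassified example:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A badly misclassified example: the label is 1, but the pre-activation is very negative.
z, y = -6.0, 1.0
p = sigmoid(z)

# Cross entropy gradient w.r.t. z:  p - y
grad_cross_entropy = p - y

# Mean squared error gradient w.r.t. z (up to a constant factor):  (p - y) * sigmoid'(z)
grad_mse = (p - y) * p * (1.0 - p)

print("sigmoid output:", p)                            # about 0.0025
print("cross entropy gradient:", grad_cross_entropy)   # about -1.0, a strong learning signal
print("mse gradient:", grad_mse)                       # about -0.0025, almost no learning signal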
4. How to use basic combinations with ReNom
The reference for the example dataset is below:
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Required Libraries ¶
- matplotlib 2.0.2
- numpy 1.12.1
- scikit-learn 0.18.2
- pandas 0.20.3
In [1]:
from __future__ import division, print_function
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
import renom as rm
from renom.optimizer import Sgd, Adam
from renom.cuda import set_cuda_active
set_cuda_active(True)
# If this is the first time running the example
# and you need to download the dataset, set this to True.
first_time = False
if first_time:
    import os
    os.system("wget http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data")

def load_data(filename):
    df = pd.read_csv(filename, header=None, index_col=None)
    print("the number of {} records:{}".format(filename, len(df.index)))
    # Replace missing values ("?") with NaN and drop incomplete rows.
    df = df.applymap(lambda d: np.nan if d == "?" else d)
    df = df.dropna(axis=0)
    print("the number of {} records after trimming:{}".format(filename, len(df.index)))
    # The last column is the class label: "+" -> 1, "-" -> 0.
    sr_labels = df.iloc[:, -1]
    labels = sr_labels.str.replace("+", "1").replace("-", "0").values.astype(float)
    data = df.iloc[:, :-1].values.astype(str)
    return data, labels
Identify numerical and categorical columns and one-hot vectorize ¶
In [2]:
pattern_continuous = re.compile(r"^\d+\.?\d*\Z")

def onehot_vectorize(data):
    # Build the design matrix column by column: numerical columns are kept as
    # floats, categorical columns are one-hot encoded with pd.get_dummies.
    for i in range(data.shape[1]):
        is_continuous = True if pattern_continuous.match(data[0][i]) else False
        if is_continuous and i == 0:
            X = data[:, i].reshape(-1, 1).astype(float)
        elif not is_continuous and i == 0:
            X = pd.get_dummies(data[:, i]).values.astype(float)
        elif is_continuous and i != 0:
            X = np.concatenate((X, data[:, i].reshape(-1, 1).astype(float)), axis=1)
        elif not is_continuous and i != 0:
            X = np.concatenate((X, pd.get_dummies(data[:, i]).values.astype(float)), axis=1)
    return X
data, y = load_data("crx.data")
X = onehot_vectorize(data)
print("X:{} y:{}".format(X.shape, y.shape))
the number of crx.data records:690
the number of crx.data records after trimming:653
X:(653, 46) y:(653,)
Data splitting and model definition ¶
In [3]:
indices = np.arange(len(X))
X_train, X_test, y_train, y_test, indices_train, indices_test = \
    train_test_split(X, y, indices, test_size=0.2)
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)
print("X_train:{} y_train:{} X_test:{} y_test:{}".format(X_train.shape, y_train.shape, X_test.shape, y_test.shape))
sequential = rm.Sequential([
    rm.Dense(64),
    rm.Relu(),
    rm.Dense(32),
    rm.Relu(),
    rm.Dense(1)
])
X_train:(522, 46) y_train:(522, 1) X_test:(131, 46) y_test:(131, 1)
Learning loop for sigmoid activation and cross entropy ¶
First, we will introduce how to use the sigmoid-cross entropy combination. The following line sets sigmoid as the activation function and cross entropy as the loss function. Note that the final Dense(1) layer outputs a raw score: the sigmoid is applied inside rm.sigmoid_cross_entropy, which is why rm.sigmoid is applied explicitly to the model output when making predictions below.
l = rm.sigmoid_cross_entropy(sequential(train_batch), response_batch)
In [4]:
batch_size = 32
epoch = 50
N = len(X_train)
optimizer = Sgd(lr=0.001)
learning_curve = []
test_learning_curve = []
for i in range(epoch):
    perm = np.random.permutation(N)
    loss = 0
    for j in range(0, N // batch_size):
        train_batch = X_train[perm[j*batch_size : (j+1)*batch_size]]
        response_batch = y_train[perm[j*batch_size : (j+1)*batch_size]]
        with sequential.train():
            l = rm.sigmoid_cross_entropy(sequential(train_batch), response_batch)
        grad = l.grad()
        grad.update(optimizer)
        loss += l.as_ndarray()
    train_loss = loss / (N // batch_size)
    test_loss = rm.sigmoid_cross_entropy(sequential(X_test), y_test).as_ndarray()
    test_learning_curve.append(test_loss)
    learning_curve.append(train_loss)
    if i % 10 == 0:
        print("epoch :{}, train_loss:{}, test_loss:{}".format(i, train_loss, test_loss))
predictions = rm.sigmoid(sequential(X_test)).as_ndarray()
pred = np.array(list(map(lambda d:1 if d>0.5 else 0, predictions))).reshape(-1,1)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred, target_names=["-","+"]))
epoch :0, train_loss:3.272980712354183, test_loss:0.7651634812355042
epoch :10, train_loss:0.6081356462091208, test_loss:0.7066062092781067
epoch :20, train_loss:0.6118309292942286, test_loss:0.6797510385513306
epoch :30, train_loss:0.6028882898390293, test_loss:0.6095903515815735
epoch :40, train_loss:0.6093996576964855, test_loss:0.6735255718231201
[[52 10]
[32 37]]
precision recall f1-score support
- 0.62 0.84 0.71 62
+ 0.79 0.54 0.64 69
avg / total 0.71 0.68 0.67 131
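As a follow-up (not part of the original notebook output), the per-epoch training and test losses collected in learning_curve and test_learning_curve above can be visualized with matplotlib, which is already imported in the first cell:
# Plot the training and test loss collected during the learning loop above.
plt.plot(learning_curve, label="train loss")
plt.plot(test_learning_curve, label="test loss")
plt.xlabel("epoch")
plt.ylabel("sigmoid cross entropy loss")
plt.legend()
plt.grid(True)
plt.show()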