Activation Function Types

In this tutorial, we will show the types of activation functions that are available in ReNom.

In the previous tutorial, we explained that a neural network is a function made up of multiple connected neurons, with an activation function applied in each neuron. By using a non-linear function as the activation function, non-linear systems can be learned.

Below, we introduce each activation function in ReNom and plot it for further understanding. The input range is from -3 to 3. Note that the way an activation function is written differs between the functional and sequential models; we will use the functional-model form to plot the graphs (a short sketch contrasting the two styles follows the imports below).

Required Libraries

In [1]:
import renom as rm
import numpy as np
import matplotlib.pyplot as plt
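
As a rough sketch of the functional/sequential difference mentioned in the introduction: the functional form is a plain function call, while the sequential form instantiates the activation as a layer class inside a model definition. The sketch below assumes rm.Sequential and rm.Dense, as used in the other ReNom tutorials; only the functional form is needed in the rest of this tutorial.

In [ ]:
x_example = np.random.rand(1, 4)

# Functional model: the activation is applied as a plain function.
y_functional = rm.sigmoid(x_example)

# Sequential model: the activation is listed as a layer
# (rm.Sequential and rm.Dense are assumed here).
model = rm.Sequential([
    rm.Dense(2),
    rm.Sigmoid(),
])
y_sequential = model(x_example)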

Create Input Data

In [6]:
data=[x/100 for x in range(-300,300)]
input1=np.array(data)

#for confirmation
print(input1[0:5])
[-3.   -2.99 -2.98 -2.97 -2.96]

Defining Graph Function

Because we will be showing a graph for every activation function, we will define a plotting function.

In [12]:
def plt_act(x,ymin=-3,ymax=3):
    plt.grid()
    plt.plot(input1,x)
    plt.xlim(-3,3)
    plt.ylim(ymin,ymax)
    plt.xlabel('input')
    plt.ylabel('output')
    plt.show()

Sigmoid Function

This activation function outputs values from 0 to 1. The gradient is continuous. The output is centered at 0.5, and the gradient is largest near input 0, so during back propagation the weight updates are largest when the input is near 0. The graph below shows the output with the y axis scaled from 0 to 1 for easier viewing.

f(x) = \frac{1}{1 + \exp(-x)}
In [14]:
output1=rm.sigmoid(input1) # Sequential ->rm.Sigmoid()
plt_act(output1,0,1) ## for easy view
[Graph: sigmoid function output]
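
To connect the formula to the code, the same values can be reproduced with plain NumPy; this quick check is not part of the original notebook.

In [ ]:
manual_sigmoid = 1 / (1 + np.exp(-input1))   # f(x) = 1 / (1 + exp(-x))
print(np.allclose(output1, manual_sigmoid, atol=1e-6))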

Tanh Function

This activation function outputs values from -1 to 1. The gradient is continuous. The output is centered at 0, so for inputs near 0 the output is also near 0 and the corresponding weight updates in the following layer are suppressed during back propagation, unlike the sigmoid, whose output is about 0.5 there.

f(x) = \tanh(x)
In [17]:
output2=rm.tanh(input1) # Sequential ->rm.Tanh()
plt_act(output2)
[Graph: tanh function output]

Relu Function

This function outputs the same value as the input when inputs are positive, and outputs 0 when inputs are negative. The gradient is discontinuous at 0, but constant on each side: 1 for positive inputs and 0 for negative inputs.

f(x)=\max(x, 0)
In [20]:
output3=rm.relu(input1) # Sequential ->rm.Relu()
plt_act(output3)
[Graph: relu function output]
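
The piecewise-constant gradient can be seen numerically; the check below is not part of the original notebook and uses plain NumPy on the same input1 array (points spaced 0.01 apart).

In [ ]:
manual_relu = np.maximum(input1, 0)          # f(x) = max(x, 0)
print(np.allclose(output3, manual_relu))
# Finite-difference slope: approximately 0 on the negative side, 1 on the positive side.
print(np.diff(manual_relu[:5]) / 0.01)       # near x = -3
print(np.diff(manual_relu[-5:]) / 0.01)      # near x = +3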

Leaky Relu Function

This function outputs the same value as the input when inputs are positive, and outputs a value proportional to the input when inputs are negative. The gradient is discontinuous at 0, but constant on each side. The default value of the negative-side gradient (slope) is 0.01.

\begin{split}f(x)=\begin{cases} x&(x>0)\\ slope*x&(x\leq0) \end{cases}\\ \Leftrightarrow f(x)=\max(x, 0)+\min(slope*x, 0)\end{split}
In [22]:
output4=rm.leaky_relu(input1) # Sequential ->rm.Leaky_Relu()
plt_act(output4)
[Graph: leaky relu function output (default slope 0.01)]

The slope can also be configured. The graph below shows a slope of 0.5.

In [29]:
output5=rm.leaky_relu(input1,0.5)
plt_act(output5)
[Graph: leaky relu function output (slope 0.5)]
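
As a sanity check, the max/min form of the definition above can be reproduced with plain NumPy and compared against the ReNom output with slope 0.5; this check is not part of the original notebook.

In [ ]:
slope = 0.5
manual_leaky = np.maximum(input1, 0) + np.minimum(slope * input1, 0)
print(np.allclose(output5, manual_leaky, atol=1e-6))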

[1] Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng (2013). Rectifier Nonlinearities Improve Neural Network Acoustic Models

Elu Function

This function outputs the same value as the input when inputs are positive, and outputs \alpha(\exp(x)-1), a value greater than -\alpha, when inputs are negative. The gradient is discontinuous at 0; it is constant (1) for positive inputs, while for negative inputs it is a linear function of the output. The default value for the constant ( \alpha ) is 0.01.

\begin{split}f(x)=\begin{cases} x&(x>0)\\ \alpha(\exp(x)-1)&(x\leq0) \end{cases}\\ \Leftrightarrow f(x)=\max(x, 0) + \alpha*\min(\exp(x)-1, 0)\end{split}
In [24]:
output6=rm.elu(input1) # Sequential ->rm.Elu()
plt_act(output6)
[Graph: elu function output (default alpha 0.01)]

\alpha can also be configured. The graph below shows \alpha = 0.5.

In [26]:
output7=rm.elu(input1,0.5)
plt_act(output7)
[Graph: elu function output (alpha 0.5)]
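
The same sanity check as for Leaky Relu works here, using the max/min form of the definition with \alpha = 0.5; this check is not part of the original notebook.

In [ ]:
alpha = 0.5
manual_elu = np.maximum(input1, 0) + alpha * np.minimum(np.exp(input1) - 1, 0)
print(np.allclose(output7, manual_elu, atol=1e-6))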

[2] Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter (2015). Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). Published as a conference paper at ICLR 2016

Selu Function

When inputs are positive, this function outputs the input scaled by \lambda; when inputs are negative, it outputs \lambda\alpha(\exp(x)-1), which approaches -\lambda\alpha. The gradient is discontinuous at 0; it is constant (\lambda) for positive inputs, while for negative inputs it is a linear function of the output. The difference from Elu is that Selu uses fixed values for the constants.

\begin{split}f(x)=\lambda \begin{cases} x&(x>0)\\ \alpha(\exp(x)-1)&(x\leq0) \end{cases}\\ \Leftrightarrow f(x)=\lambda\{\max(x, 0) + \alpha*\min(\exp(x)-1, 0)\}\end{split}
\begin{split}\alpha=1.6732632423543772848170429916717\\\lambda=1.0507009873554804934193349852946\end{split}
In [30]:
output8=rm.selu(input1) # Sequential ->rm.Selu()
plt_act(output8)
[Graph: selu function output]
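
Plugging the fixed constants into the max/min form of the definition reproduces the ReNom output; this check is not part of the original notebook.

In [ ]:
alpha = 1.6732632423543772848170429916717
lmd = 1.0507009873554804934193349852946
manual_selu = lmd * (np.maximum(input1, 0) + alpha * np.minimum(np.exp(input1) - 1, 0))
print(np.allclose(output8, manual_selu, atol=1e-5))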

[3] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter (2017). Self-Normalizing Neural Networks

Softmax Function

The activation functions explained above are computed element-wise from the weighted sum of inputs to each unit alone, whereas Softmax is computed across all of the variables together. Each variable is normalized by taking its exponential as the numerator and the sum of the exponentials of all variables as the denominator, so the outputs lie between 0 and 1 and sum to 1.

f(x_j)=\frac{\exp(x_j)}{\sum_{i}\exp(x_i)}
In [34]:
x = np.random.rand(1, 3)
z = rm.softmax(x) # Sequential ->rm.Softmax()
print("The inputs are: ",x)
print("The outputs are: ",z)
The inputs are:  [[0.84523073 0.47026356 0.54554521]]
The outputs are:  [[0.41180003 0.28303504 0.30516493]]
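
Because each output is an exponential divided by the sum of all exponentials, the outputs always sum to 1. A quick NumPy check, not part of the original notebook:

In [ ]:
manual_softmax = np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)
print(np.allclose(z, manual_softmax))
print(np.sum(z, axis=1))   # sums to 1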