# Convolutional Neural Network (CNN)

In this chapter, we introduce the convolutional neural network (CNN), which is used mainly in computer vision tasks.

A CNN is generally composed of convolutional layers and pooling layers, and this chapter focuses on these two layer types. Because the explanation of the convolutional layer involves some mathematical notation, you can skip those parts if you are not interested in or familiar with them.

## Convolution

Before explaining the convolutional layer, we first explain convolution itself. Convolution is a mathematical operation on two functions, $f$ and $g$, and can be written in the form

$$
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
$$

However, because images and filters are discrete rather than continuous, we need to convert this form to discrete notation, replacing the integral with a sum:

$$
(f * g)[n] = \sum_{m} f[m]\, g[n - m]
$$
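As a rough illustration of the discrete form, the sketch below evaluates the sum $(f * g)[n] = \sum_m f[m]\, g[n - m]$ directly for two short sequences and compares the result with NumPy's `np.convolve`. The helper name `discrete_convolve` and the sample values are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

def discrete_convolve(f, g):
    """Full 1-D discrete convolution: out[n] = sum_m f[m] * g[n - m]."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            # g[n - m] contributes only where the index falls inside g.
            if 0 <= n - m < len(g):
                out[n] += f[m] * g[n - m]
    return out

signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]
result = discrete_convolve(signal, kernel)
print(result)
# Compare with NumPy's built-in full convolution.
print(np.allclose(result, np.convolve(signal, kernel)))  # True
```

The double loop is deliberately naive so that it mirrors the sum term by term; in practice a library routine would be used instead.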

## Convolutional Layer

The convolutional layer has two learnable parameters, a weight and a bias; the weight is usually called a kernel (or filter). Each kernel is convolved across the input image. Assume that we have a 2-D image **I** and a kernel **K**. The kernel is then convolved with the image as follows:

$$
(I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n)
$$

Throughout training, the convolutional layer learns kernels that activate when the layer detects specific features in the image. In this way, the layer compresses the given image and extracts features from it.
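To make the sliding-window process concrete, here is a minimal NumPy sketch of a 2-D convolution without padding and with a stride of 1. The function name `conv2d_valid`, the ramp image, and the difference kernel are assumptions chosen for illustration; this is not how ReNom implements the layer.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution, no padding, stride 1.

    The kernel is rotated by 180 degrees first, which turns the
    sliding dot product into a true convolution.
    """
    image = np.asarray(image, dtype=float)
    flipped = np.asarray(kernel, dtype=float)[::-1, ::-1]
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the current window and the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# A ramp image whose values grow by 1 from one column to the next.
image = np.arange(25, dtype=float).reshape(5, 5)
# A horizontal difference kernel: it responds to left-right changes.
kernel = np.array([[0.0, 0.0, 0.0],
                   [-1.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0]])
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)        # every entry is -2.0
```

Because the image changes at a constant rate horizontally, the kernel produces the same response everywhere. Many deep learning libraries skip the 180-degree rotation and compute a cross-correlation instead; since the kernels are learned, the difference does not matter in practice.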

The figure below represents the process of the convolutional layer.

## Pooling Layer

Like the convolutional layer, the pooling layer has a small window (kernel). It slides this window across the image and computes a summary statistic, such as the mean or the maximum, over the pixels inside it. The output shape after a pooling layer is computed in the same way as for a convolutional layer.

Two pooling layers are widely used: the average pooling layer and the max pooling layer. We introduce them below.

### Average Pooling

The average pooling layer takes the average of the pixels inside the small window applied to the image. The following figure represents the process of the average pooling layer.
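A small NumPy sketch of the pooling computation follows. The helper `pool2d` and its `reduce` parameter are hypothetical names introduced here; passing `np.mean` gives average pooling, and passing `np.max` gives max pooling for comparison.

```python
import numpy as np

def pool2d(image, window=2, stride=2, reduce=np.mean):
    """Apply `reduce` to every window x window patch of a 2-D image."""
    image = np.asarray(image, dtype=float)
    out_h = (image.shape[0] - window) // stride + 1
    out_w = (image.shape[1] - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + window, left:left + window]
            out[i, j] = reduce(patch)
    return out

image = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 10, 13, 14],
                  [11, 12, 15, 16]], dtype=float)

avg = pool2d(image, reduce=np.mean)
mx = pool2d(image, reduce=np.max)
print(avg)  # [[ 2.5  6.5] [10.5 14.5]]
print(mx)   # [[ 4.  8.] [12. 16.]]
```

With a 2 x 2 window and a stride of 2, the windows do not overlap, so a 4 x 4 image is reduced to 2 x 2.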

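## Output Shape

The output shape is determined in the same way for the convolutional layer and the pooling layer. It depends on the kernel size, the padding, and the stride. Without padding and with a stride of 1, the output shape after the convolutional layer can be written in the form

$$
\left(W - 2\left\lfloor \frac{H}{2} \right\rfloor\right) \times \left(W - 2\left\lfloor \frac{H}{2} \right\rfloor\right)
$$

where $W$ is the length of the image and $H$ is the length of the kernel. This form assumes an odd kernel length, for which it equals $W - H + 1$.

In other words, if the length of the image is 10 and the length of the kernel is 3, the output shape will be $(10 - 2\lfloor 3/2 \rfloor) \times (10 - 2\lfloor 3/2 \rfloor)$, that is, $8 \times 8$. However, we often want the output shape to be the same as the input shape. To achieve this, the padding technique can be used.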

### Padding

Padding is a technique that fills values around the border of the image. Usually zeros are used, in which case it is called **zero-padding**. The following figure represents the padding technique. With a stride of 1 and a padding of $\lfloor H/2 \rfloor$ pixels on each side for an odd kernel length $H$, the output shape is the same as the input shape.
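Zero-padding can be sketched with NumPy's `np.pad`. The 3 x 3 image and the kernel size below are assumptions for illustration; the point is only that a border of one pixel lets a 3 x 3 kernel produce an output as large as the unpadded input.

```python
import numpy as np

image = np.arange(1, 10, dtype=float).reshape(3, 3)
kernel_size = 3
pad = kernel_size // 2  # 1 pixel on every side for a 3 x 3 kernel

# mode="constant" with constant_values=0 is zero-padding.
padded = np.pad(image, pad_width=pad, mode="constant", constant_values=0)
print(padded)

# Number of kernel positions along one axis with a stride of 1.
out_size = padded.shape[0] - kernel_size + 1
print(image.shape, padded.shape, out_size)  # (3, 3) (5, 5) 3
```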

### Stride

Another frequently used technique is the **stride**. By default, the kernel moves 1 pixel at a time, vertically and horizontally, across the image. Setting the stride parameter to a value greater than 1 makes the kernel move by that many pixels instead. For example, if the stride is set to 2, the kernel moves 2 pixels from its current position at each step, vertically and horizontally. As a result, the output is roughly half the size of the input along each dimension.
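The effect of the stride can be seen by listing where the kernel is placed along one axis. The helper `window_starts` is a hypothetical name used only for this sketch, and no padding is assumed.

```python
def window_starts(length, kernel, stride):
    """Start index of every kernel position along one axis (no padding)."""
    return list(range(0, length - kernel + 1, stride))

starts_stride1 = window_starts(length=10, kernel=2, stride=1)
starts_stride2 = window_starts(length=10, kernel=2, stride=2)
print(starts_stride1)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(starts_stride2)  # [0, 2, 4, 6, 8]
print(len(starts_stride1), len(starts_stride2))  # 9 5
```

The number of positions is the output length along that axis, so a stride of 2 turns an input of length 10 into an output of length 5.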

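Therefore, with padding and stride taken into account, the output shape can be written in the form

$$
\left(\left\lfloor \frac{W - H + 2P}{S} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{W - H + 2P}{S} \right\rfloor + 1\right)
$$

where $W$ is the length of the image, $H$ is the length of the kernel, $P$ is the padding added to each side, and $S$ is the stride.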
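The commonly used output-size formula, floor((W - H + 2P) / S) + 1 per axis, is easy to check numerically. The function name `conv_output_size` and its argument names are assumptions made for this sketch.

```python
def conv_output_size(w, h, padding=0, stride=1):
    """Output length along one axis: floor((w - h + 2 * padding) / stride) + 1."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return (w - h + 2 * padding) // stride + 1

print(conv_output_size(10, 3))                       # 8
print(conv_output_size(10, 3, padding=1))            # 10
print(conv_output_size(10, 2, stride=2))             # 5
print(conv_output_size(10, 3, padding=1, stride=2))  # 5
```

The second call shows the "same shape" case: a 3 x 3 kernel with one pixel of padding keeps a length of 10 unchanged.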

## The Convolutional Layer in ReNom

Now that we have explained the theory of the convolutional layer, we will explain how to use the convolutional and pooling layers in ReNom. In ReNom, the Conv2d class takes the arguments **channel**, **filter**, **padding**, and **stride**. Likewise, MaxPool2d (and AveragePool2d) takes the arguments **filter**, **padding**, and **stride**. The **channel** argument determines how many kernels the layer uses, and **filter** sets the size of the kernel. The **padding** and **stride** arguments are as explained above. We demonstrate the usage of the convolutional layer on a digit classification task.

### Required Libraries

- scikit-learn 0.18.2
- matplotlib 2.0.2
- numpy 1.12.1
- tqdm 4.15.0

```
import renom as rm
from renom.cuda.cuda import set_cuda_active
import numpy as np
from sklearn.datasets import fetch_mldata
from tqdm import tqdm
from sklearn.preprocessing import LabelBinarizer
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
```

### GPU Activation

If you want to speed up the training of your model, you need to enable the GPU on your machine. This is done by calling the **set_cuda_active** function implemented in ReNom.

```
set_cuda_active(True)
```

### Fetching Data

We will use the MNIST dataset, which consists of images of handwritten digits. Running the following code downloads the dataset.

```
mnist = fetch_mldata('MNIST original', data_home='./')
```

### Model Definition

Here, we define the convolutional neural network. Because the images in the dataset are not complicated, we do not need a complex model. We therefore use only two convolutional layers, one max pooling layer, and two fully connected layers, with dropout applied after pooling. The model can be implemented easily with the Sequential class in ReNom.

```
cnn = rm.Sequential([
    rm.Conv2d(channel=32, filter=3, padding=1),
    rm.Relu(),
    rm.Conv2d(channel=64, filter=3, padding=1),
    rm.Relu(),
    rm.MaxPool2d(filter=2, stride=2),
    rm.Dropout(0.5),
    rm.Flatten(),
    rm.Dense(128),
    rm.Relu(),
    rm.Dense(10)
])
```
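It can help to trace how the shape of one 28 x 28 MNIST image changes through the convolutional and pooling layers of this model. The sketch below is plain Python and does not call ReNom. It assumes a stride of 1 for the convolutions and a padding of 0 for the pooling layer, since the model definition does not set those arguments; the helper `output_size` and the `layers` table are hypothetical.

```python
def output_size(w, h, padding=0, stride=1):
    """Output length along one axis: floor((w - h + 2 * padding) / stride) + 1."""
    return (w - h + 2 * padding) // stride + 1

# (layer name, kernel size, padding, stride, output channels or None to keep)
# Assumed defaults: stride 1 for Conv2d, padding 0 for MaxPool2d.
layers = [
    ("Conv2d(32)", 3, 1, 1, 32),
    ("Conv2d(64)", 3, 1, 1, 64),
    ("MaxPool2d", 2, 0, 2, None),
]

size, channels = 28, 1  # one 28 x 28 grayscale image
shapes = []
for name, kernel, padding, stride, out_channels in layers:
    size = output_size(size, kernel, padding, stride)
    if out_channels is not None:
        channels = out_channels
    shapes.append((name, channels, size, size))
    print(name, (channels, size, size))

# Flatten turns (channels, height, width) into a single vector.
flattened = channels * size * size
print("Flatten:", flattened)  # Flatten: 12544
```

Under these assumptions the two convolutions keep the 28 x 28 size because of the padding of 1, the pooling layer halves it to 14 x 14, and the first fully connected layer receives 12544 values per image.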

### Data Conversion

We split the dataset into two groups, a training set and a validation set, so that the model can be evaluated on data it has not been trained on. Moreover, because the targets must be one-hot vectors, we need to convert the target data.

```
data = mnist['data']
targets = mnist['target']

# Use the first 80% of the samples for training and the rest for validation.
train_num = int(0.8 * len(data))

# Reshape flat 784-pixel rows into (N, 1, 28, 28): one channel, 28 x 28 pixels.
train_data = np.expand_dims(data[:train_num].reshape(train_num, 28, 28), axis=1)
test_data = np.expand_dims(data[train_num:].reshape(len(data) - train_num, 28, 28), axis=1)

# Convert the digit labels into one-hot vectors.
train_targets = targets[:train_num]
train_targets = LabelBinarizer().fit_transform(train_targets).astype(np.float32)
test_targets = targets[train_num:]
test_targets = LabelBinarizer().fit_transform(test_targets).astype(np.float32)
```
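The two conversions above, adding a channel axis and one-hot encoding the labels, can be tried on a toy example without downloading anything. The arrays below are made-up stand-ins for the MNIST rows. `np.eye` is used here in place of `LabelBinarizer` so that all ten columns appear even though the toy labels cover only three digits.

```python
import numpy as np

# Four fake flattened "images" of 784 pixels each, standing in for MNIST rows.
flat = np.zeros((4, 784), dtype=np.float32)
labels = np.array([0, 3, 9, 3])

# (N, 784) -> (N, 28, 28) -> (N, 1, 28, 28): add a channel axis at position 1.
images = np.expand_dims(flat.reshape(len(flat), 28, 28), axis=1)

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(10, dtype=np.float32)[labels]

print(images.shape)   # (4, 1, 28, 28)
print(one_hot.shape)  # (4, 10)
print(one_hot[1])     # a single 1 at index 3
```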

### Training

```
batch_size = 64
epochs = 1
optimizer = rm.Sgd(lr=0.001)
N = train_num
for epoch in range(epochs):
    # Shuffle the training indices at the start of every epoch.
    perm = np.random.permutation(N)
    loss = 0
    test_loss = 0
    bar = tqdm(range(N//batch_size))
    for j in range(N//batch_size):
        train_batch = train_data[perm[j*batch_size:(j+1)*batch_size]]
        train_targets_batch = train_targets[perm[j*batch_size:(j+1)*batch_size]]
        # Forward pass in training mode, then backpropagate and update the weights.
        with cnn.train():
            l = rm.softmax_cross_entropy(cnn(train_batch), train_targets_batch)
        l.grad().update(optimizer)
        bar.set_description("epoch {:03d} train loss:{:6.4f} ".format(epoch, float(l.as_ndarray())))
        bar.update(1)
        loss += l.as_ndarray()
    # Evaluate on the validation set without updating the weights.
    for k in range(len(test_data)//batch_size):
        test_batch = test_data[k*batch_size:(k+1)*batch_size]
        test_targets_batch = test_targets[k*batch_size:(k+1)*batch_size]
        test_l = rm.softmax_cross_entropy(cnn(test_batch), test_targets_batch)
        test_loss += test_l.as_ndarray()
    bar.set_description("epoch {:03d} avg loss:{:6.4f} val loss:{:6.4f}".format(epoch, float((loss/(j+1))), float((test_loss/(k+1)))))
    bar.update(0)
    bar.refresh()
    bar.close()
```

```
epoch 000 avg loss:0.4712 val loss:0.3454: 100%|██████████| 875/875 [00:16<00:00, 53.02it/s]
```
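The mini-batch indexing used in the training loop can be sketched on a toy example. The sizes below are made up, and `np.random.default_rng` is used only to make the run repeatable; the loop above uses `np.random.permutation` directly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
N, batch_size = 10, 4
perm = rng.permutation(N)

# Slice the shuffled indices into N // batch_size full batches.
batches = [perm[j * batch_size:(j + 1) * batch_size] for j in range(N // batch_size)]
seen = np.concatenate(batches)

print(len(batches), [len(b) for b in batches])  # 2 [4, 4]
print(N - len(seen))  # 2 samples are left out of this epoch
```

Integer division drops any incomplete final batch. In the run above nothing is dropped: the progress bar shows 875 steps, and 875 x 64 = 56000 is exactly the number of training samples.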

### Kernel Visualization

As explained above, each kernel in a convolutional layer is convolved across the image. The kernel filters (the weights of the first convolutional layer) are shown below.

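```
# Weights of the first convolutional layer: (kernels, input channels, height, width).
W = cnn._layers[0].params.w.as_ndarray()
nb_filter, nb_channel, h, w = W.shape
plt.figure()
for i in range(nb_filter):
    im = W[i, 0]
    # Rescale the kernel values to the 0-255 range for display.
    scalar = MinMaxScaler(feature_range=(0, 255))
    im = scalar.fit_transform(im)
    # 4 x 8 grid: one cell for each of the 32 kernels.
    plt.subplot(4, 8, i+1)
    plt.axis('off')
    plt.imshow(im, cmap='gray')
plt.show()
```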

Next, we compare the original image with the feature maps produced by convolving the kernels of the first convolutional layer across it.

```
print('Original Image')
x = test_data[:1]
# Apply only the first convolutional layer; t is (samples, channels, height, width).
t = cnn._layers[0](x).as_ndarray()
nb_sample, nb_channel, h, w = t.shape
plt.figure()
plt.imshow(x[0][0], cmap='gray')
plt.show()
```

```
Original Image
```

```
print('Feature maps after the first convolutional layer')
plt.figure()
for i in range(nb_channel):
    im = t[0, i, :, :]
    scalar = MinMaxScaler(feature_range=(0, 255))
    im = scalar.fit_transform(im)
    plt.subplot(4, 8, i+1)
    plt.axis('off')
    plt.imshow(im, cmap='gray')
plt.show()
```

```
Feature maps after the first convolutional layer
```