Fully Convolutional Networks for Semantic Segmentation

This chapter introduces Fully Convolutional Networks (FCNs) for semantic segmentation.

The figure below, taken from [1], shows the overall architecture of fully convolutional networks for semantic segmentation. Semantic segmentation is pixel-wise classification: through convolution layers and deconvolution (upsampling) layers, the network generates an output whose channels form a one-hot vector at each pixel; in other words, the output shape is (number of classes, height, width). An example output image can be seen at the bottom of this page.

Required libraries

In this tutorial, the following modules are required.

  • matplotlib 2.0.2
  • numpy 1.12.1
  • skimage 0.13.1
  • scipy 1.0.0
  • tqdm 4.19.5
  • cv2 (opencv-python) 3.3.0.10
In [1]:
import renom as rm
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
from random import shuffle
import scipy.ndimage
import scipy.misc
import skimage.transform
import skimage
from tqdm import tqdm
from renom.cuda import set_cuda_active
import cv2

GPU-enabled Computing

If you wish to use GPUs, you need to call set_cuda_active() with the argument True. This makes training much faster than training with CPUs alone. Before calling this function, make sure a GPU is actually available on your machine.

In [2]:
set_cuda_active(True)
In [3]:
classes = ["sky", "building", "pole", "road", "pavement",
           "tree", "sign_symbol", "fence", "car", "pedestrian", "bicyclist"]
nb_class = len(classes) + 1  # +1 for the unlabelled class

Load dataset and Preprocessing

In this chapter, we use the CamVid dataset, which can be downloaded from http://docs.renom.jp/downloads/CamVid.zip . After downloading the CamVid dataset, you need to normalize the image data as shown below. The label data assigns a class number to each pixel, so you need to transform it into a one-hot representation.

The image data and the label data must have the shapes (number_of_images, channels(3), height, width) and (number_of_images, number_of_classes(12), height, width), respectively. To run the following program, you have to assign the appropriate path to the data_path variable.

In [4]:
data_path = './CamVid/'

def normalized(rgb):
    # Standardize an image (BGR channel order, as loaded by cv2.imread)
    # to zero mean and unit variance.
    norm = np.zeros((rgb.shape[0], rgb.shape[1], 3), np.float32)
    norm[:] = rgb
    norm -= norm.mean()
    norm /= norm.std()
    return norm

def one_hot_it(labels, w, h):
    # Convert a (w, h) map of class indices into a (w, h, 12) one-hot volume.
    x = np.zeros([w, h, 12])
    for i in range(w):
        for j in range(h):
            x[i, j, labels[i][j]] = 1
    return x

def load_data(mode):
    # Read the image/label path pairs listed in '<mode>.txt' and build arrays.
    data = []
    label = []
    with open(data_path + mode + '.txt') as f:
        txt = f.readlines()
        txt = [line.split(' ') for line in txt]
    for i in range(len(txt)):
        # [15:] strips the path prefix used in the txt files, [:-1] drops the
        # trailing newline, and [136:, 256:] crops each 360x480 frame to the
        # bottom-right 224x224 region. np.rollaxis moves the channel axis first.
        data.append(np.rollaxis(normalized(cv2.imread(data_path + txt[i][0][15:])[136:, 256:]), 2))
        label.append(one_hot_it(cv2.imread(data_path + txt[i][1][15:][:-1])[136:, 256:][:, :, 0], 224, 224))
    return np.array(data), np.array(label)

train_data, train_label = load_data("train")
test_data, test_label = load_data("test")
val_data, val_label = load_data("val")

# Move the class axis forward: (N, H, W, C) -> (N, C, H, W)
train_label = np.transpose(train_label, (0, 3, 1, 2))
test_label = np.transpose(test_label, (0, 3, 1, 2))
val_label = np.transpose(val_label, (0, 3, 1, 2))
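
As a side note, the per-pixel loop in one_hot_it is easy to vectorize with numpy fancy indexing. The sketch below (an optional alternative, not used later in this tutorial) produces the same (w, h, 12) volume:

def one_hot_it_fast(labels):
    # np.eye(12) is a 12x12 identity matrix; indexing it with the label map
    # yields the one-hot vector for every pixel at once.
    return np.eye(12)[labels]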

FCN 32s

Long et al. [1] proposed three variants of the model: FCN 32s, FCN 16s, and FCN 8s. Here we introduce FCN 32s first, and the others later. FCN 32s is not actually the most accurate of the three, but it is the simplest. The FCN segmentation model consists of three basic components: VGG16, fully convolutional layers, and a deconvolutional layer. You can use another typical network architecture such as AlexNet or GoogLeNet instead of VGG16, although VGG16 is the most accurate of those according to the paper. FCN 32s upsamples the features derived from the fully convolutional network in a single step. Because the spatial size is reduced to 1/32 of the input through VGG16, the deconvolutional layer has to upsample by a factor of 32.

In [5]:
class FCN_32s(rm.Model):
    def __init__(self, nb_class):
        self.conv1_1 = rm.Conv2d(64, padding=1, filter=3)
        self.conv1_2 = rm.Conv2d(64, padding=1, filter=3)
        self.max_pool1 = rm.MaxPool2d(filter=2, stride=2)

        self.conv2_1 = rm.Conv2d(128, padding=1, filter=3)
        self.conv2_2 = rm.Conv2d(128, padding=1, filter=3)
        self.max_pool2 = rm.MaxPool2d(filter=2, stride=2)

        self.conv3_1 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_2 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_3 = rm.Conv2d(256, padding=1, filter=3)
        self.max_pool3 = rm.MaxPool2d(filter=2, stride=2)

        self.conv4_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool4 = rm.MaxPool2d(filter=2, stride=2)

        self.conv5_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool5 = rm.MaxPool2d(filter=2, stride=2)

        self.fc6 = rm.Conv2d(4096, filter=7, padding=3)
        self.fc7 = rm.Conv2d(4096, filter=1)

        self.score_fr = rm.Conv2d(nb_class, filter=1)  # per-pixel class scores
        self.upscore = rm.Deconv2d(nb_class, stride=32, padding=0, filter=32)  # x32 upsampling

    def forward(self, x):
        t = x
        t = rm.relu(self.conv1_1(t))
        t = rm.relu(self.conv1_2(t))
        t = self.max_pool1(t)

        t = rm.relu(self.conv2_1(t))
        t = rm.relu(self.conv2_2(t))
        t = self.max_pool2(t)

        t = rm.relu(self.conv3_1(t))
        t = rm.relu(self.conv3_2(t))
        t = rm.relu(self.conv3_3(t))
        t = self.max_pool3(t)

        t = rm.relu(self.conv4_1(t))
        t = rm.relu(self.conv4_2(t))
        t = rm.relu(self.conv4_3(t))
        t = self.max_pool4(t)

        t = rm.relu(self.conv5_1(t))
        t = rm.relu(self.conv5_2(t))
        t = rm.relu(self.conv5_3(t))
        t = self.max_pool5(t)

        t = rm.relu(self.fc6(t))
        t = rm.relu(self.fc7(t))

        t = self.score_fr(t)
        t = self.upscore(t)
        return t
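
As a quick sanity check, we can confirm that the model maps a 224x224 input to a score map of the same spatial size (a minimal sketch; the random array below is only a stand-in for real data):

model = FCN_32s(nb_class)
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
print(model(dummy).shape)  # expected: (1, nb_class, 224, 224)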

FCN 16s

This model not only upsamples the features derived from the fully convolutional network, it also uses the features coming from the pool4 layer of VGG16. The feature maps after the pool4 layer are 1/16 of the original size. The model therefore first upsamples the 1/32-scale features produced by the fully convolutional network by a factor of 2, obtaining features at 1/16 scale. These upsampled features are then combined with the features from the pool4 layer. Finally, the combined features are upsampled again to obtain an output the same size as the original image.

In [6]:

class FCN_16s(rm.Model):
    def __init__(self, nb_class):
        self.conv1_1 = rm.Conv2d(64, padding=1, filter=3)
        self.conv1_2 = rm.Conv2d(64, padding=1, filter=3)
        self.max_pool1 = rm.MaxPool2d(filter=2, stride=2)

        self.conv2_1 = rm.Conv2d(128, padding=1, filter=3)
        self.conv2_2 = rm.Conv2d(128, padding=1, filter=3)
        self.max_pool2 = rm.MaxPool2d(filter=2, stride=2)

        self.conv3_1 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_2 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_3 = rm.Conv2d(256, padding=1, filter=3)
        self.max_pool3 = rm.MaxPool2d(filter=2, stride=2)

        self.conv4_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool4 = rm.MaxPool2d(filter=2, stride=2)

        self.conv5_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool5 = rm.MaxPool2d(filter=2, stride=2)

        self.fc6 = rm.Conv2d(4096, filter=7, padding=3)
        self.fc7 = rm.Conv2d(4096, filter=1)

        self.score_fr = rm.Conv2d(nb_class, filter=1)
        self.score_pool4 = rm.Conv2d(nb_class, filter=1)

        self.upscore2 = rm.Deconv2d(nb_class, filter=2, stride=2, padding=0)
        self.upscore16 = rm.Deconv2d(nb_class, filter=16, stride=16, padding=0)

    def forward(self, x):
        t = x
        t = rm.relu(self.conv1_1(t))
        t = rm.relu(self.conv1_2(t))
        t = self.max_pool1(t)

        t = rm.relu(self.conv2_1(t))
        t = rm.relu(self.conv2_2(t))
        t = self.max_pool2(t)

        t = rm.relu(self.conv3_1(t))
        t = rm.relu(self.conv3_2(t))
        t = rm.relu(self.conv3_3(t))
        t = self.max_pool3(t)

        t = rm.relu(self.conv4_1(t))
        t = rm.relu(self.conv4_2(t))
        t = rm.relu(self.conv4_3(t))
        t = self.max_pool4(t)
        pool4 = t

        t = rm.relu(self.conv5_1(t))
        t = rm.relu(self.conv5_2(t))
        t = rm.relu(self.conv5_3(t))
        t = self.max_pool5(t)
        t = rm.relu(self.fc6(t))

        t = rm.relu(self.fc7(t))

        t = self.score_fr(t)

        upscore2 = self.upscore2(t)             # x2 upsample of the score map (7 -> 14)
        score_pool4 = self.score_pool4(pool4)   # 1x1 conv scores from pool4 (14x14)

        t = upscore2 + score_pool4              # fuse the two 14x14 score maps
        t = self.upscore16(t)                   # x16 upsample back to the input size

        return t
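
The filter and stride choices above are what make these spatial sizes line up. Assuming the standard transposed-convolution size formula, out = (in - 1) * stride + filter - 2 * padding, a small helper (illustrative only) confirms the FCN 16s path:

def deconv_out(in_size, k, s, p=0):
    # output size of a transposed convolution with filter k, stride s, padding p
    return (in_size - 1) * s + k - 2 * p

print(deconv_out(7, 2, 2))     # 14: upscore2 output, matching pool4's 14x14 map
print(deconv_out(14, 16, 16))  # 224: upscore16 restores the original input size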

FCN 8s

FCN 8s is a more accurate and more complicated model than the ones introduced so far. In addition to the pool4 fusion used in FCN 16s, this model also combines the features coming from the pool3 layer with the upsampled features. The basic structure is otherwise the same as FCN 16s.

In [7]:
class FCN_8s(rm.Model):
    def __init__(self, nb_class):
        self.conv1_1 = rm.Conv2d(64, padding=1, filter=3)
        self.conv1_2 = rm.Conv2d(64, padding=1, filter=3)
        self.max_pool1 = rm.MaxPool2d(filter=2, stride=2)

        self.conv2_1 = rm.Conv2d(128, padding=1, filter=3)
        self.conv2_2 = rm.Conv2d(128, padding=1, filter=3)
        self.max_pool2 = rm.MaxPool2d(filter=2, stride=2)

        self.conv3_1 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_2 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_3 = rm.Conv2d(256, padding=1, filter=3)
        self.max_pool3 = rm.MaxPool2d(filter=2, stride=2)

        self.conv4_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool4 = rm.MaxPool2d(filter=2, stride=2)

        self.conv5_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool5 = rm.MaxPool2d(filter=2, stride=2)

        self.fc6 = rm.Conv2d(4096, filter=7, padding=3)
        self.fc7 = rm.Conv2d(4096, filter=1)

        self.drop_out = rm.Dropout(0.5)

        self.score_fr = rm.Conv2d(nb_class, filter=1)
        self.upscore2 = rm.Deconv2d(nb_class, filter=2, stride=2, padding=0)
        self.upscore8 = rm.Deconv2d(nb_class, filter=8, stride=8, padding=0)

        self.score_pool3 = rm.Conv2d(nb_class, filter=1)
        self.score_pool4 = rm.Conv2d(nb_class, filter=1)

        self.upscore_pool4 = rm.Deconv2d(nb_class, filter=2, stride=2, padding=0)

    def forward(self, x):
        t = x
        t = rm.relu(self.conv1_1(t))
        t = rm.relu(self.conv1_2(t))
        t = self.max_pool1(t) #112

        t = rm.relu(self.conv2_1(t))
        t = rm.relu(self.conv2_2(t))
        t = self.max_pool2(t) #56

        t = rm.relu(self.conv3_1(t))
        t = rm.relu(self.conv3_2(t))
        t = rm.relu(self.conv3_3(t))
        t = self.max_pool3(t) #28
        pool3 = t

        t = rm.relu(self.conv4_1(t))
        t = rm.relu(self.conv4_2(t))
        t = rm.relu(self.conv4_3(t))
        t = self.max_pool4(t)
        pool4 = t

        t = rm.relu(self.conv5_1(t))
        t = rm.relu(self.conv5_2(t))
        t = rm.relu(self.conv5_3(t))
        t = self.max_pool5(t)

        t = rm.relu(self.fc6(t))
        t = self.drop_out(t)
        t = rm.relu(self.fc7(t))

        t = self.score_fr(t)                    # per-pixel class scores (7x7)

        upscore2 = self.upscore2(t)             # x2 upsample (7 -> 14)
        score_pool4 = self.score_pool4(pool4)   # scores from pool4 (14x14)
        fuse_pool4 = upscore2 + score_pool4     # first fusion

        score_pool3 = self.score_pool3(pool3)   # scores from pool3 (28x28)
        upscore_pool4 = self.upscore_pool4(fuse_pool4)  # x2 upsample (14 -> 28)
        t = upscore_pool4 + score_pool3         # second fusion

        t = self.upscore8(t)                    # x8 upsample back to 224x224
        return t
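
Reusing the deconv_out helper from the FCN 16s section, the decoder sizes in FCN 8s also line up:

print(deconv_out(7, 2, 2))   # 14: upscore2, fused with the pool4 scores (14x14)
print(deconv_out(14, 2, 2))  # 28: upscore_pool4, fused with the pool3 scores (28x28)
print(deconv_out(28, 8, 8))  # 224: upscore8 restores the original input size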

Training

Here, we train the FCN segmentation model defined above. Because FCN 8s is more accurate than the others, we train the FCN 8s model. We use SGD (stochastic gradient descent) with appropriate hyperparameters: learning rate and momentum. Softmax cross-entropy is used to calculate the training loss. The loss computed by softmax_cross_entropy must be divided by the number of pixels (height * width) to obtain an appropriate per-pixel loss value.
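
To make this normalization concrete, here is a plain-numpy sketch of a softmax cross-entropy summed over classes and pixels (illustrative only; ReNom's rm.softmax_cross_entropy may differ in its reduction details). Because the loss is a sum over all pixels, dividing by 224 * 224 turns it into a mean per-pixel loss:

def sce_sum(logits, onehot):
    # logits, onehot: (N, C, H, W); softmax is taken over the class axis
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    # summed over classes and pixels, averaged over the batch
    return -(onehot * np.log(p + 1e-8)).sum() / logits.shape[0]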

In [8]:
epochs = 100
batch = 4

opt = rm.Sgd(lr=0.07, momentum=0.6)
fcn_8s = FCN_8s(nb_class)
N = len(train_data)
val_N = len(val_data)
for epoch in range(epochs):
    bar = tqdm(range(N//batch))
    loss = 0
    val_loss = 0
    perm = np.random.permutation(N)
    for j in range(N//batch):
        with fcn_8s.train():
            x = train_data[perm[j*batch:(j+1)*batch]]
            y = train_label[perm[j*batch:(j+1)*batch]]
            t = fcn_8s(x)
            l = rm.softmax_cross_entropy(t, y) / (224*224)  # mean per-pixel loss
        l.grad().update(opt)
        bar.set_description("epoch {:03d} train loss:{:6.4f} ".format(epoch, float(l.as_ndarray())))
        bar.update(1)
        loss += l.as_ndarray()
    perm = np.random.permutation(val_N)
    for k in range(val_N//batch):
        x = val_data[perm[k*batch:(k+1)*batch]]
        y = val_label[perm[k*batch:(k+1)*batch]]
        t = fcn_8s(x)
        val_l = rm.softmax_cross_entropy(t, y) /(224*224)
        val_loss += val_l.as_ndarray()

    bar.set_description("epoch {:03d} avg loss:{:6.4f}  val loss:{:6.4f}".format(epoch, float((loss/(j+1))), float((val_loss/(k+1)))))
    bar.update(0)
    bar.refresh()
    bar.close()

epoch 000 avg loss:1.7988  val loss:1.7596: 100%|██████████| 91/91 [00:35<00:00,  3.83it/s]
epoch 001 avg loss:1.4687  val loss:1.4669: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 002 avg loss:1.2169  val loss:1.2556: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 003 avg loss:1.0366  val loss:1.1187: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 004 avg loss:0.9566  val loss:1.1117: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 005 avg loss:0.9045  val loss:1.0804: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 006 avg loss:0.9111  val loss:1.0125: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 007 avg loss:0.8482  val loss:1.0826: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 008 avg loss:0.8075  val loss:0.9494: 100%|██████████| 91/91 [00:28<00:00,  3.78it/s]
epoch 009 avg loss:0.7946  val loss:1.0489: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 010 avg loss:0.7847  val loss:0.9769: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 011 avg loss:0.7394  val loss:0.9108: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 012 avg loss:0.7041  val loss:0.8540: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 013 avg loss:0.6912  val loss:0.8300: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 014 avg loss:0.6667  val loss:0.8063: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 015 avg loss:0.6361  val loss:0.7545: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 016 avg loss:0.6142  val loss:0.8146: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 017 avg loss:0.5951  val loss:0.7702: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 018 avg loss:0.5878  val loss:0.7128: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 019 avg loss:0.5624  val loss:0.7256: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 020 avg loss:0.5378  val loss:0.7454: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 021 avg loss:0.5155  val loss:0.7294: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 022 avg loss:0.5039  val loss:0.6613: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 023 avg loss:0.4856  val loss:0.6604: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 024 avg loss:0.4682  val loss:0.7065: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 025 avg loss:0.4532  val loss:0.6689: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 026 avg loss:0.4439  val loss:0.6410: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 027 avg loss:0.4220  val loss:0.7369: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 028 avg loss:0.4176  val loss:0.6738: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 029 avg loss:0.4122  val loss:0.7332: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 030 avg loss:0.3945  val loss:0.6556: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 031 avg loss:0.3823  val loss:0.6234: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 032 avg loss:0.3661  val loss:0.6219: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 033 avg loss:0.3498  val loss:0.5856: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 034 avg loss:0.3702  val loss:0.5731: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 035 avg loss:0.3351  val loss:0.6623: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 036 avg loss:0.3204  val loss:0.6047: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 037 avg loss:0.5538  val loss:0.7946: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 038 avg loss:0.4943  val loss:0.5994: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 039 avg loss:0.3593  val loss:0.6236: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 040 avg loss:0.3269  val loss:0.5875: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 041 avg loss:0.3014  val loss:0.5864: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 042 avg loss:0.2882  val loss:0.5849: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 043 avg loss:0.2728  val loss:0.5895: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 044 avg loss:0.2657  val loss:0.6378: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 045 avg loss:0.2562  val loss:0.6108: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 046 avg loss:0.2511  val loss:0.5802: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 047 avg loss:0.2414  val loss:0.6362: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 048 avg loss:0.2315  val loss:0.6301: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 049 avg loss:0.2311  val loss:0.6258: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 050 avg loss:0.2215  val loss:0.5937: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 051 avg loss:0.2220  val loss:0.5399: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 052 avg loss:0.2337  val loss:0.5946: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 053 avg loss:0.2083  val loss:0.5648: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 054 avg loss:0.2022  val loss:0.6478: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 055 avg loss:0.1971  val loss:0.5885: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 056 avg loss:0.1954  val loss:0.5818: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 057 avg loss:0.1948  val loss:0.5819: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 058 avg loss:0.1869  val loss:0.5888: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 059 avg loss:0.1841  val loss:0.5593: 100%|██████████| 91/91 [00:28<00:00,  3.78it/s]
epoch 060 avg loss:0.1811  val loss:0.5796: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 061 avg loss:0.1792  val loss:0.5846: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 062 avg loss:0.1743  val loss:0.5601: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 063 avg loss:0.1730  val loss:0.5667: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 064 avg loss:0.1722  val loss:0.6380: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 065 avg loss:0.1693  val loss:0.5749: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 066 avg loss:0.1672  val loss:0.6221: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 067 avg loss:0.1610  val loss:0.5924: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 068 avg loss:0.1603  val loss:0.5951: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 069 avg loss:0.1583  val loss:0.6380: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 070 avg loss:0.1568  val loss:0.5593: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 071 avg loss:0.1556  val loss:0.6054: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 072 avg loss:0.1497  val loss:0.5532: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 073 avg loss:0.1506  val loss:0.5587: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 074 avg loss:0.1484  val loss:0.5887: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 075 avg loss:0.1458  val loss:0.5633: 100%|██████████| 91/91 [00:30<00:00,  3.80it/s]
epoch 076 avg loss:0.1463  val loss:0.5873: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 077 avg loss:0.1430  val loss:0.5890: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 078 avg loss:0.1407  val loss:0.5771: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 079 avg loss:0.1414  val loss:0.5884: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 080 avg loss:0.1376  val loss:0.5737: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 081 avg loss:0.1372  val loss:0.6153: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 082 avg loss:0.1346  val loss:0.5572: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 083 avg loss:0.1359  val loss:0.5667: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 084 avg loss:0.1336  val loss:0.5819: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 085 avg loss:0.1324  val loss:0.6527: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 086 avg loss:0.1355  val loss:0.5951: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 087 avg loss:0.1296  val loss:0.5726: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 088 avg loss:0.1273  val loss:0.5819: 100%|██████████| 91/91 [00:26<00:00,  3.80it/s]
epoch 089 avg loss:0.1290  val loss:0.5495: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 090 avg loss:0.1261  val loss:0.5938: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 091 avg loss:0.1263  val loss:0.5843: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 092 avg loss:0.1222  val loss:0.5643: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 093 avg loss:0.1212  val loss:0.5970: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 094 avg loss:0.1211  val loss:0.5674: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 095 avg loss:0.1191  val loss:0.6220: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 096 avg loss:0.1200  val loss:0.5647: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 097 avg loss:0.1201  val loss:0.5720: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 098 avg loss:0.1206  val loss:0.5765: 100%|██████████| 91/91 [00:26<00:00,  3.80it/s]
epoch 099 avg loss:0.1165  val loss:0.5854: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]

Visualization

We check the trained model by visualizing a segmented image. First, we define a color map for each class.

In [10]:
Sky = [128,128,128]
Building = [128,0,0]
Pole = [192,192,128]
Road_marking = [255,69,0]
Road = [128,64,128]
Pavement = [60,40,222]
Tree = [128,128,0]
SignSymbol = [192,128,128]
Fence = [64,64,128]
Car = [64,0,128]
Pedestrian = [64,64,0]
Bicyclist = [0,128,192]
Unlabelled = [0,0,0]

label_colours = np.array([Sky, Building, Pole, Road, Pavement,
                          Tree, SignSymbol, Fence, Car, Pedestrian, Bicyclist, Unlabelled])

def visualize(temp, plot=True):
    # Map each class index in `temp` to its RGB colour. The loop covers all 12
    # entries of label_colours, so the unlabelled class (11) is painted black.
    r = temp.copy()
    g = temp.copy()
    b = temp.copy()
    for l in range(0, 12):
        r[temp==l] = label_colours[l,0]
        g[temp==l] = label_colours[l,1]
        b[temp==l] = label_colours[l,2]

    rgb = np.zeros((temp.shape[0], temp.shape[1], 3))
    rgb[:,:,0] = r / 255.0
    rgb[:,:,1] = g / 255.0
    rgb[:,:,2] = b / 255.0
    if plot:
        plt.imshow(rgb)
    else:
        return rgb
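
Since label_colours is a (12, 3) array, the same mapping can be written in one fancy-indexing step (an equivalent sketch of the loop above):

def visualize_fast(temp):
    # index the (12, 3) colour table with the (H, W) class-index map
    return label_colours[temp] / 255.0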

Fetching Original Images

To compare the segmented images with the original images, we load the original validation images.

In [15]:
gt = []
with open(data_path + 'val' +'.txt') as f:
    txt = f.readlines()
    txt = [line.split(' ') for line in txt]
for i in range(len(txt)):
    # cv2 loads BGR; reverse the channel order to RGB for matplotlib display
    gt.append(cv2.imread(data_path+txt[i][0][15:])[136:,256:][:, :, ::-1])


Prediction

In [16]:
pred = fcn_8s(val_data[0:1])
pred = pred.as_ndarray()
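
To quantify the prediction beyond visual inspection, a simple metric is per-pixel accuracy (a minimal sketch, comparing the predicted class map against the one-hot validation label):

pred_classes = np.argmax(pred[0], axis=0)       # (224, 224) predicted class map
true_classes = np.argmax(val_label[0], axis=0)  # (224, 224) ground-truth class map
print("pixel accuracy: {:.3f}".format((pred_classes == true_classes).mean()))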

Original Image

Displaying the original image.

In [17]:
plt.imshow(gt[0]/255.0)
plt.show()

Segmented Image

Displaying the segmented image.

In [18]:
segmented_img = visualize(np.argmax(pred[0],axis=0).reshape((224,224)), False)
plt.imshow(segmented_img)
plt.show()

References

[1] Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015.