Fully Convolutional Networks for Semantic Segmentation

This chapter introduces Fully Convolutional Networks, which classify every pixel of an image into a class.

The figure below shows the basic structure of a Fully Convolutional Network. Semantic segmentation is the task of assigning a class to every pixel of an image. By passing the input through a series of convolutional and deconvolutional layers, the network produces an output whose channel dimension equals the number of classes, i.e. a one-hot vector per pixel. The final output therefore has the shape (number of classes, height, width). An example output image can be found at the bottom of this page.
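
For example, the predicted class of each pixel can be recovered from such an output by taking an argmax over the class axis. A minimal sketch with dummy data, matching this chapter's setup of 12 classes and 224x224 images:

import numpy as np

nb_class, h, w = 12, 224, 224
scores = np.random.rand(nb_class, h, w)  # stand-in for the network output
label_map = np.argmax(scores, axis=0)    # (h, w) map of per-pixel class ids
print(label_map.shape)                   # (224, 224)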

Required Libraries

This chapter requires the following modules.

  • matplotlib 2.0.2
  • numpy 1.12.1
  • skimage 0.13.1
  • scipy 1.0.0
  • tqdm 4.19.5
  • cv2 (opencv-python 3.3.0.10)
In [1]:
import renom as rm
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
from random import shuffle
import scipy.ndimage
import scipy.misc
import skimage.transform
import skimage
from tqdm import tqdm
from renom.cuda import set_cuda_active
import cv2

Computation on the GPU

To run the computation on a GPU, enable CUDA with set_cuda_active(). The function must be called with True as its argument.

In [2]:
set_cuda_active(True)
In [3]:
classes = ["sky", "building", "pole", "road", "pavement",
           "tree", "sign_symbol", "fence", "car", "pedestrian", "bicyclist"]
nb_class = len(classes) + 1

Obtaining and Preprocessing the Dataset

This chapter uses the CamVid dataset, which can be downloaded from http://docs.renom.jp/downloads/CamVid.zip. After downloading, the image data must be normalized. The downloaded label data assigns a class index to each pixel, so the labels must be converted into one-hot vectors.

The image data and label data must have the shapes (number of images, channels, height, width) and (number of images, number of classes, height, width), respectively. To run the program below, the data_path variable must be set to the appropriate path.

In [4]:
data_path = './CamVid/'

def normalized(rgb):
    # Standardize the image to zero mean and unit variance
    # (the channel order stays as OpenCV's BGR).
    norm = np.zeros((rgb.shape[0], rgb.shape[1], 3), np.float32)

    b = rgb[:,:,0]
    g = rgb[:,:,1]
    r = rgb[:,:,2]

    norm[:,:,0] = b
    norm[:,:,1] = g
    norm[:,:,2] = r
    mean = norm.mean()
    std = norm.std()
    norm -= mean
    norm /= std

    return norm

def one_hot_it(labels, w, h):
    # Convert a (w, h) integer label map into a (w, h, 12) one-hot array.
    x = np.zeros([w, h, 12])
    for i in range(w):
        for j in range(h):
            x[i, j, labels[i][j]] = 1
    return x

def load_data(mode):
    # mode is "train", "test" or "val"; the corresponding txt file lists
    # pairs of image and label paths.
    data = []
    label = []
    with open(data_path + mode + '.txt') as f:
        txt = f.readlines()
        txt = [line.split(' ') for line in txt]
    for i in range(len(txt)):
        # [15:] strips the path prefix recorded in the txt file,
        # [136:, 256:] crops the 360x480 CamVid frame to 224x224, and
        # np.rollaxis moves the channel axis first: (H, W, C) -> (C, H, W).
        data.append(np.rollaxis(normalized(cv2.imread(data_path + txt[i][0][15:])[136:, 256:]), 2))
        # [:-1] drops the trailing newline from the label path.
        label.append(one_hot_it(cv2.imread(data_path + txt[i][1][15:][:-1])[136:, 256:][:, :, 0], 224, 224))
    return np.array(data), np.array(label)

train_data, train_label = load_data("train")
test_data, test_label = load_data("test")
val_data, val_label = load_data("val")

# Move the class axis to the front: (N, H, W, C) -> (N, C, H, W).
train_label = np.transpose(train_label, (0, 3, 1, 2))
test_label = np.transpose(test_label, (0, 3, 1, 2))
val_label = np.transpose(val_label, (0, 3, 1, 2))
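
As a side note, the per-pixel double loop in one_hot_it can be replaced by a single numpy indexing operation. A minimal sketch of an equivalent vectorized version (one_hot_fast is a hypothetical helper, not part of the original script):

def one_hot_fast(labels, nb_class=12):
    # (w, h) integer label map -> (w, h, nb_class) one-hot array
    return np.eye(nb_class, dtype=np.float32)[labels]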

FCN 32s

Long et al. [1] presented three variants of the model in their paper: FCN 32s, FCN 16s, and FCN 8s. We introduce FCN 32s here and describe the others afterwards. FCN 32s is less accurate than the other variants, but it has a very simple structure. An FCN segmentation model mainly consists of VGG16, fully convolutional layers, and a deconvolutional network. Models such as AlexNet or GoogLeNet can be used in place of VGG16, but since VGG16 produced the best results, this chapter implements the network with VGG16. FCN 32s must upsample the feature map obtained from the fully convolutional layers: because the VGG16 part of FCN 32s reduces the feature map to 1/32 of the original image size, the feature map from the fully convolutional layers is upsampled by a factor of 32 to produce the output.

In [5]:
class FCN_32s(rm.Model):
    def __init__(self, nb_class):
        self.conv1_1 = rm.Conv2d(64, padding=1, filter=3)
        self.conv1_2 = rm.Conv2d(64, padding=1, filter=3)
        self.max_pool1 = rm.MaxPool2d(filter=2, stride=2)

        self.conv2_1 = rm.Conv2d(128, padding=1, filter=3)
        self.conv2_2 = rm.Conv2d(128, padding=1, filter=3)
        self.max_pool2 = rm.MaxPool2d(filter=2, stride=2)

        self.conv3_1 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_2 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_3 = rm.Conv2d(256, padding=1, filter=3)
        self.max_pool3 = rm.MaxPool2d(filter=2, stride=2)

        self.conv4_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool4 = rm.MaxPool2d(filter=2, stride=2)

        self.conv5_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool5 = rm.MaxPool2d(filter=2, stride=2)

        self.fc6 = rm.Conv2d(4096, filter=7, padding=3)
        self.fc7 = rm.Conv2d(4096, filter=1)

        self.score_fr = rm.Conv2d(nb_class, filter=1)  # 1x1 conv producing per-class scores
        self.upscore = rm.Deconv2d(nb_class, stride=32, padding=0, filter=32)  # 32x upsampling back to input size

    def forward(self, x):
        t = x
        t = rm.relu(self.conv1_1(t))
        t = rm.relu(self.conv1_2(t))
        t = self.max_pool1(t)

        t = rm.relu(self.conv2_1(t))
        t = rm.relu(self.conv2_2(t))
        t = self.max_pool2(t)

        t = rm.relu(self.conv3_1(t))
        t = rm.relu(self.conv3_2(t))
        t = rm.relu(self.conv3_3(t))
        t = self.max_pool3(t)

        t = rm.relu(self.conv4_1(t))
        t = rm.relu(self.conv4_2(t))
        t = rm.relu(self.conv4_3(t))
        t = self.max_pool4(t)

        t = rm.relu(self.conv5_1(t))
        t = rm.relu(self.conv5_2(t))
        t = rm.relu(self.conv5_3(t))
        t = self.max_pool5(t)

        t = rm.relu(self.fc6(t))
        fc6 = t

        t = rm.relu(self.fc7(t))
        fc7 = t

        t = self.score_fr(t)
        score_fr = t
        t = self.upscore(t)
        return t
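
As a quick sanity check of the upsampling factor: the five 2x2 max-pooling layers shrink a 224x224 input to 224 / 2^5 = 7, and Deconv2d with filter=32 and stride=32 maps 7 back to 7 * 32 = 224. A minimal sketch, assuming a dummy 3-channel 224x224 batch like the data prepared above:

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy (N, C, H, W) batch
model = FCN_32s(nb_class)
out = model(x)
print(out.as_ndarray().shape)  # expected: (1, nb_class, 224, 224)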

FCN 16s

In addition to upsampling the features from the fully convolutional layers, this model also makes use of the features from pool4 of VGG16. The feature map after pool4 is 1/16 of the original image size, so the 1/32-scale feature map obtained from the fully convolutional layers is first upsampled by a factor of 2, yielding a 1/16-scale feature map. This map is then fused with the score map computed from pool4 by element-wise addition. Finally, the fused 1/16-scale feature map is upsampled once more to produce an output of the same size as the original image.

In [6]:
class FCN_16s(rm.Model):
    def __init__(self, nb_class):
        self.conv1_1 = rm.Conv2d(64, padding=1, filter=3)
        self.conv1_2 = rm.Conv2d(64, padding=1, filter=3)
        self.max_pool1 = rm.MaxPool2d(filter=2, stride=2)

        self.conv2_1 = rm.Conv2d(128, padding=1, filter=3)
        self.conv2_2 = rm.Conv2d(128, padding=1, filter=3)
        self.max_pool2 = rm.MaxPool2d(filter=2, stride=2)

        self.conv3_1 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_2 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_3 = rm.Conv2d(256, padding=1, filter=3)
        self.max_pool3 = rm.MaxPool2d(filter=2, stride=2)

        self.conv4_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool4 = rm.MaxPool2d(filter=2, stride=2)

        self.conv5_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool5 = rm.MaxPool2d(filter=2, stride=2)

        self.fc6 = rm.Conv2d(4096, filter=7, padding=3)
        self.fc7 = rm.Conv2d(4096, filter=1)

        self.score_fr = rm.Conv2d(nb_class, filter=1)
        self.score_pool4 = rm.Conv2d(nb_class, filter=1)

        self.upscore2 = rm.Deconv2d(nb_class, filter=2, stride=2, padding=0)
        self.upscore16 = rm.Deconv2d(nb_class, filter=16, stride=16, padding=0)

    def forward(self, x):
        t = x
        t = rm.relu(self.conv1_1(t))
        t = rm.relu(self.conv1_2(t))
        t = self.max_pool1(t)

        t = rm.relu(self.conv2_1(t))
        t = rm.relu(self.conv2_2(t))
        t = self.max_pool2(t)

        t = rm.relu(self.conv3_1(t))
        t = rm.relu(self.conv3_2(t))
        t = rm.relu(self.conv3_3(t))
        t = self.max_pool3(t)

        t = rm.relu(self.conv4_1(t))
        t = rm.relu(self.conv4_2(t))
        t = rm.relu(self.conv4_3(t))
        t = self.max_pool4(t)
        pool4 = t

        t = rm.relu(self.conv5_1(t))
        t = rm.relu(self.conv5_2(t))
        t = rm.relu(self.conv5_3(t))
        t = self.max_pool5(t)
        t = rm.relu(self.fc6(t))

        t = rm.relu(self.fc7(t))

        t = self.score_fr(t)

        t = self.upscore2(t)
        upscore2 = t

        t = self.score_pool4(pool4)
        score_pool4 = t

        t = upscore2 + score_pool4
        fuse_pool4 = t
        t = self.upscore16(fuse_pool4)
        upscore16 = t

        return t
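
For a 224x224 input, the size bookkeeping of the skip connection above works out as follows (a sketch of the shapes flowing through forward):

# score_fr (after pool5 and fc6/fc7):  224 / 32 = 7   -> (N, nb_class, 7, 7)
# upscore2 (2x deconvolution):         7 * 2 = 14     -> (N, nb_class, 14, 14)
# score_pool4 (1x1 conv on pool4):     224 / 16 = 14  -> (N, nb_class, 14, 14)
# fuse_pool4 = upscore2 + score_pool4                 -> (N, nb_class, 14, 14)
# upscore16 (16x deconvolution):       14 * 16 = 224  -> (N, nb_class, 224, 224)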

FCN 8s

The FCN 8s model is the most accurate, and also the most complex, of the FCN segmentation models introduced so far. Unlike FCN 16s, this model additionally fuses in the feature map derived from pool3 of VGG16, which is 1/8 of the original image size: the pool4-fused map is upsampled by a factor of 2 to 1/8 scale, added to the pool3 score map, and finally upsampled by a factor of 8. Apart from this, the overall flow and structure are the same as in FCN 16s.

In [7]:
class FCN_8s(rm.Model):
    def __init__(self, nb_class):
        self.conv1_1 = rm.Conv2d(64, padding=1, filter=3)
        self.conv1_2 = rm.Conv2d(64, padding=1, filter=3)
        self.max_pool1 = rm.MaxPool2d(filter=2, stride=2)

        self.conv2_1 = rm.Conv2d(128, padding=1, filter=3)
        self.conv2_2 = rm.Conv2d(128, padding=1, filter=3)
        self.max_pool2 = rm.MaxPool2d(filter=2, stride=2)

        self.conv3_1 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_2 = rm.Conv2d(256, padding=1, filter=3)
        self.conv3_3 = rm.Conv2d(256, padding=1, filter=3)
        self.max_pool3 = rm.MaxPool2d(filter=2, stride=2)

        self.conv4_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv4_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool4 = rm.MaxPool2d(filter=2, stride=2)

        self.conv5_1 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_2 = rm.Conv2d(512, padding=1, filter=3)
        self.conv5_3 = rm.Conv2d(512, padding=1, filter=3)
        self.max_pool5 = rm.MaxPool2d(filter=2, stride=2)

        self.fc6 = rm.Conv2d(4096, filter=7, padding=3)
        self.fc7 = rm.Conv2d(4096, filter=1)

        self.drop_out = rm.Dropout(0.5)

        self.score_fr = rm.Conv2d(nb_class, filter=1)
        self.upscore2 = rm.Deconv2d(nb_class, filter=2, stride=2, padding=0)
        self.upscore8 = rm.Deconv2d(nb_class, filter=8, stride=8, padding=0)

        self.score_pool3 = rm.Conv2d(nb_class, filter=1)
        self.score_pool4 = rm.Conv2d(nb_class, filter=1)

        self.upscore_pool4 = rm.Deconv2d(nb_class, filter=2, stride=2, padding=0)

    def forward(self, x):
        t = x
        t = rm.relu(self.conv1_1(t))
        t = rm.relu(self.conv1_2(t))
        t = self.max_pool1(t) #112

        t = rm.relu(self.conv2_1(t))
        t = rm.relu(self.conv2_2(t))
        t = self.max_pool2(t) #56

        t = rm.relu(self.conv3_1(t))
        t = rm.relu(self.conv3_2(t))
        t = rm.relu(self.conv3_3(t))
        t = self.max_pool3(t) #28
        pool3 = t

        t = rm.relu(self.conv4_1(t))
        t = rm.relu(self.conv4_2(t))
        t = rm.relu(self.conv4_3(t))
        t = self.max_pool4(t)
        pool4 = t

        t = rm.relu(self.conv5_1(t))
        t = rm.relu(self.conv5_2(t))
        t = rm.relu(self.conv5_3(t))
        t = self.max_pool5(t)

        t = rm.relu(self.fc6(t))
        t = self.drop_out(t)
        fc6 = t

        t = rm.relu(self.fc7(t))
        fc7 = t

        t = self.score_fr(t)
        score_fr = t

        t = self.upscore2(t)
        upscore2 = t

        t = self.score_pool4(pool4)
        score_pool4 = t

        t = upscore2 + score_pool4
        fuse_pool4 = t

        t = self.score_pool3(pool3)
        score_pool3 = t


        t = self.upscore_pool4(fuse_pool4)
        upscore_pool4 = t
        t = upscore_pool4 + score_pool3

        t = self.upscore8(t)
        return t

Training

Here we train one of the FCN segmentation models defined above. Since FCN 8s is the most accurate variant, we train FCN 8s. As the optimizer we use stochastic gradient descent (SGD), which takes the learning rate and the momentum as arguments. The loss is computed with softmax_cross_entropy and must be divided by the total number of pixels (height * width).

In [8]:
epochs = 100
batch = 4

opt = rm.Sgd(lr=0.07, momentum=0.6)
fcn_8s = FCN_8s(nb_class)
N = len(train_data)
val_N = len(val_data)
for epoch in range(epochs):
    bar = tqdm(range(N//batch))
    loss = 0
    val_loss = 0
    perm = np.random.permutation(N)
    for j in range(N//batch):
        with fcn_8s.train():
            x = train_data[perm[j*batch:(j+1)*batch]]
            y = train_label[perm[j*batch:(j+1)*batch]]
            t = fcn_8s(x)
            n, c, _, _ = y.shape
            l= rm.softmax_cross_entropy(t, y) /(224*224)
        l.grad().update(opt)
        bar.set_description("epoch {:03d} train loss:{:6.4f} ".format(epoch, float(l.as_ndarray())))
        bar.update(1)
        loss += l.as_ndarray()
    perm = np.random.permutation(val_N)
    for k in range(val_N//batch):
        x = val_data[perm[k*batch:(k+1)*batch]]
        y = val_label[perm[k*batch:(k+1)*batch]]
        t = fcn_8s(x)
        val_l = rm.softmax_cross_entropy(t, y) /(224*224)
        val_loss += val_l.as_ndarray()

    bar.set_description("epoch {:03d} avg loss:{:6.4f}  val loss:{:6.4f}".format(epoch, float((loss/(j+1))), float((val_loss/(k+1)))))
    bar.update(0)
    bar.refresh()
    bar.close()

epoch 000 avg loss:1.7988  val loss:1.7596: 100%|██████████| 91/91 [00:35<00:00,  3.83it/s]
epoch 001 avg loss:1.4687  val loss:1.4669: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 002 avg loss:1.2169  val loss:1.2556: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 003 avg loss:1.0366  val loss:1.1187: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 004 avg loss:0.9566  val loss:1.1117: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 005 avg loss:0.9045  val loss:1.0804: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 006 avg loss:0.9111  val loss:1.0125: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 007 avg loss:0.8482  val loss:1.0826: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 008 avg loss:0.8075  val loss:0.9494: 100%|██████████| 91/91 [00:28<00:00,  3.78it/s]
epoch 009 avg loss:0.7946  val loss:1.0489: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 010 avg loss:0.7847  val loss:0.9769: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 011 avg loss:0.7394  val loss:0.9108: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 012 avg loss:0.7041  val loss:0.8540: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 013 avg loss:0.6912  val loss:0.8300: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 014 avg loss:0.6667  val loss:0.8063: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 015 avg loss:0.6361  val loss:0.7545: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 016 avg loss:0.6142  val loss:0.8146: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 017 avg loss:0.5951  val loss:0.7702: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 018 avg loss:0.5878  val loss:0.7128: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 019 avg loss:0.5624  val loss:0.7256: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 020 avg loss:0.5378  val loss:0.7454: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 021 avg loss:0.5155  val loss:0.7294: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 022 avg loss:0.5039  val loss:0.6613: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 023 avg loss:0.4856  val loss:0.6604: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 024 avg loss:0.4682  val loss:0.7065: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 025 avg loss:0.4532  val loss:0.6689: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 026 avg loss:0.4439  val loss:0.6410: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 027 avg loss:0.4220  val loss:0.7369: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 028 avg loss:0.4176  val loss:0.6738: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 029 avg loss:0.4122  val loss:0.7332: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 030 avg loss:0.3945  val loss:0.6556: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 031 avg loss:0.3823  val loss:0.6234: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 032 avg loss:0.3661  val loss:0.6219: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 033 avg loss:0.3498  val loss:0.5856: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 034 avg loss:0.3702  val loss:0.5731: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 035 avg loss:0.3351  val loss:0.6623: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 036 avg loss:0.3204  val loss:0.6047: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 037 avg loss:0.5538  val loss:0.7946: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 038 avg loss:0.4943  val loss:0.5994: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 039 avg loss:0.3593  val loss:0.6236: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 040 avg loss:0.3269  val loss:0.5875: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 041 avg loss:0.3014  val loss:0.5864: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 042 avg loss:0.2882  val loss:0.5849: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 043 avg loss:0.2728  val loss:0.5895: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 044 avg loss:0.2657  val loss:0.6378: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 045 avg loss:0.2562  val loss:0.6108: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 046 avg loss:0.2511  val loss:0.5802: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 047 avg loss:0.2414  val loss:0.6362: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 048 avg loss:0.2315  val loss:0.6301: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 049 avg loss:0.2311  val loss:0.6258: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 050 avg loss:0.2215  val loss:0.5937: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 051 avg loss:0.2220  val loss:0.5399: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 052 avg loss:0.2337  val loss:0.5946: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 053 avg loss:0.2083  val loss:0.5648: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 054 avg loss:0.2022  val loss:0.6478: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 055 avg loss:0.1971  val loss:0.5885: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 056 avg loss:0.1954  val loss:0.5818: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 057 avg loss:0.1948  val loss:0.5819: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 058 avg loss:0.1869  val loss:0.5888: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 059 avg loss:0.1841  val loss:0.5593: 100%|██████████| 91/91 [00:28<00:00,  3.78it/s]
epoch 060 avg loss:0.1811  val loss:0.5796: 100%|██████████| 91/91 [00:28<00:00,  3.81it/s]
epoch 061 avg loss:0.1792  val loss:0.5846: 100%|██████████| 91/91 [00:29<00:00,  3.81it/s]
epoch 062 avg loss:0.1743  val loss:0.5601: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 063 avg loss:0.1730  val loss:0.5667: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 064 avg loss:0.1722  val loss:0.6380: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 065 avg loss:0.1693  val loss:0.5749: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 066 avg loss:0.1672  val loss:0.6221: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 067 avg loss:0.1610  val loss:0.5924: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 068 avg loss:0.1603  val loss:0.5951: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 069 avg loss:0.1583  val loss:0.6380: 100%|██████████| 91/91 [00:29<00:00,  3.80it/s]
epoch 070 avg loss:0.1568  val loss:0.5593: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 071 avg loss:0.1556  val loss:0.6054: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 072 avg loss:0.1497  val loss:0.5532: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 073 avg loss:0.1506  val loss:0.5587: 100%|██████████| 91/91 [00:28<00:00,  3.79it/s]
epoch 074 avg loss:0.1484  val loss:0.5887: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 075 avg loss:0.1458  val loss:0.5633: 100%|██████████| 91/91 [00:30<00:00,  3.80it/s]
epoch 076 avg loss:0.1463  val loss:0.5873: 100%|██████████| 91/91 [00:28<00:00,  3.80it/s]
epoch 077 avg loss:0.1430  val loss:0.5890: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 078 avg loss:0.1407  val loss:0.5771: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 079 avg loss:0.1414  val loss:0.5884: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 080 avg loss:0.1376  val loss:0.5737: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 081 avg loss:0.1372  val loss:0.6153: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 082 avg loss:0.1346  val loss:0.5572: 100%|██████████| 91/91 [00:29<00:00,  3.79it/s]
epoch 083 avg loss:0.1359  val loss:0.5667: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 084 avg loss:0.1336  val loss:0.5819: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 085 avg loss:0.1324  val loss:0.6527: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 086 avg loss:0.1355  val loss:0.5951: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 087 avg loss:0.1296  val loss:0.5726: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 088 avg loss:0.1273  val loss:0.5819: 100%|██████████| 91/91 [00:26<00:00,  3.80it/s]
epoch 089 avg loss:0.1290  val loss:0.5495: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 090 avg loss:0.1261  val loss:0.5938: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 091 avg loss:0.1263  val loss:0.5843: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 092 avg loss:0.1222  val loss:0.5643: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 093 avg loss:0.1212  val loss:0.5970: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 094 avg loss:0.1211  val loss:0.5674: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 095 avg loss:0.1191  val loss:0.6220: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]
epoch 096 avg loss:0.1200  val loss:0.5647: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 097 avg loss:0.1201  val loss:0.5720: 100%|██████████| 91/91 [00:26<00:00,  3.78it/s]
epoch 098 avg loss:0.1206  val loss:0.5765: 100%|██████████| 91/91 [00:26<00:00,  3.80it/s]
epoch 099 avg loss:0.1165  val loss:0.5854: 100%|██████████| 91/91 [00:26<00:00,  3.79it/s]

Visualization

We display the segmented images to check the trained model. For this we need to define a color map that assigns a color to each class.

In [10]:
Sky = [128,128,128]
Building = [128,0,0]
Pole = [192,192,128]
Road_marking = [255,69,0]  # defined for reference; not included in the 12-class color map below
Road = [128,64,128]
Pavement = [60,40,222]
Tree = [128,128,0]
SignSymbol = [192,128,128]
Fence = [64,64,128]
Car = [64,0,128]
Pedestrian = [64,64,0]
Bicyclist = [0,128,192]
Unlabelled = [0,0,0]

label_colours = np.array([Sky, Building, Pole, Road, Pavement,
                          Tree, SignSymbol, Fence, Car, Pedestrian, Bicyclist, Unlabelled])

def visualize(temp, plot=True):
    print(temp.__class__)
    r = temp.copy()
    g = temp.copy()
    b = temp.copy()
    for l in range(12):  # color all 12 classes, including "unlabelled"
        r[temp==l]=label_colours[l,0]
        g[temp==l]=label_colours[l,1]
        b[temp==l]=label_colours[l,2]

    rgb = np.zeros((temp.shape[0], temp.shape[1], 3))
    rgb[:,:,0] = (r/255.0)#[:,:,0]
    rgb[:,:,1] = (g/255.0)#[:,:,1]
    rgb[:,:,2] = (b/255.0)#[:,:,2]
    if plot:
        plt.imshow(rgb)
    else:
        return rgb
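
Alternatively, the per-class loop in visualize can be collapsed into a single fancy-indexing lookup. A minimal sketch (visualize_fast is a hypothetical helper, not part of the original script):

def visualize_fast(temp):
    # (H, W) integer class map -> (H, W, 3) float RGB image
    return label_colours[temp] / 255.0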

Obtaining the Original Images

To compare the segmented images with the originals, we collect the original image data.

In [15]:
gt = []
with open(data_path + 'val' +'.txt') as f:
    txt = f.readlines()
    txt = [line.split(' ') for line in txt]
for i in range(len(txt)):
    gt.append(cv2.imread(data_path+txt[i][0][15:])[136:,256:])


Inference

In [16]:
pred = fcn_8s(val_data[0:1])
pred = pred.as_ndarray()

Original Image

Displaying the original image.

In [17]:
plt.imshow(gt[0]/255.0)
plt.show()
[Output image: the original CamVid frame]

Segmented Image

Displaying the segmented image.

In [18]:
segmented_img = visualize(np.argmax(pred[0],axis=0).reshape((224,224)), False)
plt.imshow(segmented_img)
plt.show()
<class 'numpy.ndarray'>
[Output image: the predicted segmentation]
<matplotlib.figure.Figure at 0x7f8a820d4e80>

References

[1] Jonathan Long, Evan Shelhamer, Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation.