Image Classification Flow and Reference

This chapter introduces the image classification flow and points to reference material that will help you understand each step of image classification.

1. Image Representation

In a program, an image can be represented as an array. Although an image appears to contain many different colors, those colors are all built from only three: Red, Green, and Blue (RGB). By combining these three basic colors, we can represent every color. Representing an image as an array therefore requires its width, its height, and its channels (RGB), which gives an array of shape (height, width, channel). For example, assume that we have a 500×500 image. The height and the width are both 500, and the number of channels is 3 (RGB). Therefore, the shape of the array is (500, 500, 3).
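
As a minimal sketch of this representation, the snippet below builds an all-black 500×500 RGB image with NumPy and paints one pixel red. The variable names and the choice of the uint8 data type (values 0 to 255 per channel) are assumptions made for illustration.

```python
import numpy as np

# A blank (all-black) 500x500 RGB image, stored as (height, width, channel).
height, width, channels = 500, 500, 3
image = np.zeros((height, width, channels), dtype=np.uint8)

# Paint the top-left pixel pure red: R=255, G=0, B=0.
image[0, 0] = [255, 0, 0]

print(image.shape)  # (500, 500, 3)
```

Each pixel is addressed as image[row, column], and the last axis holds the three color values of that pixel.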

Next, we will show you how to check the shape of an image. We use the imageio module to load the image. As you can see below, the given image is 500 pixels high and 353 pixels wide. The shape of the array, thus, becomes (500, 353, 3).


  • numpy 1.12.1
  • imageio 2.2.0
  • matplotlib 2.0.2
In [10]:
%matplotlib inline
import os
import numpy as np
from imageio import imread
import matplotlib.pyplot as plt

img = imread('./000001.jpg')
print("The Image Shape is ", img.shape)
The Image Shape is  (500, 353, 3)
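
Once the shape is known, the height, width, and channel count can be unpacked from it, and each color channel can be sliced out as its own two-dimensional array. The sketch below uses a zero-filled array as a stand-in for the loaded image (an assumption, so that it runs without the image file).

```python
import numpy as np

# Stand-in for the array returned by imread; same shape as the example image.
img = np.zeros((500, 353, 3), dtype=np.uint8)

# The shape unpacks in (height, width, channel) order.
height, width, channels = img.shape
print(height, width, channels)  # 500 353 3

# Each channel is a 2-D array of shape (height, width).
red = img[:, :, 0]
green = img[:, :, 1]
blue = img[:, :, 2]
print(red.shape)  # (500, 353)
```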

2. Representation of the Training data

In a classification problem, each image belongs to exactly one class. We could label each image with a class number such as 1, 2, …, N, but to train the neural network model we instead use a vector called a One Hot Vector, whose size is 1 × #Class, as the target data. Every element of the vector is 0 except one, which is set to 1 to identify the class. For example, assume that we have 3 classes: Dog, Cat, and Bird. Instead of assigning 0 (Dog), 1 (Cat), and 2 (Bird) to the images, we represent each class with a 1 × 3 One Hot Vector. If the class of the given image is Cat, we assign 1 to the element at index 1. The vector, therefore, becomes [0, 1, 0]. Converting the class numbers to one hot vectors allows the neural network model to be trained efficiently and more precisely.
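
The conversion described above can be sketched in NumPy as follows. The helper name to_one_hot and the sample labels are hypothetical and chosen only for illustration; this is not a ReNom function.

```python
import numpy as np

class_names = ["Dog", "Cat", "Bird"]

def to_one_hot(class_index, num_classes):
    """Return a vector of zeros with a single 1 at class_index."""
    vector = np.zeros(num_classes, dtype=np.float32)
    vector[class_index] = 1.0
    return vector

# Cat has class number 1, so the 1 is placed at index 1.
cat_vector = to_one_hot(class_names.index("Cat"), len(class_names))
print(cat_vector)  # [0. 1. 0.]

# A whole batch of class numbers can be converted at once by picking
# rows of the identity matrix.
labels = np.array([0, 1, 2, 1])
one_hot_batch = np.eye(len(class_names), dtype=np.float32)[labels]
print(one_hot_batch.shape)  # (4, 3)
```

Taking the index of the largest element of each row recovers the original class numbers.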

3. Augmentation

A neural network model needs a large number of images to train well. However, we usually do not have a large enough data set to train a model effectively or precisely. Therefore, especially for complex tasks, a data augmentation process is required. The augmentation methods are implemented in ReNom and explained in other chapters. We recommend reading the following chapters to understand the data augmentation process.




  • Color Jitter
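
To give a rough idea of what augmentation does, the sketch below creates two extra variants of one image using plain NumPy: a horizontal flip and a simple per-channel brightness change. The function names, the scaling range, and the random stand-in image are assumptions for illustration only; this is not ReNom's augmentation API, and the color change here is a simplified stand-in for the Color Jitter method described in the chapter above.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror a (height, width, channel) image left to right."""
    return image[:, ::-1, :]

def simple_color_jitter(image, rng, max_scale=0.2):
    """Scale each channel by a random factor in [1 - max_scale, 1 + max_scale]."""
    factors = rng.uniform(1.0 - max_scale, 1.0 + max_scale, size=image.shape[-1])
    jittered = image.astype(np.float32) * factors
    # Clip back to the valid pixel range before converting to uint8.
    return np.clip(jittered, 0, 255).astype(np.uint8)

rng = np.random.RandomState(0)

# Random pixels standing in for a real 500x353 RGB image.
image = rng.randint(0, 256, size=(500, 353, 3)).astype(np.uint8)

# One original image yields several training samples of the same shape.
augmented = [image, horizontal_flip(image), simple_color_jitter(image, rng)]
for variant in augmented:
    print(variant.shape, variant.dtype)
```

Each variant keeps the same class as the original image, so the same One Hot Vector can be reused as its target.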


4. Training

For computer vision tasks such as image classification, Convolutional Layers and Pooling Layers are commonly used in a model. Because the theory behind them is complex, we do not explain it in this chapter; it is covered in other chapters. Other important topics, such as the Optimizer and Model Save, are explained in other chapters as well. We recommend referring to the following links.

  • Convolutional Neural Network