DAY 45-100 DAYS MLCODE: Convolutional Neural Networks (CNN)

December 25, 2018 · 100-Days-Of-ML-Code

In the previous blogs we discussed DNNs and how to use a pre-trained model for training. In this blog, we'll discuss Convolutional Neural Networks (CNNs). As per Wikipedia:

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery.

Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.

The idea of the CNN was derived from the biological process of how the visual cortex is structured and how it works. As per the studies of David Hubel and Torsten Wiesel, many neurons in the visual cortex have a local receptive field, meaning they respond to stimuli only in a restricted region of the visual field, and the receptive fields of all neurons combine to cover the whole visual field. They also found that some neurons react only to horizontal lines in an image.

The above studies inspired the neocognitron, introduced by Kunihiko Fukushima in 1980, which later evolved into Convolutional Neural Networks (CNNs).

Convolutional Layer

The convolutional layer is the most important building block of a CNN architecture. Unlike the DNN we developed in the previous blog, neurons in the first convolutional layer are not connected to every single pixel of the input image; they are connected only to the pixels in their receptive fields. The same applies to the next layer, as shown in the image below:

CNN with receptive fields


Filters (Receptive field)

A neuron's weights can be represented as a small image the same size as its receptive field. In a fully connected layer, each neuron receives input from every element of the previous layer. In a convolutional layer, however, each neuron receives input from only a restricted subarea of the previous layer, typically a square (e.g., 5 by 5). The input area of a neuron is called its receptive field. These filters can detect, for example, vertical or horizontal lines.

Filter for CNN
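
To make this concrete, here is a minimal sketch (not part of the original post; the random input and the hand-crafted filter are illustrative assumptions) that applies a 3 x 3 vertical-edge filter to an image with tf.nn.conv2d:

import numpy as np
import tensorflow as tf

# A random 28 x 28 grayscale "image": [batch, height, width, channels]
image = np.random.rand(1, 28, 28, 1).astype(np.float32)

# Hand-crafted vertical-edge filter: responds where intensity changes
# from left to right. Shape is [filter_h, filter_w, in_channels, out_channels].
vertical_filter = np.array([[-1., 0., 1.],
                            [-1., 0., 1.],
                            [-1., 0., 1.]], dtype=np.float32).reshape(3, 3, 1, 1)

img_ph = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
feature_map = tf.nn.conv2d(img_ph, vertical_filter, strides=[1, 1, 1, 1], padding="SAME")

with tf.Session() as sess:
    result = sess.run(feature_map, feed_dict={img_ph: image})
    print(result.shape)  # (1, 28, 28, 1) -- SAME padding keeps the spatial size

In practice a convolutional layer learns many such filters automatically during training; we never have to craft them by hand.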

Pooling Layer

As we have seen, the memory and computational power required by a large number of parameters are critical constraints, and pooling helps reduce both. Pooling progressively reduces the spatial size of the representation, which cuts the number of parameters and the amount of computation in the network, and hence also helps control overfitting. Pooling combines the outputs of neuron clusters at one layer into a single neuron in the next layer. A pooling layer has no weights; it simply aggregates its inputs, typically with a mean or a max. We generally have to define the size, the stride, and the padding type.
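
As a quick illustration, here is a minimal sketch (the tiny 4 x 4 input is an illustrative assumption) showing how a 2 x 2 max pool with stride 2 collapses each 2 x 2 block to its maximum:

import numpy as np
import tensorflow as tf

# One 4 x 4 single-channel image: [batch, height, width, channels]
x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 10., 13., 14.],
              [11., 12., 15., 16.]], dtype=np.float32).reshape(1, 4, 4, 1)

inp = tf.placeholder(tf.float32, shape=[1, 4, 4, 1])
# 2 x 2 max pool with stride 2: each 2 x 2 block collapses to its maximum
pool = tf.nn.max_pool(inp, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")

with tf.Session() as sess:
    print(sess.run(pool, feed_dict={inp: x}).reshape(2, 2))
    # [[ 4.  8.]
    #  [12. 16.]]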

CNN Architecture

A typical CNN architecture looks like the one below. It consists of a few convolutional layers, each followed by a ReLU activation, then a pooling layer, then a few more convolutional layers, another pooling layer, and so on. The images get smaller as they pass through the network. At the top of the network we have a fully connected layer and the final output of the model.

CNN Architecture


Let's create a simple example of a CNN using TensorFlow.

Load the MNIST data

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/")

Output:
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
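
Note: the tensorflow.examples.tutorials module is deprecated in later TensorFlow 1.x releases and was removed in TensorFlow 2. If it is missing from your install, a minimal alternative sketch is to load the same digits through tf.keras.datasets (you would then need to write your own mini-batching in place of mnist.train.next_batch):

import tensorflow as tf

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
# Flatten each 28 x 28 image to 784 values and scale pixels to [0, 1]
X_train = X_train.reshape(-1, 28 * 28).astype("float32") / 255.0
X_test = X_test.reshape(-1, 28 * 28).astype("float32") / 255.0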

Let’s declare all global variables

height = 28   # MNIST images are 28 x 28
width = 28    # MNIST images are 28 x 28
channels = 1  # black-and-white images, so 1 channel
n_inputs = height * width  # input size equals the total number of pixels

# First convolutional layer
conv1_fmaps = 32    # number of filters (feature maps)
conv1_ksize = 3     # kernel (filter) size
conv1_stride = 1    # stride
conv1_pad = "SAME"  # padding

# Second convolutional layer
conv2_fmaps = 64    # number of filters (feature maps)
conv2_ksize = 3     # kernel (filter) size
conv2_stride = 2    # stride
conv2_pad = "SAME"  # padding

pool3_fmaps = conv2_fmaps  # pooling keeps the same number of feature maps

n_fc1 = 64      # number of units in the fully connected layer
n_outputs = 10  # one output per digit class

Construct the model – the input layer and the convolutional layers

with tf.name_scope("inputs"):
    # Input for the model
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
    # Reshape it to a [height, width, channels] image tensor
    X_reshaped = tf.reshape(X, shape=[-1, height, width, channels])
    # Labels for the data
    y = tf.placeholder(tf.int32, shape=[None], name="y")

# 1st convolutional layer, with X_reshaped as input
conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu, name="conv1")
# 2nd convolutional layer, with conv1 as input
conv2 = tf.layers.conv2d(conv1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=tf.nn.relu, name="conv2")

Add the pooling layer and pass its output to the fully connected layer. Note the arithmetic behind the reshape: the 28 x 28 input keeps its size through conv1 (stride 1, SAME padding), conv2's stride of 2 reduces it to 14 x 14, and the 2 x 2 max pool with stride 2 reduces it to 7 x 7, hence the pool3_fmaps * 7 * 7 flattened features:

with tf.name_scope("pool3"):
    # Apply max pooling to the output of conv2
    pool3 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    # Flatten the pooling output so it can be fed to the fully connected layer
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 7 * 7])

with tf.name_scope("fc1"):
    # Fully connected layer before the output layer
    fc1 = tf.layers.dense(pool3_flat, n_fc1, activation=tf.nn.relu, name="fc1")

Now construct the output layer with softmax:

with tf.name_scope("output"):
    # Output layer (logits)
    logits = tf.layers.dense(fc1, n_outputs, name="output")
    # Class probabilities
    Y_proba = tf.nn.softmax(logits, name="Y_proba")

Apply sparse_softmax_cross_entropy_with_logits to compute the loss, and use the Adam optimizer to minimize it:

with tf.name_scope("train"):
    # Cross entropy for the loss, Adam optimizer to minimize it
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss)

Evaluate the model

with tf.name_scope("eval"):
    # in_top_k returns True for each example whose true label is the top prediction;
    # accuracy is the mean of these booleans cast to floats
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

Create a saver to save the model, and initialize the variables before training starts.

with tf.name_scope("init_and_save"):
    # Initialize the global variables and create the saver before training starts
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

Now train the model and print the train and test accuracy

n_epochs = 10
batch_size = 100

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

    save_path = saver.save(sess, "./my_mnist_model")

Output:
0 Train accuracy: 0.96 Test accuracy: 0.9768
1 Train accuracy: 0.97 Test accuracy: 0.9837
2 Train accuracy: 1.0 Test accuracy: 0.9868
3 Train accuracy: 1.0 Test accuracy: 0.9868
4 Train accuracy: 0.97 Test accuracy: 0.9879
5 Train accuracy: 1.0 Test accuracy: 0.9891
6 Train accuracy: 0.99 Test accuracy: 0.9889
7 Train accuracy: 1.0 Test accuracy: 0.9881
8 Train accuracy: 0.99 Test accuracy: 0.9895
9 Train accuracy: 1.0 Test accuracy: 0.9884

Our model has achieved a test accuracy of 98.84%, which is very good.
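
Since the saver wrote the trained parameters to ./my_mnist_model, we can restore them later and classify new images. Here is a minimal sketch, assuming the graph defined above is still in memory (the choice of the first five test images is arbitrary):

with tf.Session() as sess:
    saver.restore(sess, "./my_mnist_model")
    X_new = mnist.test.images[:5]        # a few unseen test images
    probabilities = Y_proba.eval(feed_dict={X: X_new})
    print(probabilities.argmax(axis=1))  # predicted digit for each image
    print(mnist.test.labels[:5])         # actual labels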

In conclusion, CNNs are based on how the visual cortex works, and with recent developments they achieve results that can surpass human performance on some image recognition tasks. You can find today's code here.