Practical Image Classification with TensorFlow
Solving machine learning problems with numerical and string data is a fairly old practice, and a lot of work has been done around it; for example, even Excel has powerful regression functionality that works very well when dealing with numbers. But when it comes to data that is hard to represent, like images and videos, deep learning and neural networks really come to the rescue.
Image Classification
Working with images is fairly new to the world of machine learning, whether on the data engineering or the data science side. Machines can outmatch humans when it comes to numbers, but image recognition is something our brains make look easy. For us humans it isn't very hard to distinguish between objects, yet it can be very challenging to solve this problem using an ML model.
In recent years the field of machine learning has matured, and a tremendous amount of work has gone into addressing such problems. One result is the convolutional neural network, a model that can match, and in some domains even surpass, human performance.
TensorFlow
TensorFlow is a very powerful numerical computation framework open-sourced by Google in November 2015. It represents an application as a series of steps in a directed acyclic graph (DAG), which makes it very suitable for building ML applications like neural networks.
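As a quick illustration of the graph model, here is a minimal sketch, assuming the TensorFlow 1.x API that the rest of this post uses: nodes are operations, edges are tensors, and nothing is computed until the graph is run inside a session.

import tensorflow as tf

# build the graph: two constant nodes feeding a multiply node
a = tf.constant(3.0)
b = tf.constant(4.0)
c = a * b  # shorthand for tf.multiply(a, b)

# nothing has run yet; executing the DAG requires a session
with tf.Session() as sess:
    print(sess.run(c))  # 12.0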
Image recognition is a typical starter problem with TensorFlow; in other words, it is the "hello world" example when building image recognition software.
Basics of working with Images
Following are the basics you need to understand when working with images:
- Representing images as Tensors
- Encoding tensors to represent colored as well as grey-scale images
- Image operations like cropping, resizing & transposing.
The basic idea of image recognition is to detect what an image represents. Each image is a collection of small rectangular units called pixels, which carry a wealth of information; they are, in effect, a form of encoding. In an image recognition problem, the pixels are the features in our feature vector.
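TensorFlow ships with helpers for the image operations listed above. The following is a minimal sketch, again assuming the TensorFlow 1.x API; the file name image.jpg is a hypothetical stand-in for your own image.

import tensorflow as tf

# decode a JPEG file into a [height, width, 3] uint8 tensor
raw = tf.read_file("image.jpg")  # hypothetical input file
image = tf.image.decode_jpeg(raw, channels=3)

gray = tf.image.rgb_to_grayscale(image)           # 3 channels -> 1 channel
resized = tf.image.resize_images(gray, [28, 28])  # resize to 28x28
cropped = tf.image.central_crop(image, 0.5)       # keep the central 50%
transposed = tf.image.transpose_image(image)      # swap rows and columns

with tf.Session() as sess:
    print(sess.run(tf.shape(resized)))  # [28 28 1]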
MNIST dataset
We will be working with MNIST as our image classification dataset. MNIST contains a large number of images, each representing a handwritten digit. All of these images are pre-formatted and pre-processed, making them easy to use as a training dataset in our applications without worrying about hefty preprocessing. It contains a training set of 60,000 digits and a test set of 10,000 digits. We will solve the classification task of recognizing the actual digit from its handwritten representation.
Each image is grayscale, meaning it has only one channel, with a standard size of 28×28 pixels, giving 784 pixels per image in total. Each image is divided into a grid, and every cell holds a single value from 0 (black) to 1 (white). Each image has a label associated with it that tells us which digit it actually represents.
K-nearest-neighbors Algorithm
Before diving in, let's talk about the general types of ML algorithms:
Supervised: Labels are associated with the training data and are used to correct the algorithm during the training step. The objective is to find or approximate a mapping function from an input variable x to an output variable y, i.e. y = f(x).
Unsupervised: There are no labels; the model is responsible for discovering structure and patterns within the data on its own, without a label-driven training step.
KNN (k-nearest-neighbors) is an example of a supervised ML algorithm.
- It uses the whole training dataset as its model.
- Every element in the training data has a label.
- The training data is used to make a prediction when we introduce a new test element.
- Predicting the label of a new sample involves figuring out which elements of the training dataset it is most similar to (see the sketch below).
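To make this concrete, here is a minimal 1-nearest-neighbor sketch in plain numpy. It is illustrative only (the function name predict_1nn is our own); we rebuild the same logic with TensorFlow later in this post.

import numpy as np

def predict_1nn(train_X, train_y, x):
    # one L1 distance per training row
    distances = np.abs(train_X - x).sum(axis=1)
    # return the label of the closest training sample
    return train_y[np.argmin(distances)]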
Calculating neighbors of a sample
The question is: how do we determine that one bit of data is close to another bit of data?
This is done using distance measures. A distance measure indicates how far one data point is from another. There are a number of ways of computing a distance measure, such as Euclidean distance, Hamming distance, and L1 (also known as Manhattan) distance. These measures don't only apply to coordinate geometry; they can be used for images as well, since images are nothing but matrices of numbers representing pixel values.
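To make the distance measures concrete, here is a quick numpy check comparing Euclidean and L1 distance on two small vectors:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # sqrt(9 + 16 + 0) = 5.0
l1 = np.sum(np.abs(a - b))                 # |-3| + |-4| + 0 = 7.0
print(euclidean, l1)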
A 2-D handwritten image can be flattened into a 1-dimensional vector, like this:
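A small numpy sketch of the flattening step (the random array is just a stand-in for a real digit):

import numpy as np

# a 28x28 grayscale image; random values stand in for a real digit
image_2d = np.random.rand(28, 28)

# flatten row by row into a 784-element feature vector
image_1d = image_2d.reshape(784)
print(image_1d.shape)  # (784,)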
Technical Implementation using L1 distance
We will be using KNN, Python, and Google Cloud Datalab for this example.
To start working with MNIST, let us include some necessary imports:
import numpy as np
import tensorflow as tf
import google.datalab.ml as ml
Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
Store the MNIST data in a folder
mnist = input_data.read_data_sets("mnist_data/", one_hot=True)
The code uses built-in capabilities of TensorFlow to download the dataset locally and load it into the Python variable. As a result (unless specified otherwise), the data will be downloaded into the mnist_data/ folder. one_hot=True means the labels associated with each image are represented in one-hot notation: a vector of ten values in which only the entry at the digit's index is 1 (e.g. the digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).
Our task is to build a digit classifier with TensorFlow. First we set up the computation, then use the training set as the KNN model, and finally evaluate the result on the test set.
The classification process consists of the following steps:
Retrieve the training and test digits as tuples of images and one-hot labels; we take 5,000 training digits and 200 test digits.
training_digits, training_labels = mnist.train.next_batch(5000)
test_digits, test_labels = mnist.test.next_batch(200)
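To sanity-check the data, you can reshape a flattened digit back to 28×28 and plot it. A quick sketch, assuming matplotlib is available in your environment (it is not used anywhere else in this post):

import matplotlib.pyplot as plt

# reshape the first training digit from a 784-vector back to 28x28 and display it
plt.imshow(training_digits[0].reshape(28, 28), cmap="gray")
plt.title("label: %d" % np.argmax(training_labels[0]))
plt.show()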
Define placeholders, representing the images as 1-D arrays of 784 values as mentioned above. The None dimension lets the training placeholder accept any number of images.
training_digits_pl = tf.placeholder("float", [None, 784])
test_digit_pl = tf.placeholder("float", [784])
Nearest Neighbor calculation using L1 distance
# elementwise |train - test|, broadcast across all training digits
l1_distance = tf.abs(tf.add(training_digits_pl, tf.negative(test_digit_pl)))
# sum over the 784 pixels to get one distance per training digit
distance = tf.reduce_sum(l1_distance, axis=1)
Prediction: Get min distance index (Nearest neighbor)
pred = tf.argmin(distance, 0)
accuracy = 0.
Initialize the variables, start a session, and run the prediction for every test digit:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # loop over the test data
    for i in range(len(test_digits)):
        # get the index of the nearest training digit
        nn_index = sess.run(pred,
                            feed_dict={training_digits_pl: training_digits,
                                       test_digit_pl: test_digits[i, :]})
        # compare the nearest neighbor's class label to the true label
        print("Test", i, "Prediction:", np.argmax(training_labels[nn_index]),
              "True Label:", np.argmax(test_labels[i]))
        # accumulate accuracy
        if np.argmax(training_labels[nn_index]) == np.argmax(test_labels[i]):
            accuracy += 1. / len(test_digits)

    print("Done!")
    print("Accuracy:", accuracy)
When np.argmax() is computed on the one-hot representation of a digit, it returns the index of the maximum value, which is the digit itself:
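import numpy as np

# one-hot representation of the digit 3
label = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
print(np.argmax(label))  # 3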
Running the code prints a prediction and true label for each test digit, followed by the overall accuracy.
Conclusion
Woohoo! 92% accuracy is pretty good for a first attempt. Congratulations!! You have just built your first image recognition algorithm using TensorFlow in less than an hour. Feel free to tweak the variables in your code and share the results. Stay tuned! We at Carbonteq can help you extract value out of your data.
Please check out our Big Data and ERP service page if you are unsure how we can help you.