Practical Image Classification with TensorFlow

This article demonstrates how to recognize images with TensorFlow at around 90% accuracy, using k-nearest neighbors (KNN) and convolutional neural networks.

Solving machine learning problems with numerical and string data is a fairly old discipline, and a lot of work has been done around it. For example, even Excel has powerful regression functionality that works very well with numbers. But when it comes to data such as images and videos, which are hard to represent, deep learning and neural networks really help.

Image Classification

Working with images is fairly new to the world of machine learning, whether in data engineering or data science. Machines can outmatch humans when it comes to numbers, but image recognition is something our brains make look easy. For us humans it isn't very hard to distinguish between objects, but it can be very challenging to solve this problem with an ML model.

In recent years the field of machine learning has matured, and a tremendous amount of work has gone into addressing such problems. One result is a model called the convolutional neural network, which in some domains can match and even surpass human performance.

TensorFlow

TensorFlow is a very powerful numerical computation framework open-sourced by Google in November 2015. It represents an application as a series of steps arranged in a directed acyclic graph (DAG), which makes it well suited to building ML applications such as neural networks.

Image recognition is a starter problem when it comes to TensorFlow. In other words, it is the "hello world" example of image recognition software.

Basics of working with Images

The following are the basics you need to understand when working with images:

  • Representing images as tensors
  • Encoding tensors to represent color as well as grayscale images
  • Image operations such as cropping, resizing and transposing
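
To make the three points above concrete, here is a minimal sketch using NumPy arrays as stand-ins for tensors. The array names and sizes are made up for illustration, and the "resize" shown is only a crude every-other-pixel downsample, not a proper interpolating resize.

```python
import numpy as np

# A grayscale image is a 2-D tensor: (height, width).
gray = np.zeros((28, 28), dtype=np.float32)

# A color image adds a channel axis: (height, width, 3) for RGB.
color = np.zeros((28, 28, 3), dtype=np.float32)

# Cropping is just slicing: keep the central 20x20 region.
cropped = gray[4:24, 4:24]

# Transposing swaps the height and width axes.
wide = np.zeros((10, 30), dtype=np.float32)
transposed = wide.T

# Crude 2x downsample (illustrative only): keep every other pixel.
half = gray[::2, ::2]

print(gray.shape, color.shape, cropped.shape, transposed.shape, half.shape)
```

The only thing that changes between the grayscale and the color case is the extra channel axis; cropping and transposing work the same way on both.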

The basic idea of image recognition is to detect what an image represents. Each image is a collection of small rectangular units called pixels, which together carry a wealth of information; they are in effect a form of encoding. In an image recognition problem, the pixels are the features in our feature vector.


MNIST dataset

We will use MNIST as our image classification dataset. MNIST contains a large number of images, each representing a handwritten digit. All of the images are pre-formatted and processed, which makes the dataset easy to use as neural network training data without worrying about heavy preprocessing. It contains a training set of 60,000 digits and a test set of 10,000 digits. We will solve the classification task of recognizing the actual digit from its handwritten representation.


Each image is in grayscale, meaning it has only one channel, with a standard size of 28×28 pixels, giving 784 pixels per image. Each image is divided into a grid, and every cell holds a single value from 0 (black) to 1 (white). Each image has a label associated with it that tells us which digit it actually represents.
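
As a rough illustration of those numbers, the sketch below builds a made-up 28×28 image with 8-bit intensities and scales it into the 0 to 1 range. The `raw` array is hypothetical; the MNIST loader used later in this article already returns scaled values, so this step is only here to show where the range comes from.

```python
import numpy as np

# Hypothetical raw image: 28x28 pixels with 8-bit intensities (0..255).
raw = np.zeros((28, 28), dtype=np.uint8)
raw[10:18, 13:15] = 255  # a short vertical stroke

# Scale every pixel into the range 0..1.
scaled = raw.astype(np.float32) / 255.0

print(scaled.size)                 # 28 * 28 pixels in total
print(scaled.min(), scaled.max())  # smallest and largest pixel value
```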


K-nearest-neighbors Algorithm

Let's first talk about the types of ML algorithms in general.

Supervised: There are labels associated with the training data, which are used to correct the algorithm during the training step. The objective is to find or approximate a function f that maps an input variable x to an output variable y, i.e. y = f(x).

Unsupervised: There are no labels to correct the model, so the model has to be set up right. It is responsible for understanding the structure and patterns within the data, and the algorithm has to discover them on its own.

KNN is an example of a supervised ML algorithm.

  • It uses the whole training dataset as its model.
  • Every element in the training data has a label.
  • The training data is used to make a prediction when we introduce a new test element.
  • Predicting a new sample involves figuring out which elements in the training dataset it is most similar to.
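
The points above can be sketched in a few lines of NumPy. This is an illustrative toy, not the implementation used later in the article: the function name `knn_predict`, the two-dimensional sample data and the choice of k=3 are all assumptions, and "similar" is measured here by the sum of absolute differences.

```python
from collections import Counter
import numpy as np

def knn_predict(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k closest training points."""
    # Distance from the sample to every training point (sum of absolute differences).
    distances = np.abs(train_x - sample).sum(axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# The "model" is simply the labeled training data itself.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, np.array([0.15, 0.1])))  # a
print(knn_predict(train_x, train_y, np.array([0.95, 1.0])))  # b
```

With k=1 this reduces to picking the single nearest neighbor, which is what the TensorFlow code further down does.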

Calculating neighbors of a sample

The question is how we determine that one piece of data is close to another.

This is done using distance measures. A distance measure indicates how far one data point is from another. There are a number of ways to compute it, such as Euclidean distance, Hamming distance and L1 distance (also known as Manhattan distance). These measures don't apply only to coordinate geometry; they can be used for images too, since images are nothing but matrices of numbers representing pixel values.
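
For reference, here is a small plain-Python sketch of the distance measures named above. The function names and the two sample vectors are made up for illustration.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # L1 distance: sum of absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Number of positions at which the two vectors differ.
    return sum(x != y for x, y in zip(a, b))

p, q = [0, 3, 1], [4, 0, 1]
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
print(hamming(p, q))    # 2
```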

A 2D handwritten image can be represented as a 1-dimensional vector by laying its rows end to end.
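
A quick NumPy sketch of that flattening, first on a tiny 3×3 array so the ordering is visible, then on an all-zero array of MNIST size. Both arrays are placeholders, not real digits.

```python
import numpy as np

# A tiny 3x3 "image" whose values show the row-by-row ordering.
image = np.arange(9).reshape(3, 3)
flat = image.reshape(-1)
print(flat)  # [0 1 2 3 4 5 6 7 8]

# The same operation on a 28x28 image gives a vector of 784 values.
digit = np.zeros((28, 28))
print(digit.reshape(-1).shape)  # (784,)
```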

Technical Implementation using L1 distance

We will use KNN with Python and Datalab for this example.

To start working with MNIST let us include some necessary imports:

import numpy as np
import tensorflow as tf
# import ... as ml  (module name missing in the source; ml is not used below)

Import MNIST data

from tensorflow.examples.tutorials.mnist import input_data

Store the MNIST data in a folder

mnist = input_data.read_data_sets("mnist_data/", one_hot=True)

The code uses the built-in capabilities of TensorFlow to download the dataset locally and load it into a Python variable. As a result (unless specified otherwise), the data will be downloaded into the mnist_data/ folder. one_hot=True means the label associated with each image is represented in one-hot notation.
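
If one-hot notation is new to you, the sketch below shows the idea with plain NumPy. The helper `one_hot` is a made-up illustration, not a TensorFlow function: a digit label becomes a vector of ten zeros with a single 1 at the index of the digit.

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1.0 at position `label`."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(one_hot(3))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```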

Our task is to build a classifier with TensorFlow. First we need to set up the model, then fit it on the training set, and finally evaluate the result on the test set.

The following image shows the classification process in our image processing steps:

Retrieve the training and test digits, each as an (images, labels) tuple. Here we take 5,000 training digits and 200 test digits.

training_digits, training_labels = mnist.train.next_batch(5000)
test_digits, test_labels = mnist.test.next_batch(200)

Define placeholders for the images, which are flattened to 1D arrays as mentioned above.

training_digits_pl = tf.placeholder("float", [None, 784])

test_digit_pl = tf.placeholder("float", [784])

Nearest Neighbor calculation using L1 distance

l1_distance = tf.abs(tf.add(training_digits_pl, tf.negative(test_digit_pl)))

distance = tf.reduce_sum(l1_distance, axis=1)

Prediction: Get min distance index (Nearest neighbor)

pred = tf.argmin(distance, 0)

accuracy = 0.
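
To see what these graph operations compute, here is a NumPy analogue on a made-up toy problem with three training vectors of length 3 instead of 5,000 vectors of length 784. It is only a sketch of the same arithmetic, not part of the TensorFlow program.

```python
import numpy as np

# Toy stand-ins for training_digits (3 samples) and one test digit.
training = np.array([[0.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 1.0]])
test = np.array([0.0, 0.9, 1.0])

# Broadcasting subtracts the test vector from every training row.
l1 = np.abs(training - test)   # shape (3, 3)

# Summing along axis 1 gives one L1 distance per training sample.
distance = l1.sum(axis=1)      # shape (3,)

# The index of the smallest distance is the nearest neighbor.
nearest = np.argmin(distance)
print(distance, nearest)
```

The third training row differs from the test vector in only one position, and only slightly, so it has the smallest distance and is picked as the nearest neighbor.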

Initialize the session, iterate over all test digits, and run the prediction for each one.

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    # loop over test data
    for i in range(len(test_digits)):
        # Get nearest neighbor: index of the closest training digit
        nn_index = sess.run(pred,
            feed_dict={training_digits_pl: training_digits, test_digit_pl: test_digits[i, :]})

        # Get nearest neighbor class label and compare it to its true label
        print("Test", i, "Prediction:", np.argmax(training_labels[nn_index]),
            "True Label:", np.argmax(test_labels[i]))

        # Calculate accuracy: each correct prediction adds 1/N
        if np.argmax(training_labels[nn_index]) == np.argmax(test_labels[i]):
            accuracy += 1./len(test_digits)

print("Accuracy:", accuracy)

When np.argmax() is applied to the one-hot notation of a digit, it returns the index of the maximum value, which is the digit itself. For example, a one-hot vector with its single 1 at index 7 yields 7.
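
A quick check of that behaviour, using hand-written one-hot vectors rather than real MNIST labels:

```python
import numpy as np

# One-hot notation for the digit 7: a single 1 at index 7.
seven = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])
print(np.argmax(seven))  # 7

# With axis=1, a whole batch of one-hot labels is decoded at once.
batch = np.eye(10)[[2, 0, 9]]     # one-hot rows for the digits 2, 0 and 9
print(np.argmax(batch, axis=1))   # [2 0 9]
```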

Following is the output of my code:



Woohoo, 92% is pretty good for a first attempt. Congratulations! You have just built your first image recognition algorithm with TensorFlow in less than an hour. Feel free to tweak the variables in your code and share the results. Stay tuned! We at Carbonteq can help you extract value out of data.
