MNIST handwritten digit classification using TensorFlow

Milind Soorya / September 16, 2021

Introduction

What is Handwritten Digit Recognition?

Handwritten digit recognition is the ability of a computer to recognize digits written by hand. It is a hard task for a machine because handwritten digits are imperfect and vary from person to person. A recognizer takes an image of a digit as input and outputs the digit that the image contains.

The MNIST dataset

This is probably one of the most popular datasets among machine learning and deep learning enthusiasts. The MNIST dataset contains 60,000 training images of handwritten digits from zero to nine and 10,000 images for testing, so it has 10 different classes. Each handwritten digit image is represented as a 28×28 matrix in which every cell holds a grayscale pixel value.

In this article, we will look at the MNIST dataset and create a simple neural network using TensorFlow and Keras. Later we will also add a hidden layer to make the model more accurate.

TL;DR: MNIST handwritten digit classification on GitHub

Here for the code? You can find the Python notebook on my GitHub.

Import the modules

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

Load the MNIST dataset from Keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
len(x_train)
# 60000
len(x_test)
# 10000
# Finding the shape of individual sample
x_train[0].shape
# (28, 28)

Hence, each sample is a 28x28 pixel image.

x_train[0]

Each pixel value ranges from 0 to 255: 0 means the pixel has no intensity and 255 means it has the highest intensity.

See the images

plt.matshow(x_train[0])
y_train[0]
# 5
# Show first 5 data
y_train[:5]
# array([5, 0, 4, 1, 9], dtype=uint8)

Flatten the training data

We need to convert the two-dimensional input data into a one-dimensional format before feeding it into the model. This is achieved by a process called flattening, in which each 28x28 image grid is converted into a one-dimensional array of 784 values (28 x 28 = 784).
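As a toy illustration of flattening, together with the 0-1 scaling that usually accompanies it, here is a minimal sketch on a made-up batch of two 3x3 "images" rather than the real 28x28 MNIST data. The array `images` and its values are assumptions chosen for illustration only.

```python
import numpy as np

# Made-up stand-in for MNIST: 2 "images" of 3x3 pixels, uint8 intensities 0-255.
images = np.array(
    [[[0, 51, 102], [153, 204, 255], [0, 0, 255]],
     [[255, 0, 0], [0, 255, 0], [0, 0, 255]]],
    dtype=np.uint8,
)

# Scale intensities from the 0-255 range down to 0-1.
scaled = images / 255

# Flatten each 3x3 grid into a vector of 9 values, read row by row.
flattened = scaled.reshape(len(scaled), 3 * 3)

print(flattened.shape)  # (2, 9)
print(flattened[0])
```

The same two operations applied to MNIST turn a (60000, 28, 28) array into a (60000, 784) array, which is what the steps below do.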

x_train.shape
# (60000, 28, 28)
# Scale the data so that the values are from 0 - 1
x_train = x_train / 255
x_test = x_test / 255
x_train[0]
# Flattening the train and test data
x_train_flattened = x_train.reshape(len(x_train), 28*28)
x_test_flattened = x_test.reshape(len(x_test), 28*28)
x_train_flattened.shape
# (60000, 784)

PART 1 - Create a simple neural network in Keras

In this step, we will create the simplest possible single-layer neural network using Keras.

# Sequential creates a stack of layers
model = keras.Sequential([
    keras.layers.Dense(10, input_shape=(784,), activation='sigmoid')
])
# The optimizer updates the weights during backpropagation to reduce the loss
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Train the model
model.fit(x_train_flattened, y_train, epochs=5)
# OUTPUT
Epoch 1/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4659 - accuracy: 0.8784
Epoch 2/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3040 - accuracy: 0.9145
Epoch 3/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2828 - accuracy: 0.9206
Epoch 4/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2733 - accuracy: 0.9234
Epoch 5/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2667 - accuracy: 0.9259

After the training, I got an accuracy of around 92%, which is not bad considering we created a single-layer neural network.
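For intuition about what this single layer computes, here is a rough NumPy sketch of one forward pass: each of the 10 outputs is a sigmoid applied to a weighted sum of the 784 pixels. The weights below are random placeholders, not the trained values, and the names `W`, `b`, and `sigmoid` are chosen for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Placeholder (untrained) parameters with the shapes of Dense(10) on 784 inputs.
W = rng.normal(scale=0.01, size=(784, 10))
b = np.zeros(10)

# A batch of 4 fake flattened images with pixel values in [0, 1).
x = rng.random((4, 784))

# One score per digit class for every image in the batch.
scores = sigmoid(x @ W + b)
print(scores.shape)  # (4, 10)
```

Training adjusts `W` and `b` so that the score for the correct digit becomes the largest one in its row.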

Evaluate the accuracy of test data

model.evaluate(x_test_flattened, y_test)
# OUTPUT
313/313 [==============================] - 1s 1ms/step - loss: 0.2702 - accuracy: 0.9241

So, we were able to get an accuracy of 92% with the test data.
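The accuracy figure can be read as the fraction of samples whose predicted label matches the true label. A minimal sketch with made-up labels follows; the helper `accuracy` is hypothetical and written here for illustration, not a Keras function.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Made-up example: 4 of the 5 predictions are correct.
y_true = [7, 2, 1, 0, 4]
y_pred = [7, 2, 1, 0, 9]
print(accuracy(y_true, y_pred))  # 0.8
```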

Sample prediction

We will now display a test image, make a prediction for it, and check the prediction against the image.

# Show the image
plt.matshow(x_test[0])
# Make the predictions
y_predicted = model.predict(x_test_flattened)
y_predicted[0]
array([1.8693238e-02, 2.5351633e-07, 3.8469851e-02, 9.5759392e-01,
2.0694137e-03, 1.0928032e-01, 1.0289272e-06, 9.9976790e-01,
6.6316605e-02, 6.9463903e-01], dtype=float32)
# Find the maximum value using numpy
np.argmax(y_predicted[0])
# 7
# Convert each row of scores in y_predicted to an integer class label
# so that we can use it in the confusion matrix.
# In short, we take the argmax of every prediction.
y_predicted_labels = [np.argmax(i) for i in y_predicted]
y_predicted_labels[:5]
# [7, 2, 1, 0, 4]
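The list comprehension above can also be written as a single vectorized call, `np.argmax(..., axis=1)`, which returns a NumPy array instead of a Python list. A small sketch: the first row is the score vector printed above, while the second row is made up for illustration.

```python
import numpy as np

y_predicted = np.array([
    # Scores for the first test image, copied from the output above.
    [1.8693238e-02, 2.5351633e-07, 3.8469851e-02, 9.5759392e-01,
     2.0694137e-03, 1.0928032e-01, 1.0289272e-06, 9.9976790e-01,
     6.6316605e-02, 6.9463903e-01],
    # Made-up scores for a second image.
    [0.1, 0.2, 0.9, 0.3, 0.0, 0.0, 0.1, 0.0, 0.2, 0.1],
], dtype=np.float32)

# Loop version, as in the article.
labels_loop = [np.argmax(row) for row in y_predicted]

# Vectorized version: argmax along the class axis.
labels_vec = np.argmax(y_predicted, axis=1)
print(labels_vec)  # [7 2]
```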

Using confusion matrix for validation

If you are confused about the confusion matrix, read this small article before proceeding - The ultimate guide to confusion matrix in machine learning

cm = tf.math.confusion_matrix(labels=y_test, predictions=y_predicted_labels)
cm
# OUTPUT
<tf.Tensor: shape=(10, 10), dtype=int32, numpy=
array([[ 965, 0, 0, 2, 0, 4, 5, 2, 2, 0],
[ 0, 1109, 3, 2, 1, 1, 4, 2, 13, 0],
[ 7, 9, 905, 27, 8, 4, 13, 10, 44, 5],
[ 3, 0, 12, 930, 0, 26, 2, 10, 16, 11],
[ 1, 1, 4, 2, 906, 0, 11, 4, 9, 44],
[ 10, 1, 1, 41, 8, 772, 14, 6, 31, 8],
[ 13, 3, 5, 2, 7, 15, 909, 2, 2, 0],
[ 1, 5, 20, 11, 7, 0, 0, 943, 2, 39],
[ 7, 7, 5, 26, 9, 22, 8, 11, 867, 12],
[ 11, 6, 1, 12, 21, 5, 0, 14, 4, 935]],
dtype=int32)>

Using seaborn to make the confusion matrix look good

import seaborn as sn
plt.figure(figsize = (10,7))
sn.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')

The confusion matrix gives a clear picture of our prediction.

How to read the confusion matrix?

  • All the diagonal elements are correct predictions. For example, we correctly predicted the number 0, 965 times.
  • The off-diagonal (black) cells show the wrong predictions. A value n in a cell means that the digit of its Truth row was predicted as the digit of its Predicted column n times. For example, 3 was predicted as 2, 12 times.
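The reading rules above can be checked numerically. The sketch below copies the confusion matrix printed earlier into a NumPy array and derives a few summary numbers from it; the variable names are chosen for illustration.

```python
import numpy as np

# Confusion matrix from the output above (rows = truth, columns = predicted).
cm = np.array([
    [965,    0,   0,   2,   0,   4,   5,   2,   2,   0],
    [  0, 1109,   3,   2,   1,   1,   4,   2,  13,   0],
    [  7,    9, 905,  27,   8,   4,  13,  10,  44,   5],
    [  3,    0,  12, 930,   0,  26,   2,  10,  16,  11],
    [  1,    1,   4,   2, 906,   0,  11,   4,   9,  44],
    [ 10,    1,   1,  41,   8, 772,  14,   6,  31,   8],
    [ 13,    3,   5,   2,   7,  15, 909,   2,   2,   0],
    [  1,    5,  20,  11,   7,   0,   0, 943,   2,  39],
    [  7,    7,   5,  26,   9,  22,   8,  11, 867,  12],
    [ 11,    6,   1,  12,  21,   5,   0,  14,   4, 935],
])

# Diagonal = correct predictions, so trace / total is the overall accuracy.
print(np.trace(cm) / cm.sum())  # 0.9241

# Per-digit share of correctly classified samples (row-wise).
per_digit = np.diag(cm) / cm.sum(axis=1)
print(int(np.argmin(per_digit)))  # 5, the digit recognized least reliably here

# Individual cells: truth 0 predicted as 0, and truth 3 predicted as 2.
print(cm[0, 0])  # 965
print(cm[3, 2])  # 12
```

The overall value matches the test accuracy of 0.9241 reported by `model.evaluate` above.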

PART 2 - Adding a hidden layer

# Sequential creates a stack of layers
# Create a hidden layer with 100 neurons and relu activation
model = keras.Sequential([
    keras.layers.Dense(100, input_shape=(784,), activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])
# The optimizer updates the weights during backpropagation to reduce the loss
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Train the model
model.fit(x_train_flattened, y_train, epochs=5)
Epoch 1/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.2785 - accuracy: 0.9202
Epoch 2/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.1278 - accuracy: 0.9624
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0904 - accuracy: 0.9731
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0677 - accuracy: 0.9796
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0542 - accuracy: 0.9835

Evaluate the accuracy of the test set

model.evaluate(x_test_flattened, y_test)
313/313 [==============================] - 1s 1ms/step - loss: 0.0769 - accuracy: 0.9759

Now we can observe that adding a hidden layer increased the test accuracy from about 92.4% to about 97.6%.
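One rough way to see how much larger the second model is: count its trainable parameters. Each fully connected layer has one weight per input-output pair plus one bias per output. The helper `dense_params` below is hypothetical and written only for this arithmetic.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected (Dense) layer."""
    return n_inputs * n_units + n_units

# PART 1: 784 inputs -> 10 outputs.
single_layer = dense_params(784, 10)

# PART 2: 784 inputs -> 100 hidden units -> 10 outputs.
with_hidden = dense_params(784, 100) + dense_params(100, 10)

print(single_layer)  # 7850
print(with_hidden)   # 79510
```

The hidden-layer model has about ten times as many parameters and adds a non-linearity between the layers. More parameters do not guarantee better accuracy in general, but here they came with the improvement shown above.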

Using confusion matrix for validation

y_predicted = model.predict(x_test_flattened)
y_predicted_labels = [np.argmax(i) for i in y_predicted]
cm = tf.math.confusion_matrix(labels=y_test, predictions=y_predicted_labels)
import seaborn as sn
plt.figure(figsize = (10,7))
sn.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')

Compared to the previous confusion matrix, the number of wrong predictions has gone down. The diagonal values have increased and the values in the black cells have decreased. More black cells now hold 0, meaning those particular mix-ups did not occur at all.

Bonus Content

Flattening the data by hand each time is tedious, but Keras has you covered. Just add a keras.layers.Flatten layer as the first layer, as in the example below, and pass the unflattened images to the model.

# Flattening data using the Keras Flatten layer
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
# Flatten does the reshaping, so train on the original 28x28 images
model.fit(x_train, y_train, epochs=5)
Epoch 1/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.2693 - accuracy: 0.9243
Epoch 2/5
1875/1875 [==============================] - 5s 2ms/step - loss: 0.1230 - accuracy: 0.9637
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0851 - accuracy: 0.9747
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0644 - accuracy: 0.9803
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0508 - accuracy: 0.9846

Next step

Try playing around with different activation functions, optimizers, loss functions, and numbers of epochs to improve the model. If you have questions, ping me on Twitter.
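As one possible experiment, the output activation could be switched from 'sigmoid' to 'softmax', which is the more common choice for picking one class out of several. The NumPy sketch below uses made-up raw scores and hand-written `sigmoid` and `softmax` helpers (both are illustrations, not Keras code) to show the difference: sigmoid scores are independent and need not sum to 1, while softmax scores form a probability distribution.

```python
import numpy as np

def sigmoid(z):
    """Element-wise squashing into (0, 1); outputs are independent."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Normalized exponentials; outputs are positive and sum to 1."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw scores (pre-activation values) for 4 classes.
raw_scores = np.array([2.0, 1.0, 0.1, -1.0])

print(sigmoid(raw_scores).sum())  # greater than 1 for these scores
print(softmax(raw_scores).sum())  # approximately 1

# Both activations are monotonic, so the predicted class is the same.
print(np.argmax(sigmoid(raw_scores)) == np.argmax(softmax(raw_scores)))  # True
```

This is why the score vector printed earlier for the first test image does not sum to 1, yet its argmax still gives a sensible label.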

Conclusion

In this article, I discussed how to tackle the MNIST Digit Recognition problem by creating a simple Neural Network.

As a next step, I will tackle the same problem using a Convolutional Neural Network (CNN). To read that as soon as it drops, please follow me on Twitter.

Thanks again for reading, have a nice day.

💡 UPDATE : Mnist handwritten digit classification using CNN
