Classifying on the MNIST dataset

Renier Meyer
Jun 7, 2021

Building a deep learning classifier using Python and TensorFlow — Part 1: The data


When I started my machine learning journey not so long ago, I was immediately intrigued by the potential of neural networks for solving complex problems, and my curiosity led me to explore as many different applications as I could. Leaving no stone unturned, I quickly found that image recognition was perhaps the most exciting for a relative newcomer like myself, and it has arguably been the most widely used application of neural networks and artificial intelligence in modern society.

As soon as I had a firm grasp of the fundamental concepts of machine learning, I jumped right into the world of deep learning and its esoteric intricacies. It was then that I discovered TensorFlow, which allowed me to get intimately acquainted with the neural networks that I so admired.

Handwriting classification on the MNIST dataset was one of the first problems I encountered, and this series of articles will take you through the detailed process of building your own classifier using the well-known TensorFlow 2.0 library in Python.

Let’s jump in!

Before we start…

Deep learning is a complex field of study with many technical terms and concepts that can be daunting at first. I have selected a few articles to get you started on the basics and on your way to becoming comfortable with the various concepts.

  • Neural network basics

“Artificial neural networks are inspired by the neurons in living organisms. Although we don’t know precisely and in every detail how biological neurons work, the basic principle behind them is easy to understand.”

  • Tensors

“A tensor consists of a set of primitive values shaped into an array of any number of dimensions. A tensor’s rank is its number of dimensions, while its shape is a tuple of integers specifying the array’s length along each dimension.”
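
To make the idea of rank and shape concrete, here is a quick example of my own (the values are arbitrary, purely for illustration):

import tensorflow as tf

# A rank-2 tensor (a 2 x 3 matrix) built from arbitrary values
t = tf.constant([[1, 2, 3],
                 [4, 5, 6]])

print(tf.rank(t).numpy())  # 2 -> the number of dimensions
print(t.shape)             # (2, 3) -> the length along each dimension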

  • Activation functions

“Simply put, an activation function is a function that is added into an artificial neural network in order to help the network learn complex patterns in the data”
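
As a quick illustration (my own made-up example, not from the quoted article), the popular ReLU activation simply clips negative values to zero while passing positive values through:

import tensorflow as tf

# ReLU keeps positive values and replaces negatives with zero
x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(tf.nn.relu(x).numpy())  # [0.  0.  0.  1.5 3. ]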

  • Optimizers

“Optimizers play a key role during model training to help the model learn better. This is done by finding the optimal set of parameters, such as the nodal weights and biases, in order to produce the best possible outputs for the problem at hand.”
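
To see what "finding the optimal set of parameters" looks like in miniature, here is a toy sketch of a single gradient-descent step on a made-up loss (nothing to do with our classifier yet):

import tensorflow as tf

# A toy loss, (w - 3)^2, whose optimum is at w = 3
w = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = (w - 3.0) ** 2

# Compute the gradient and let the optimizer update the parameter
grads = tape.gradient(loss, [w])
optimizer.apply_gradients(zip(grads, [w]))
print(w.numpy())  # 0.6 -> nudged from 0.0 towards the optimum at 3.0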

  • Loss functions

“Machines learn by means of a loss function. It’s a method of evaluating how well a specific algorithm models the given data. If a prediction deviates too much from the actual results, the loss function will cough up a very large number. Gradually, with the help of some optimization function, the model learns to reduce the error in its predictions.”
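
One common loss for a 10-class problem like ours is sparse categorical cross-entropy. As a quick illustration (my own example, not part of our model yet), here is how it scores a single made-up prediction where the model assigns 82% probability to the correct digit:

import tensorflow as tf

# The true label is the digit 7; the prediction gives it 82% probability
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
y_true = [7]
y_pred = [[0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.82, 0.02, 0.02]]

print(loss_fn(y_true, y_pred).numpy())  # ~0.198, i.e. -log(0.82)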

  • The depth and width of a neural network (number of layers and nodes)

“Artificial neural networks have two main hyperparameters that control the architecture or topology of the network: the number of layers and the number of nodes in each hidden layer. You must specify values for these parameters when configuring your network. The most reliable way to configure these hyperparameters for your specific predictive modeling problem is via systematic experimentation with a robust test harness.”

Now that we have the basics down, we can look at the architecture and steps involved in building a neural network for image classification.

Basic architecture

Neural networks might be complex, but the basic architecture that we will be following when creating our model is not. The majority of models will follow the same basic process (sketched in code after the figure below):

  1. Prepare the data — this includes splitting it into training, testing and validation sets
  2. Determine the number of inputs, outputs, hidden layers and activation functions
  3. Set the optimizer and loss function
A neural network with two hidden layers
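
To preview how these steps will eventually map onto code, here is a minimal, hypothetical Keras sketch. The layer sizes and choices below are placeholders; we will work through the real decisions later in this series:

import tensorflow as tf

# Step 2: define the inputs, hidden layers, outputs and activations
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),  # 784 input pixels
    tf.keras.layers.Dense(50, activation='relu'),      # hidden layer 1
    tf.keras.layers.Dense(50, activation='relu'),      # hidden layer 2
    tf.keras.layers.Dense(10, activation='softmax'),   # one output per digit
])

# Step 3: set the optimizer and loss function
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])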

Step 1: The dataset

The MNIST dataset (Modified National Institute of Standards and Technology database) was created by Yann LeCun and his collaborators, and it is essentially the “Hello World!” of machine learning. It’s very beginner friendly: the dataset is sufficiently large and has already been pre-processed, meaning that we can put all of our focus into creating the most accurate model possible and less into prepping the data first. The visual nature of the dataset also makes it easier to inspect and understand.

The dataset consists of 70,000 images of handwritten digits from 0 to 9 (thus, we will have 10 possible classes to predict). Each image in the dataset is 28 x 28 pixels (784 pixels in total) and all images are grayscale, meaning that our pixel intensities will fall somewhere between 0 (black) and 255 (white), representing the 256 shades of grey. Who knew there were more than 50?

Extract from the MNIST dataset showing a 7 (left) and sample data (right)
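
Because the raw intensities run from 0 to 255, a typical first transformation is to scale them into the 0 to 1 range so the network receives small, well-behaved inputs. Here is a minimal sketch on a randomly generated stand-in image (the real preprocessing comes in the next part):

import numpy as np

# A fake 28 x 28 grayscale image with raw intensities from 0 to 255
raw_image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Scale the intensities into [0, 1]
scaled_image = raw_image.astype(np.float32) / 255.0
print(scaled_image.min(), scaled_image.max())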

Now that we have a good understanding of the data that we will be working with, we can get into coding.

Firstly, let’s import the relevant packages. Since we will be working with tensors (multi-dimensional arrays), we will be using NumPy for array manipulation. We also import the TensorFlow library.

# Since we will be working with tensors, we will import numpy
import numpy as np
import tensorflow as tf

We can also import the MNIST dataset directly from TensorFlow as one of the default datasets that is available in the TensorFlow package.

import tensorflow_datasets as tfds

# To load the MNIST set
mnist = tfds.load(name='mnist', as_supervised=True)

If you are loading the dataset for the first time, the data will be saved to C:\Users\<your user name>\tensorflow_datasets for Windows users or under the Home directory in ~/tensorflow_datasets/ for Linux users. Any subsequent imports of this dataset will reference the saved data in the file paths listed above.
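
Before moving on, it doesn’t hurt to sanity-check what we loaded. A minimal sketch (the variable names here are my own):

# mnist is a dictionary of tf.data.Dataset objects, keyed by split
mnist_train = mnist['train']
mnist_test = mnist['test']

# With as_supervised=True, each element is an (image, label) pair
for image, label in mnist_train.take(1):
    print(image.shape)    # (28, 28, 1): height, width, one grey channel
    print(image.dtype)    # uint8: raw intensities from 0 to 255
    print(label.numpy())  # the digit this image represents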

Now that we have our data, we can move onto the next part of the data preparation phase — splitting the dataset into the training, test and validation sets.

Conclusion

In this article, we outlined our problem objective and discussed the data that we will be working with when building our classifier. In the next article, we will cover the detailed process of splitting our dataset into the training, test and validation sets, as well as discuss shuffling and batching techniques. Stay tuned!


Renier Meyer

An industrial engineer with a passion for data and software engineering.