
Building a Neural Network Model Using TensorFlow

Published Mar 12, 2025; updated Mar 17, 2025
Learn how to build a neural network model in TensorFlow by creating a digits classification model using the MNIST dataset.

Traditional machine learning algorithms like Random Forest and XGBoost are very effective for building applications on structured, tabular data. However, they struggle with tasks involving unstructured data, such as image recognition, text classification, and speech recognition. For instance, an application that recognizes handwritten digits in images needs to learn complex, non-linear patterns in raw pixel values, which goes beyond the capabilities of these simpler models. For such cases, we can build neural network models using TensorFlow.

Let’s discuss how to build a neural network model using TensorFlow by building a numeric digit recognition application using the MNIST dataset.

What is TensorFlow?

TensorFlow is an open-source, end-to-end deep learning framework developed by Google. It provides APIs for Python, Java, C++, and JavaScript (through TensorFlow.js). TensorFlow supports every stage of deep learning application development, from loading data and building models to training, evaluating, and deploying them.

Installing TensorFlow

We will use Python 3 and TensorFlow 2.x to build neural network models. If you don’t have Python installed on your machine, you can install Python 3 first.

To install TensorFlow for Python, run the following pip command in a terminal on your machine.

pip install tensorflow

If TensorFlow is already installed on your machine, you will get a message saying “Requirement already satisfied” when you execute the command.

To check which version of TensorFlow is installed on your machine, execute the following command in the command-line terminal.

python3 -c 'import tensorflow as tf; print(tf.__version__)'

We will use TensorFlow 2 to implement neural networks. If the installed TensorFlow version is 1.x, upgrade it using the following command.

pip install tensorflow --upgrade

After executing this command, pip upgrades TensorFlow to the latest version compatible with your Python installation. With Python and TensorFlow installed, we can now build neural network models using TensorFlow. For this task, we will use the MNIST dataset.

Before moving ahead, make sure you understand the basics of neural networks.


Loading and pre-processing the MNIST dataset

The MNIST dataset contains 60,000 training and 10,000 test images of handwritten digits. We can download and read the MNIST dataset using the load_data() function in the tensorflow.keras.datasets.mnist module. The load_data() function returns the MNIST dataset as two tuples, one for the training data and one for the test data; each tuple contains an array of images followed by an array of labels.

We can use the load_data() function to read the train and test data into x_train, y_train, x_test, and y_test variables.

import tensorflow as tf
# Download MNIST on first use and load it as NumPy arrays
mnist_dataset = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist_dataset.load_data()
print("The shape of x_train:", x_train.shape)
print("The shape of y_train:", y_train.shape)
print("The shape of x_test:", x_test.shape)
print("The shape of y_test:", y_test.shape)

Output:

The shape of x_train: (60000, 28, 28)
The shape of y_train: (60000,)
The shape of x_test: (10000, 28, 28)
The shape of y_test: (10000,)

In the output,

  • The x_train and x_test variables contain arrays of shape (60000, 28, 28) and (10000, 28, 28), respectively. Here, 60,000 and 10,000 are the numbers of images in the training and test sets, and 28x28 is the size of each image in pixels.
  • The y_train variable contains the digit values for all the images in x_train. Similarly, the y_test variable contains the digit values for all the images in x_test.

To see what these arrays look like as images, we can visualize them using the matplotlib module in Python.

For instance, we can visualize the array of the image at index 0 in the x_train variable using the imshow() function in the matplotlib.pyplot module, as shown in the following example:

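import matplotlib.pyplot as plt
plt.imshow(x_train[0], cmap="gray")  # render the 28x28 array as a grayscale image
plt.show()  # open the plot window when running as a script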

Output:

Image of Array at Index 0

To see the digit associated with the image at index 0 in x_train, we can access the value at index 0 in the y_train variable:

print("The digit value at y_train[0] : \n", y_train[0])

Output:

The digit value at y_train[0] :
5

The arrays representing the images have values ranging from 0 to 255. Each value in the array is a pixel value that represents the pixel’s brightness at the given position.

Before training the neural network model, we will scale the pixel values in the training and test data to the range 0 to 1. Inputs on a small, consistent scale typically make gradient-based training more stable. For this, we will divide all the pixel values by the maximum pixel value, i.e., 255:

x_train, x_test = x_train / 255.0, x_test / 255.0
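To see what this division does to the data type and value range, here is a small NumPy sketch. It uses a hypothetical uint8 array (fake_images) as a stand-in for real MNIST images, which load_data() also returns as uint8 arrays.

```python
import numpy as np

# Hypothetical stand-in for two 28x28 images with pixel values 0..255
fake_images = (np.arange(2 * 28 * 28) % 256).astype(np.uint8).reshape(2, 28, 28)

# Dividing by the float 255.0 produces a float64 array: 0 -> 0.0 and 255 -> 1.0
scaled = fake_images / 255.0

print(fake_images.dtype, scaled.dtype)  # uint8 float64
print(scaled.min(), scaled.max())       # 0.0 1.0
```

Dividing by a float rather than an integer makes the floating-point result explicit, so no precision is lost to integer arithmetic.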

To help the neural network model identify the digits, we will frame training as a multiclass classification task, where each input array representing an image is classified as one of the digits 0 to 9.

For the classification task, we will convert the digit values in y_train and y_test into vectors of binary values that represent a specific class. As we have 10 unique digits, we will convert each digit value into a binary vector of length 10. In each binary vector, only one value is set to 1 and the other nine are set to 0. The index at which the value is set to 1 determines the digit represented by the vector. For example, the vector [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] represents the digit 0, and the vector [0, 0, 0, 0, 0, 1, 0, 0, 0, 0] represents the digit 5. This representation is known as one-hot encoding.

To convert the numeric digits to binary vectors, we will use a 10x10 identity matrix. In an identity matrix, the diagonal elements are 1 and all other elements are 0. Hence, the first row has only its first element set to 1, the second row has only its second element set to 1, and so on. To convert a digit d to its binary vector representation, we can take the row at index d of the identity matrix.

We will use the numpy.eye() function to create an identity matrix and write a function that converts a digit value to its binary vector representation.

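import numpy as np
def to_binary_vector(d):
    num_classes = 10
    identity_matrix = np.eye(num_classes)  # 10x10 identity matrix
    # Row d is the binary vector for digit d; d may also be an array of digits
    return identity_matrix[d]
print("The binary vector representation of 5 is:", to_binary_vector(5))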

Output:

The binary vector representation of 5 is: [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

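Now, to convert y_train and y_test to arrays of binary vectors, we can pass them directly to the to_binary_vector() function. Indexing a NumPy array with an integer array selects one row per label, so the result contains one binary vector for each digit: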

y_train_encoded = to_binary_vector(y_train)
y_test_encoded = to_binary_vector(y_test)
print("The digit value for y_train[0]:", y_train[0])
print("The binary vector for y_train[0]:", y_train_encoded[0])

Output:

The digit value for y_train[0]: 5
The binary vector for y_train[0]: [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
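The identity-matrix lookup is one way to one-hot encode labels. As a cross-check, a broadcasting comparison against np.arange(10) (a swapped-in alternative, not the method used in this article) builds the same matrix, and argmax() along axis 1 recovers the original digits. The labels array below is hypothetical and stands in for y_train.

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4, 1, 9])  # hypothetical labels standing in for y_train

# Identity-matrix approach: pick row d of the 10x10 identity matrix for each label d
one_hot_eye = np.eye(num_classes)[labels]

# Alternative: compare each label (as a column) with 0..9 (as a row) via broadcasting
one_hot_broadcast = (labels[:, None] == np.arange(num_classes)).astype(np.float64)

print(one_hot_eye.shape)                               # (5, 10)
print(np.array_equal(one_hot_eye, one_hot_broadcast))  # True

# argmax along axis 1 undoes the encoding: the position of the 1 is the digit
print(one_hot_eye.argmax(axis=1))                      # [5 0 4 1 9]
```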

Instead of using the numpy.eye() function, we can use the to_categorical() function defined in the tensorflow.keras.utils module to generate binary vectors for the digit values.

The to_categorical() function takes the array of digit values, such as y_train or y_test, as its first argument and the number of classes (10 here) as the num_classes argument. It returns a 2D array containing the binary vector representation of each value in the input array.

from tensorflow.keras.utils import to_categorical
y_train_encoded = to_categorical(y_train, num_classes=10)
y_test_encoded = to_categorical(y_test, num_classes=10)
print("The digit value for y_train[0]:", y_train[0])
print("The binary vector for y_train[0]:", y_train_encoded[0])

Output:

The digit value for y_train[0]: 5
The binary vector for y_train[0]: [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

We have scaled the training and test data and converted the digit values to binary vectors representing class labels. Now, we will use TensorFlow's Keras API to define the neural network architecture for the digit identification task.

Building neural network architecture using TensorFlow

To build the neural network model, we need to define the input, hidden, and output layers separately and then combine them.

Defining the input layer

The input layer of a neural network reads the input data in its original shape and converts it into a 1D array. For example, the images in the MNIST dataset are 28x28, so the input layer of our model will take inputs of shape (28, 28) and flatten each one into a vector of 784 values.

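We will use the Flatten() layer defined in the tensorflow.keras.layers module to define the input layer. Flatten() takes the shape of the input data through its input_shape parameter and returns a neural network layer. In recent TensorFlow versions that ship Keras 3, passing input_shape this way still works but triggers a warning recommending that the model start with tf.keras.Input(shape=(28, 28)) instead.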

from tensorflow.keras.layers import Flatten
input_layer = Flatten(input_shape=(28, 28))  # (28, 28) image -> 784-value vector
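Conceptually, flattening reshapes each 28x28 image into a vector of 28 x 28 = 784 values while keeping the batch dimension. The NumPy sketch below mimics that reshaping on a hypothetical batch; it illustrates the shape change only and is not how Keras implements the layer internally.

```python
import numpy as np

batch = np.zeros((32, 28, 28))  # hypothetical batch of 32 images

# Keep the first (batch) axis and merge the two 28-pixel axes into one
flattened = batch.reshape(batch.shape[0], -1)
print(flattened.shape)  # (32, 784)

# NumPy's default row-major order: pixel (row r, column c) lands at index r * 28 + c
image = np.arange(28 * 28).reshape(28, 28)
print(image.reshape(-1)[3 * 28 + 7] == image[3, 7])  # True
```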

Defining the hidden layers

To define the hidden layers of the neural network, we will use dense layers, also called fully connected layers. Every neuron in a dense layer receives input from every neuron in the previous layer, computes a weighted sum of those inputs plus a bias, and applies an activation function to the result.

We will use the Dense() function defined in the tensorflow.keras.layers module to define the hidden layers. The Dense() function takes the number of neurons required in the layer as its first input argument and the activation function name as input to its activation parameter. We will define two hidden layers with 128 and 64 neurons, respectively. In these layers, we will use the ReLU activation function, as shown below:

from tensorflow.keras.layers import Dense
first_hidden_layer=Dense(128, activation='relu')
second_hidden_layer=Dense(64, activation='relu')
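Each Dense layer computes activation(inputs @ kernel + bias). The NumPy sketch below reproduces a forward pass through the two hidden layers to show how the shapes go from 784 to 128 to 64. The input batch and weights are hypothetical random values; real Keras layers use their own initializers and learn their weights during training.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def relu(z):
    # ReLU keeps positive values and replaces negatives with 0
    return np.maximum(z, 0.0)

def dense(x, kernel, bias, activation):
    # Fully connected layer: every input contributes to every output neuron
    return activation(x @ kernel + bias)

x = rng.random((4, 784))  # hypothetical batch of 4 flattened images

# Hypothetical weights, shaped (inputs, units) and (units,)
w1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
w2, b2 = rng.normal(scale=0.05, size=(128, 64)), np.zeros(64)

h1 = dense(x, w1, b1, relu)   # shape (4, 128)
h2 = dense(h1, w2, b2, relu)  # shape (4, 64)
print(h1.shape, h2.shape)     # (4, 128) (4, 64)
```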

Defining the output layer

We will also use a dense layer for the neural network model's output layer. It will have ten neurons, one for each of the ten classes in the digit classification problem. We will use the softmax activation function, which converts the ten raw outputs into probabilities that sum to 1; this is the standard choice for the output layer in single-label multiclass classification.

output_layer = Dense(10, activation='softmax')
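Softmax turns the output layer's 10 raw scores (logits) z into probabilities p_i = exp(z_i) / sum_j exp(z_j), which are all positive and sum to 1. A minimal NumPy sketch, with hypothetical logits standing in for real layer outputs:

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum avoids overflow in exp() and leaves the result unchanged
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs of the 10 output neurons for one image
logits = np.array([[1.0, 0.5, -2.0, 3.0, 0.0, 8.0, -1.0, 0.2, 1.5, 2.0]])
probs = softmax(logits)

print(probs.shape)                   # (1, 10)
print(np.isclose(probs.sum(), 1.0))  # True
print(probs.argmax(axis=1))          # [5]
```

Because softmax preserves the ordering of the scores, the class with the largest logit also gets the largest probability.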

Assembling all the layers together to create a neural network

After defining the input, hidden, and output layers, we need to assemble them into the neural network model. To do this, we will arrange the layers in a sequence where the output of each layer is fed directly as the input to the next: input layer -> first hidden layer -> second hidden layer -> output layer.

To configure the layers sequentially, we will use the Sequential() function defined in the tensorflow.keras.models module. The Sequential() function takes a list of neural network layers as its input argument and returns a neural network. Here, the input list should contain the neural network layers in the same order as the defined neural network architecture.

nn_model = tf.keras.models.Sequential(
    [input_layer, first_hidden_layer, second_hidden_layer, output_layer]
)

After defining the model architecture, we can get the information about the shape and type of each layer in the neural network model using the summary() method.

nn_model.summary()  # prints the table itself; wrapping it in print() would also print None

Output:

Model: "sequential_8"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten_8 (Flatten) │ (None, 784) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_26 (Dense) │ (None, 128) │ 100,480 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_27 (Dense) │ (None, 64) │ 8,256 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_28 (Dense) │ (None, 10) │ 650 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 109,386 (427.29 KB)
Trainable params: 109,386 (427.29 KB)
Non-trainable params: 0 (0.00 B)

In the output, notice that the model has four layers and that the summary lists each layer's type, output shape, and number of parameters. The neural network has 109,386 trainable parameters (weights and biases) that we will train using the MNIST dataset. Before training, we need to specify the optimizer, the loss function, and the metrics to track. We do this by compiling the model.
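The parameter counts in the summary follow from a simple formula: a Dense layer with n inputs and u units has n x u weights plus u biases, and Flatten has no parameters. This plain-Python sketch recomputes the numbers; the memory figure assumes 4-byte float32 parameters, which matches the 427.29 KB in the summary.

```python
def dense_params(n_inputs, units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * units + units

layer_sizes = [28 * 28, 128, 64, 10]  # flattened input, then the three Dense layers
per_layer = [dense_params(n, u) for n, u in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)                   # [100480, 8256, 650]
print(total)                       # 109386
print(round(total * 4 / 1024, 2))  # 427.29 (KB, at 4 bytes per parameter)
```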

Compile the TensorFlow neural network model

While compiling the neural network model, we will use the Adam optimizer, which adapts the learning rate for each parameter and typically converges quickly without much tuning. For the loss function, we will use categorical_crossentropy, as we are training the model for a multiclass classification problem with one-hot encoded labels. We will also track accuracy and categorical_crossentropy as metrics.

To compile the model, we will use the compile() method. The compile() method takes the optimizer, loss function, and metric names as input and compiles the neural network model.

nn_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy', 'categorical_crossentropy'],
)
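Categorical cross-entropy compares a one-hot label y with the predicted probabilities p as -sum(y * log(p)). Because only one entry of y is 1, the loss for a single image is just -log of the probability the model assigns to the correct digit. The NumPy sketch below uses hypothetical predictions; the small clipping value eps is this sketch's own choice to avoid log(0), not a documented Keras setting.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip probabilities away from 0 and 1 so log() stays finite
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-sample loss: -sum over classes of y_true * log(y_pred)
    return -np.sum(y_true * np.log(y_pred), axis=1)

y_true = np.eye(10)[[5, 0]]  # one-hot labels for digits 5 and 0
y_pred = np.array([
    [0.01] * 5 + [0.91] + [0.01] * 4,  # hypothetical: confident and correct
    [0.1] * 10,                        # hypothetical: uniform guess
])

losses = categorical_crossentropy(y_true, y_pred)
print(losses.round(4))  # [0.0943 2.3026]
print(losses.mean())    # averaged over the batch to give a single loss value
```

A confident correct prediction gives a loss near 0, while a uniform guess over 10 classes gives log(10), roughly 2.30.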

After compilation, the neural network model is ready for training. So, let’s train the model.

Train the neural network model using the MNIST dataset

To train the neural network model, we will specify the number of epochs, the batch size, and the validation split.

  • Epochs: An epoch refers to one complete pass of the entire training dataset through the model. Training a model for multiple epochs allows it to learn patterns more effectively by repeatedly seeing the whole dataset. For example, our training set has 60,000 images. Hence, if we set the number of epochs to 10 while training the neural network model, each image will pass through the model ten times.
  • Batch size: The batch size is the number of training samples processed together before the model updates the weights and biases of its neurons. Processing the data in batches lets the model train efficiently in steps, with one weight update per batch.
  • Validation split: To monitor how well the model generalizes while it trains, we set aside a fraction of the training data as a validation dataset. The model is not trained on this data; instead, its loss and metrics are computed on it at the end of each epoch. The validation split specifies what fraction of the training data to use for validation.
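With the values used later in this article (epochs=10, batch_size=32, validation_split=0.2), the arithmetic works out as follows. According to the Keras documentation, validation_split takes the validation samples from the end of the arrays passed to fit(), before any shuffling; this plain-Python sketch mirrors that split with index ranges only.

```python
import math

n_samples = 60_000
validation_split = 0.2
batch_size = 32
epochs = 10

# The last 20% of the samples are held out for validation
split_at = int(n_samples * (1 - validation_split))
train_idx = range(0, split_at)
val_idx = range(split_at, n_samples)

# One weight update per batch; a final partial batch would still count as a step
steps_per_epoch = math.ceil(len(train_idx) / batch_size)

print(len(train_idx), len(val_idx))  # 48000 12000
print(steps_per_epoch)               # 1500
print(steps_per_epoch * epochs)      # 15000
```

So with a 0.2 validation split, each epoch trains on 48,000 of the 60,000 images and performs 1,500 weight updates.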

For training the model, we will use the fit() method, which takes the following inputs:

  • The training data and the output labels are the first and second inputs, respectively.
  • The number of epochs used to train the model is defined using the epochs parameter.
  • The batch size of the input data is defined using the batch_size parameter.
  • The fraction of validation data is defined using the validation_split parameter.

We can train the compiled neural network model using the training dataset and the fit() method.

nn_model.fit(x_train, y_train_encoded, epochs=10, batch_size=32, validation_split=0.2)

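The fit() method trains nn_model in place, printing the loss and metrics for each epoch, and returns a History object that records these values. Once it finishes, nn_model is a trained neural network model.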

Evaluate the performance of the trained neural network model

Once we have trained the model, we can use the test dataset to evaluate its performance with the evaluate() method. The evaluate() method takes the test data and the associated labels and returns the loss and the metrics defined while compiling the model.

test_loss, test_accuracy, test_categorical_crossentropy = nn_model.evaluate(x_test, y_test_encoded)
print("The test loss is:", test_loss)
print("The test accuracy is:", test_accuracy)
print("The test categorical cross entropy is:", test_categorical_crossentropy)

Output:

The test loss is: 0.11425400525331497
The test accuracy is: 0.9731000065803528
The test categorical cross entropy is: 0.11425400525331497

In the output returned by the evaluate() method, the loss is always the first value, followed by the metrics in the same order in which we listed them while compiling the model. Because we also listed categorical_crossentropy as a metric, its value is identical to the loss.
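Conceptually, for one-hot labels the accuracy metric is the fraction of images whose predicted class (the index of the highest probability) matches the true class. This NumPy sketch computes that fraction for hypothetical labels and predictions; it illustrates the idea rather than reproducing Keras's internal implementation.

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    # Predicted class = index of the highest probability in each row
    predicted = y_prob.argmax(axis=1)
    actual = y_true_one_hot.argmax(axis=1)
    return np.mean(predicted == actual)

y_true = np.eye(10)[[7, 2, 1, 0]]               # hypothetical one-hot labels
y_prob = np.eye(10)[[7, 2, 1, 6]] * 0.9 + 0.01  # hypothetical predictions; the last is wrong

print(accuracy(y_true, y_prob))  # 0.75
```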

Predicting the outputs

After training the model, we can use it to identify digits with the predict() method. The predict() method takes a NumPy array of shape (N, 28, 28) containing N images as its input and returns an array of shape (N, n_classes), where n_classes is the number of classes in the classification problem. Each row of the output contains n_classes values that form a probability distribution over the classes.

To understand the model output, let's predict the digit for the array at x_train[0]. To do this, we first put the 2D array at x_train[0] in a list and pass it to the numpy.array() function to create an array of shape (1, 28, 28). This is necessary because the model expects inputs of shape (N, 28, 28), where N is the number of images to predict and 28x28 is the shape of each image.
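Wrapping the image in a list is one way to add the batch dimension. NumPy also offers slicing (x_train[0:1]) and indexing with np.newaxis; the sketch below checks that all three produce the same (1, 28, 28) array, using a hypothetical array in place of x_train.

```python
import numpy as np

images = np.arange(5 * 28 * 28, dtype=np.float64).reshape(5, 28, 28)  # hypothetical x_train

as_list = np.array([images[0]])     # wrap the 2D array in a list
as_slice = images[0:1]              # slicing keeps the first axis
as_newaxis = images[0][np.newaxis]  # insert a new leading axis of length 1

print(as_list.shape, as_slice.shape, as_newaxis.shape)  # (1, 28, 28) (1, 28, 28) (1, 28, 28)
print(np.array_equal(as_list, as_newaxis))              # True
```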

input_array = np.array([x_train[0]])
model_output = nn_model.predict(input_array)
print("The model output for x_train[0] is:\n", model_output)

Output:

The model output for x_train[0] is:
[[1.4436237e-19 3.4252757e-18 2.8208583e-17 2.5214213e-05 3.5907194e-22
9.9997473e-01 7.4090914e-19 6.0076263e-18 7.1581166e-17 9.8946597e-12]]

This example shows that the output is an array of shape (1, 10). Here,

  • 1 is the number of input arrays provided for prediction, as we only predicted for x_train[0].
  • The inner array contains 10 values between 0 and 1, each denoting the probability that the input belongs to a particular class. For instance, the value 1.4436237e-19 at index 0 is the probability that the input array belongs to class 0, or digit 0. Similarly, 3.4252757e-18 is the probability that it belongs to class 1, or digit 1.
  • The index with the highest probability is the predicted class. In this case, the probability at index 5 is 0.99997473, the highest value. Hence, the model classifies x_train[0] as class 5, or digit 5.

Reading the probabilities manually to identify the digits is tedious. Instead, we can write a function that returns the index of the highest probability in each row of the model output; that index is the predicted class, i.e., the identified digit. To do this, we can use the argmax() method of NumPy arrays (the method form of numpy.argmax()) with axis=1, which returns the index of the highest value in each row.

def predict_digit_from_probability(model_output):
    # Index of the highest probability in each row = predicted digit
    digits = model_output.argmax(axis=1)
    return digits
digit = predict_digit_from_probability(model_output)
print("The predicted digit is:", digit)

Output:

The predicted digit is: [5]

Instead of a single input array, we can also pass multiple inputs to the predict() method and predict the digits from the images. For example, we can predict the digits for the first five images in the input data x_train.

input_array = x_train[0:5]
model_output = nn_model.predict(input_array)
digits = predict_digit_from_probability(model_output)
print("The predicted digits are:", digits)

Output:

The predicted digits are: [5 0 4 1 9]

Conclusion

Knowing how to build deep learning applications with neural networks in TensorFlow enables us to solve complex real-world problems in image recognition, speech recognition, and text processing. In this article, we built a digit recognition model to walk through the main steps: loading and preprocessing data, defining the architecture, compiling, training, evaluating, and predicting.

Deep learning is an ever-evolving field, and neural networks are just the beginning. You can experiment with different datasets and model architectures to build deep learning applications that solve more complex and impactful problems. This will help you understand the use cases for various neural network layers, activation, and loss functions.

To get hands-on experience, you can take this course on building deep learning models using TensorFlow. You might also like this intro to PyTorch and neural networks course.

Happy Learning!

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.
