PyTorch is a machine learning library for developing deep learning models in Python.
# import pytorch
import torch
PyTorch tensors store numerical data. They can be created from NumPy arrays using the torch.tensor() function. All entries in a tensor have the same type, specified by the dtype parameter.
# import NumPy and PyTorch
import numpy as np
import torch

# NumPy array
np_array = np.array([100, 75.5])
# Convert to a tensor with float dtype
torch_tensor = torch.tensor(np_array, dtype=torch.float)
A linear model consists of:
- a weight for each input feature
- a bias (also called the intercept)
Starting with specific input values for the input features, the linear model multiplies each input by its weight, sums the results, and adds the bias to produce a predicted output.
Linear models are generalizations of the classic equation of a line:

y = mx + b

Here,
- y is the target output
- x is the input feature
- m is the weight for x
- b is the bias

A linear equation can be modeled as a neural network structure called a Perceptron that consists of:
- input nodes carrying the input feature x and the bias
- edges carrying the weights (here, m and b)
- an output node that sums the weighted inputs to produce y
For example, the image attached to this review card demonstrates a Perceptron modelling a linear equation.
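As a minimal sketch of this structure (the variable names and input value here are illustrative, not part of the card), a single-feature linear model can be built with PyTorch's nn.Linear, which holds one weight and one bias:

import torch
from torch import nn

# A Perceptron-style linear model: one input feature, one output (y = m*x + b)
linear_model = nn.Linear(in_features=1, out_features=1)

# Feed a specific input value through the model
x = torch.tensor([[2.0]])   # a batch of one sample with one feature
y = linear_model(x)         # equals weight * 2.0 + bias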
Activation functions introduce non-linearities to a neural network. These allow the model to learn non-linear relationships in the dataset.
One way of thinking about activation functions is that they serve to “turn on” or “turn off” nodes, allowing the neural network to recognize specific properties of the training dataset (e.g. a particular node “turns on” under certain conditions.)
An activation function is applied by a node in a neural network after it computes the weighted sum of its inputs and bias.
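A small illustration of this order of operations (the input values, weights, and bias below are made-up numbers):

import torch
from torch import nn

inputs = torch.tensor([0.5, -1.0])   # input values (assumed)
weights = torch.tensor([2.0, 3.0])   # weights for each input (assumed)
bias = torch.tensor(0.5)             # bias (assumed)

# The node first computes the weighted sum of its inputs plus the bias...
weighted_sum = torch.dot(inputs, weights) + bias
# ...then applies the activation function to the result
activated = nn.ReLU()(weighted_sum)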
PyTorch implements common activation functions like ReLU and Sigmoid in the torch.nn module.
# import nn module
from torch import nn
The ReLU activation function returns 0 for negative input values; otherwise it returns nonnegative values unchanged.

For example,
- ReLU(-1) returns 0 since -1 is negative
- ReLU(3) returns 3 since 3 is nonnegative

ReLU can be implemented in PyTorch with nn.ReLU from the torch.nn module.
# by-hand definition of ReLU
def ReLU(x):
    return max(0, x)

# ReLU in PyTorch
from torch import nn
ReLU = nn.ReLU()
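As a quick usage check (the input values are illustrative), the PyTorch version can be applied directly to a tensor:

import torch
from torch import nn

ReLU = nn.ReLU()
x = torch.tensor([-1.0, 3.0])   # illustrative input values
print(ReLU(x))                  # tensor([0., 3.])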
Multi-layer Neural Networks consist of an input layer, several hidden layers, and an output layer. Each node in a hidden layer is essentially a Perceptron: each one computes a weighted sum of its inputs, adds a bias, and applies an activation function.
PyTorch’s nn.Sequential() container builds neural networks by specifying layers and activation functions in sequence from input to output.
For example, the code attached to this review card defines a neural network with:
- an input layer with 8 features
- a hidden layer with 16 nodes and a ReLU activation
- a hidden layer with 10 nodes and a Sigmoid activation
- an output layer with 1 node
Data flows through this network in the order in which the layers and activation functions are specified.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 10),
    nn.Sigmoid(),
    nn.Linear(10, 1)
)
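To see the data flow end to end, a batch of random inputs can be passed through the model defined above (the batch size of 5 is an arbitrary choice for illustration):

import torch

# A batch of 5 samples, each with 8 input features
X = torch.randn(5, 8)
output = model(X)
print(output.shape)   # torch.Size([5, 1])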
The loss function is a mathematical formula used to measure the error (also known as the loss value) between the model predictions and the actual target values.
The loss function is computed after feeding data through a neural network. Then, the loss is used to compute the gradients that tell the optimization algorithm how to adjust weights and biases to improve the neural network’s performance.
The most common loss function for regression tasks is the Mean Squared Error (MSE), which is computed by averaging the squared differences between the predictions and the target values:

MSE = (1/n) Σ (yᵢ − ŷᵢ)²

Notably, because the differences are squared, MSE emphasizes the largest differences, which can be helpful but may lead to overfitting.
# Implement MSE in PyTorch
import torch.nn as nn

loss = nn.MSELoss()
MSE = loss(predictions, y)
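To connect the formula to the code, here is a by-hand computation on made-up values that matches what nn.MSELoss returns:

import torch

predictions = torch.tensor([2.0, 4.0])   # assumed predictions
y = torch.tensor([1.0, 5.0])             # assumed targets

# Average of squared differences: ((2-1)^2 + (4-5)^2) / 2 = 1.0
MSE_by_hand = ((predictions - y) ** 2).mean()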
When training a neural network, optimizers are algorithms used to adjust the weights and biases of the neural network. The goal of an optimizer is to decrease the loss of the network.
PyTorch implements many common optimizer algorithms in its optim module.
# import the optim module
from torch import optim
Gradient descent is an optimization algorithm that uses calculus to update the weights and biases of the network.
We can think of gradient descent as being on top of a mountain where our objective is to descend down. To choose a direction, we look at the slope of the mountain and move towards where the mountain slopes down the most.
These slopes are called gradients in calculus. Gradient descent computes the gradients (slopes) of the loss function and then updates the weights and biases to move the neural network “downhill”, thereby decreasing loss.
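A tiny sketch of one "downhill" step (the weight, toy loss, and learning rate below are made-up values):

import torch

w = torch.tensor(1.0, requires_grad=True)   # a single weight (assumed)
loss = (w * 3.0 - 6.0) ** 2                 # a toy squared-error loss
loss.backward()                             # compute the gradient d(loss)/dw

learning_rate = 0.1
with torch.no_grad():
    w -= learning_rate * w.grad             # step in the downhill direction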
For optimization algorithms, the learning rate hyperparameter specifies how far we adjust each model parameter, like the weights and biases, at each training step.
Tradeoffs when choosing learning rate values:
- A learning rate that is too large can overshoot the minimum, causing the loss to oscillate or diverge.
- A learning rate that is too small makes training slow and can get stuck before reaching a good minimum.
The Adam optimizer is a popular variant of gradient descent that iteratively minimizes the loss function while adjusting the learning rate dynamically during training.
The Adam optimizer in PyTorch takes two inputs:
- model.parameters(), the parameters of the model being optimized
- lr, the learning rate
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.01)
To apply gradient descent optimizers in PyTorch, the following steps need to be taken:
- Apply .backward() to the loss to calculate the gradients of the loss function
- Apply .step() to the optimizer to update the weights and biases

# Compute the loss
MSE = loss(predictions, y)
# Backward pass to calculate the gradient
MSE.backward()
# Use optimizer to update weights and biases
optimizer.step()
Training a neural network is an iterative process of:
- feeding the training data forward through the network to generate predictions
- computing the loss
- computing the gradients with a backward pass
- updating the weights and biases with the optimizer
- resetting the gradients for the next iteration
Each iteration in this loop is called an epoch.
# Training loop with 400 iterations
num_epochs = 400
for epoch in range(num_epochs):
    predictions = model(X)       # forward pass
    MSE = loss(predictions, y)   # compute loss
    MSE.backward()               # compute gradients
    optimizer.step()             # update weights and biases
    optimizer.zero_grad()        # reset the gradients
The torch.save() function saves an entire PyTorch model to a file, including both the model architecture and its learned parameters. The torch.load() function loads a saved PyTorch model.
# Saving PyTorch models
torch.save(model, 'model.pth')

# Loading PyTorch models
loaded_model = torch.load('model.pth')
To evaluate a PyTorch model on a testing dataset or generate predictions on a new dataset, we need to set the model to evaluation mode using model.eval() and turn off gradient calculations using with torch.no_grad().
# Set model to evaluation mode
model.eval()

# Turn off gradient calculations
with torch.no_grad():
    # Generate predictions on testing dataset
    predictions = model(X_test)
    test_MSE = loss(predictions, y_test)
Neural network classes can be constructed using object-oriented programming (OOP) by subclassing PyTorch's nn.Module class. This requires explicitly defining the __init__ method to initialize the network components and the forward method to define the forward pass.
Constructing networks as classes gives developers increased flexibility and control compared to pre-built sequential models using nn.Sequential, while still inheriting PyTorch's built-in training and optimization functionality through nn.Module.
# Creating a Neural Network Class Using OOP
class NN_Regression(nn.Module):
    def __init__(self):
        super(NN_Regression, self).__init__()
        # Initialize layers and activation functions
        ...

    def forward(self, x):
        # Define forward pass
        ...
        return x

# Instantiate Neural Network Class
model = NN_Regression()
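One possible way to fill in the template above, as a sketch only; the layer sizes here are assumptions chosen to mirror the earlier sequential example, not part of the card:

import torch.nn as nn

class NN_Regression(nn.Module):
    def __init__(self):
        super(NN_Regression, self).__init__()
        # Layers and activation functions (sizes assumed for illustration)
        self.fc1 = nn.Linear(8, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 1)

    def forward(self, x):
        # Forward pass: linear -> ReLU -> linear
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = NN_Regression()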