Built-in loss functions
In PyTorch, loss functions are critical to the training process of deep learning models. They measure how well the model’s predictions match the ground truth. PyTorch provides several built-in loss functions that can be easily integrated into a training loop to compute the error and optimize the model’s parameters accordingly.
Types of built-in loss functions
PyTorch provides various loss functions, each serving a different purpose depending on the task (e.g., regression, classification) at hand.
1. Mean Squared Error Loss (`MSELoss`)
For regression problems, Mean Squared Error (MSE) is one of the most commonly used loss functions. It calculates the square of the difference between predicted and actual values, averaging the result over all samples.
The syntax is as follows:
```py
torch.nn.MSELoss(reduction='mean')
```
- `reduction` (str, default=`'mean'`): Specifies the reduction method to apply to the output:
  - `'mean'`: The sum of the squared differences is divided by the number of elements in the output.
  - `'sum'`: The squared differences are summed.
  - `'none'`: No reduction is applied; the loss is returned for each element in the batch.
Here’s an example of how to use `MSELoss`:
```py
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
predictions = torch.tensor([2.0, 3.0, 4.0])
targets = torch.tensor([2.5, 3.5, 4.5])

loss = loss_fn(predictions, targets)
print(loss)
```
The output will be:
tensor(0.2500)
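The `reduction` argument changes what the call returns. As a quick sketch, reusing the tensors from the example above with each of the three settings:
```py
import torch
import torch.nn as nn

predictions = torch.tensor([2.0, 3.0, 4.0])
targets = torch.tensor([2.5, 3.5, 4.5])

# Per-element squared errors: tensor([0.2500, 0.2500, 0.2500])
print(nn.MSELoss(reduction='none')(predictions, targets))

# Sum of the squared errors: tensor(0.7500)
print(nn.MSELoss(reduction='sum')(predictions, targets))

# Mean of the squared errors (the default): tensor(0.2500)
print(nn.MSELoss(reduction='mean')(predictions, targets))
```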
2. Cross-Entropy Loss (`CrossEntropyLoss`)
For classification tasks, `CrossEntropyLoss` measures the performance of a classification model by comparing the predicted class scores with the actual class labels. PyTorch combines `log_softmax` and `nll_loss` in this function for numerical stability, so it expects raw, unnormalized logits rather than probabilities.
The syntax is as follows:
```py
torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
```
- `weight` (Tensor, optional): A manual rescaling weight given to each class. It has to be a tensor of size `C`, where `C` is the number of classes. Default is `None`, meaning no rescaling.
- `ignore_index` (int, default=-100): Specifies a target value that is ignored and does not contribute to the loss calculation.
- `reduction` (str, default=`'mean'`): Specifies the reduction method to apply:
  - `'mean'`: The mean loss across the batch.
  - `'sum'`: The sum of the loss across the batch.
  - `'none'`: No reduction; returns the loss for each element.
Here’s an example that shows how to use `CrossEntropyLoss`:
```py
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.tensor([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
labels = torch.tensor([2, 0])

loss = loss_fn(logits, labels)
print(loss)
```
The output will be:
tensor(1.4076)
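To make the “combines `log_softmax` and `nll_loss`” point concrete, here is a small sketch (reusing the logits and labels above) showing that applying `log_softmax` followed by `NLLLoss` produces the same value:
```py
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
labels = torch.tensor([2, 0])

# CrossEntropyLoss applied directly to raw logits
ce = nn.CrossEntropyLoss()(logits, labels)

# Equivalent two-step version: log_softmax, then negative log-likelihood
log_probs = F.log_softmax(logits, dim=1)
nll = nn.NLLLoss()(log_probs, labels)

print(ce, nll)  # Both print tensor(1.4076)
```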
3. Binary Cross-Entropy Loss (`BCEWithLogitsLoss`)
The `BCEWithLogitsLoss` function is commonly used for binary classification tasks, where there are two possible classes, typically 0 and 1. It takes raw logits, applies a sigmoid internally, and then computes the binary cross-entropy between the resulting probabilities and the target labels; fusing the two steps into one function improves numerical stability.
The syntax is as follows:
```py
torch.nn.BCEWithLogitsLoss(weight=None, reduction='mean')
```
- `weight` (Tensor, optional): A manual rescaling weight applied to the loss of each batch element. Default is `None`.
- `reduction` (str, default=`'mean'`): Specifies the reduction method to apply:
  - `'mean'`: The mean loss across the batch.
  - `'sum'`: The sum of the loss across the batch.
  - `'none'`: No reduction; returns the loss for each element.
Here’s an example that shows how to use `BCEWithLogitsLoss`:
```py
import torch
import torch.nn as nn

# Example of BCEWithLogitsLoss
loss_fn = nn.BCEWithLogitsLoss()
logits = torch.tensor([0.5, -1.5, 2.0])
labels = torch.tensor([1.0, 0.0, 1.0])

loss = loss_fn(logits, labels)
print(loss)
```
The output will be:
tensor(0.2675)
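As a rough sketch of what happens internally (reusing the logits and labels above), passing the logits through a sigmoid and then using `BCELoss` gives the same result, although the fused version is the numerically safer choice:
```py
import torch
import torch.nn as nn

logits = torch.tensor([0.5, -1.5, 2.0])
labels = torch.tensor([1.0, 0.0, 1.0])

# Fused version: sigmoid + binary cross-entropy in one call
fused = nn.BCEWithLogitsLoss()(logits, labels)

# Two-step version: explicit sigmoid, then BCELoss on probabilities
probs = torch.sigmoid(logits)
two_step = nn.BCELoss()(probs, labels)

print(fused, two_step)  # Both print tensor(0.2675)
```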
4. Huber Loss (`SmoothL1Loss`)
The Huber loss (`SmoothL1Loss`) is less sensitive to outliers than `MSELoss`. It combines the properties of MSE and Mean Absolute Error (MAE): errors below a threshold (1.0 by default) are penalized quadratically, while larger errors are penalized linearly, so outliers do not cause the large gradient changes they would under MSE.
The syntax is as follows:
```py
torch.nn.SmoothL1Loss(reduction='mean')
```
- `reduction` (str, default=`'mean'`): Specifies the reduction method to apply:
  - `'mean'`: The mean loss across the batch.
  - `'sum'`: The sum of the loss across the batch.
  - `'none'`: No reduction; returns the loss for each element.
Here’s an example that shows how to use `SmoothL1Loss`:
```py
import torch
import torch.nn as nn

# Example of SmoothL1Loss
loss_fn = nn.SmoothL1Loss()
predictions = torch.tensor([2.0, 3.0, 4.0])
targets = torch.tensor([2.5, 3.5, 4.5])

loss = loss_fn(predictions, targets)
print(loss)
```
The output will be:
tensor(0.1250)
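To see the robustness to outliers, here is a small sketch with made-up values, one of which is far off target, comparing `SmoothL1Loss` and `MSELoss` on the same data:
```py
import torch
import torch.nn as nn

# One prediction is wildly wrong (an outlier)
predictions = torch.tensor([2.0, 3.0, 14.0])
targets = torch.tensor([2.5, 3.5, 4.0])

# MSE squares the 10.0 error, so the outlier dominates the loss
print(nn.MSELoss()(predictions, targets))       # tensor(33.5000)

# SmoothL1 treats errors above the threshold (1.0 by default) linearly
print(nn.SmoothL1Loss()(predictions, targets))  # tensor(3.2500)
```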
5. Cosine Similarity Loss (`CosineEmbeddingLoss`)
Cosine similarity measures the cosine of the angle between two vectors. `CosineEmbeddingLoss` takes two input tensors and a target of 1 (the inputs should be similar) or -1 (they should be dissimilar), which makes it useful in tasks such as information retrieval or recommendation systems.
The syntax is as follows:
```py
torch.nn.CosineEmbeddingLoss(margin=0.0, reduction='mean')
```
- `margin` (float, default=0.0): Only used when the target is -1: a dissimilar pair contributes `max(0, cos_sim - margin)` to the loss, so it is no longer penalized once its cosine similarity falls below `margin`. When the target is 1, the loss is simply `1 - cos_sim`. Values between 0 and 0.5 are typically suggested.
- `reduction` (str, default=`'mean'`): Specifies the reduction method to apply:
  - `'mean'`: The mean loss across the batch.
  - `'sum'`: The sum of the loss across the batch.
  - `'none'`: No reduction; returns the loss for each element.
Here’s an example that shows how to use `CosineEmbeddingLoss`:
```py
import torch
import torch.nn as nn

loss_fn = nn.CosineEmbeddingLoss()
input1 = torch.tensor([[1.0, 0.0]])
input2 = torch.tensor([[0.0, 1.0]])
target = torch.tensor([1])  # Expecting inputs to be similar

loss = loss_fn(input1, input2, target)
print(loss)
```
The output will be:
tensor(1.)
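The two vectors above are orthogonal (cosine similarity 0), so with a target of 1 the loss is 1 - 0 = 1. As a sketch of the dissimilar case, the same pair with a target of -1 and a non-zero margin incurs no penalty, because its similarity is already below the margin:
```py
import torch
import torch.nn as nn

input1 = torch.tensor([[1.0, 0.0]])
input2 = torch.tensor([[0.0, 1.0]])

# Target -1: the pair should be dissimilar.
# Loss is max(0, cos_sim - margin) = max(0, 0.0 - 0.5) = 0
loss_fn = nn.CosineEmbeddingLoss(margin=0.5)
target = torch.tensor([-1])

print(loss_fn(input1, input2, target))  # tensor(0.)
```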
6. Kullback-Leibler Divergence (`KLDivLoss`)
KL Divergence measures how one probability distribution diverges from a second, expected probability distribution. It is widely used in tasks such as variational autoencoders (VAEs) and other generative models. Note that `KLDivLoss` expects the input as log-probabilities, while the target is given as probabilities by default.
The syntax is as follows:
```py
torch.nn.KLDivLoss(reduction='mean')
```
- `reduction` (str, default=`'mean'`): Specifies the reduction method to apply:
  - `'mean'`: The mean loss across all elements.
  - `'batchmean'`: The sum of the loss divided by the batch size; this matches the mathematical definition of KL divergence and is the recommended setting.
  - `'sum'`: The sum of the loss across the batch.
  - `'none'`: No reduction; returns the loss for each element.
Here’s an example that shows how to use `KLDivLoss`:
```py
import torch
import torch.nn as nn

# Define KLDivLoss with batchmean reduction
loss_fn = nn.KLDivLoss(reduction='batchmean')

# Define the input (must be log probabilities)
input_probs = torch.tensor([[0.4, 0.6], [0.3, 0.7]], dtype=torch.float32)
input_log_probs = input_probs.log()  # Convert probabilities to log probabilities

# Define the target distribution (must be a valid probability distribution)
target_probs = torch.tensor([[0.5, 0.5], [0.4, 0.6]], dtype=torch.float32)

# Compute the loss
loss = loss_fn(input_log_probs, target_probs)
print(loss)  # A small positive value indicating divergence
```
The output will be:
tensor(0.0215)
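As a sanity-check sketch, the same value can be computed by hand from the definition KL(target ‖ input) = Σ target · (log target - log input), averaged over the batch:
```py
import torch

input_log_probs = torch.tensor([[0.4, 0.6], [0.3, 0.7]]).log()
target_probs = torch.tensor([[0.5, 0.5], [0.4, 0.6]])

# Element-wise target * (log(target) - log(input)), summed per row,
# then averaged over the batch (this matches reduction='batchmean')
kl = (target_probs * (target_probs.log() - input_log_probs)).sum(dim=1).mean()
print(kl)  # tensor(0.0215)
```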
Choosing the Right Loss Function
When selecting a loss function, it is essential to consider the task at hand. Here’s a quick guide:
- Regression: Use `MSELoss` or `SmoothL1Loss`.
- Binary Classification: Use `BCEWithLogitsLoss`.
- Multi-class Classification: Use `CrossEntropyLoss`.
- Measuring Similarity: Use `CosineEmbeddingLoss`.
- Divergence in Probability Distributions: Use `KLDivLoss`.