PyTorch .softmax()

object2410038751's avatar
Published Feb 15, 2026

The .softmax() function applies the Softmax mathematical transformation to an input tensor. It is a critical operation in deep learning, particularly for multi-class classification tasks. Softmax converts a vector of raw scores (often called logits) into a probability distribution where each value represents the likelihood of a specific class.

The Softmax function for an element $x_i$ in a vector $x$ is defined as:

$$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)}$$

By exponentiating the inputs, the function ensures all outputs are non-negative. Dividing by the sum of these exponentials ensures that the resulting values sum to exactly 1.

  • Learn to build machine learning models with Python.
    • Includes 10 Courses
    • With Certificate
    • Beginner Friendly.
      23 hours
  • Learn the basics of Python 3.12, one of the most powerful, versatile, and in-demand programming languages today.
    • With Certificate
    • Beginner Friendly.
      24 hours

Syntax

torch.softmax(input, dim, dtype=None)

Parameters:

  • input: The input tensor containing the raw scores (logits).
  • dim: A dimension along which Softmax will be computed. Every slice along the dim will sum to 1.
  • dtype (Optional): The desired data type of the returned tensor.

Return value:

Returns a tensor of the same shape as input, with values scaled between 0 and 1.

Example 1: Basic Softmax on a 1D Tensor

This example shows how to convert a simple 1D tensor of logits into probabilities:

import torch
# A 1D tensor of raw scores
logits = torch.tensor([1.0, 2.0, 3.0])
# Apply softmax along the only dimension (0)
probabilities = torch.softmax(logits, dim=0)
print("Logits:", logits)
print("Probabilities:", probabilities)
print("Sum of probabilities:", probabilities.sum().item())

Here is the output:

Logits: tensor([1., 2., 3.])
Probabilities: tensor([0.0900, 0.2447, 0.6652])
Sum of probabilities: 1.0

The function converts raw logits into probabilities where the highest input value (3.0) yields the highest probability (~0.66), and the sum of all probabilities equals 1.0.

Example 2: Softmax on a 2D Tensor

For batched inputs where rows represent samples and columns represent classes, Softmax is typically applied along the class dimension:

import torch
# A 2D tensor (2 samples, 3 classes)
logits = torch.tensor([
[2.0, 1.0, 0.1],
[1.0, 3.0, 0.2]
])
# Apply softmax along the class dimension (dim=1)
probs = torch.softmax(logits, dim=1)
print("Probabilities:\n", probs)
print("\nSum of each row:", probs.sum(dim=1))

Here is the output:

Probabilities:
tensor([[0.6590, 0.2424, 0.0986],
[0.1131, 0.8360, 0.0508]])
Sum of each row: tensor([1.0000, 1.0000])

By specifying dim=1, the operation is applied independently to each row (sample), ensuring that the class probabilities for each individual sample sum to 1.0.

All contributors

Learn PyTorch on Codecademy

  • Learn to build machine learning models with Python.
    • Includes 10 Courses
    • With Certificate
    • Beginner Friendly.
      23 hours
  • Learn the basics of Python 3.12, one of the most powerful, versatile, and in-demand programming languages today.
    • With Certificate
    • Beginner Friendly.
      24 hours