PyTorch Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is one of the most fundamental optimization algorithms for training neural networks. In PyTorch, torch.optim.SGD provides a straightforward way to implement SGD with optional parameters like momentum, weight_decay, and nesterov.
SGD updates model parameters iteratively by computing the gradient of the loss function with respect to each parameter and then adjusting the parameters in the opposite direction of the gradient, scaled by the learning rate.
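As a minimal sketch of what a single vanilla update (no momentum) does, the rule theta = theta - lr * gradient can be written out by hand. The tensor and loss below are toy placeholders chosen only to illustrate the mechanics:

import torch

# Toy parameters and a toy loss, used only to illustrate the update rule
theta = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (theta ** 2).sum()
loss.backward()  # populates theta.grad with dLoss/dTheta

lr = 0.1
with torch.no_grad():
    theta -= lr * theta.grad  # step in the opposite direction of the gradient

torch.optim.SGD automates exactly this bookkeeping, plus optional extensions such as momentum and weight decay.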
Syntax
torch.optim.SGD(
    params,
    lr=0.01,
    momentum=0,
    weight_decay=0,
    dampening=0,
    nesterov=False
)
- params: Iterable of parameters to optimize (typically model.parameters()).
- lr: The learning rate (required).
- momentum: Momentum factor (default is 0, meaning no momentum).
- weight_decay: Weight decay, an L2 penalty on the parameters (default is 0).
- dampening: Dampening for momentum (default is 0).
- nesterov: Enables Nesterov momentum if set to True (default is False).
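For example, a configuration that combines momentum, Nesterov acceleration, and L2 regularization might look like the following (the model and hyperparameter values here are illustrative placeholders):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # placeholder model

optimizer = optim.SGD(
    model.parameters(),
    lr=0.01,
    momentum=0.9,       # accumulates a velocity term across updates
    weight_decay=1e-4,  # adds an L2 penalty on the parameters
    nesterov=True       # requires momentum > 0 and dampening == 0
)

Note that PyTorch raises an error if nesterov=True is set without a positive momentum and zero dampening.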
Example
Below is a simple example using torch.optim.SGD to optimize a small neural network:
import torch
import torch.nn as nn
import torch.optim as optim

# Sample model: a single-layer neural network
model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 1)
)

# Loss function
criterion = nn.MSELoss()

# Optimizer: Stochastic Gradient Descent
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Dummy input and target
x = torch.randn(2, 10)  # batch size = 2, input features = 10
target = torch.randn(2, 1)

# Forward pass
output = model(x)
loss = criterion(output, target)

# Backward pass and update
loss.backward()
optimizer.step()

print(f"Loss after one update: {loss.item():.4f}")
The above code prints output similar to the following (the exact value varies from run to run because the weights and data are randomly initialized):
Loss after one update: 0.2851
Here is the step-by-step process used in the above example:
- Define the Model: A simple feed-forward network is created with two Linear layers and a ReLU activation.
- Set Up Criterion: MSELoss is used in this example, but any suitable loss function can be substituted.
- Initialize Optimizer: The optimizer is configured with model.parameters(), a learning rate of 0.01, and a momentum of 0.9.
- Forward Pass: Compute the model's output given the input tensor.
- Compute Loss: Compare the model's predictions with the target using MSE.
- Backward Pass: Calculate gradients through a call to loss.backward().
- Optimize: Update parameters based on the gradients via optimizer.step().
Running the script prints a loss value indicating how well the network performs on this single batch. In practice, multiple batches and epochs are typically used for training.
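As a rough sketch of that fuller workflow (the random tensors below stand in for a real dataset), the single update can be wrapped in a loop over epochs and mini-batches:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Placeholder data: 100 random samples, split into mini-batches of 10
dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=10, shuffle=True)

for epoch in range(5):
    for x, target in loader:
        optimizer.zero_grad()               # clear gradients from the previous step
        loss = criterion(model(x), target)
        loss.backward()                     # compute gradients for this batch
        optimizer.step()                    # apply the SGD update
    print(f"Epoch {epoch + 1}, last batch loss: {loss.item():.4f}")

The call to optimizer.zero_grad() matters in a loop: without it, gradients from previous batches accumulate into each new update.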