Image Classification with PyTorch

Convolution Output Calculation

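Calculate the output size of a convolutional or pooling layer with the formula O = ⌊(I - K + 2P) / S⌋ + 1, where I is the input size, K is the kernel size, P is the padding, and S is the stride. The division is rounded down when the stride does not divide evenly, which is PyTorch's default behavior. Working out these sizes in advance helps prevent shape mismatch errors between layers.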

Convolutional Layer Example

Input size: 32 × 32
Kernel size (K): 5
Padding (P): 2
Stride (S): 1

O = (32 - 5 + 2×2) / 1 + 1 = 32

Output size: 32 × 32

Pooling Layer Example

Input size: 32 × 32
Kernel size (K): 2
Padding (P): 0
Stride (S): 2

O = (32 - 2 + 0) / 2 + 1 = 16

Output size: 16 × 16
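The two worked examples above can be checked with a small helper. This is a minimal sketch in plain Python: the function name conv_output_size is a hypothetical helper, not part of PyTorch, and it assumes a dilation of 1 and rounding down.

```python
def conv_output_size(input_size: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
    """Output size along one spatial dimension: floor((I - K + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


# Convolutional layer example: I=32, K=5, P=2, S=1
print(conv_output_size(32, 5, padding=2, stride=1))  # 32
# Pooling layer example: I=32, K=2, P=0, S=2
print(conv_output_size(32, 2, padding=0, stride=2))  # 16
# Stride that does not divide evenly: (7 - 2) / 2 = 2.5 rounds down to 2
print(conv_output_size(7, 2, padding=0, stride=2))   # 3
```

The same helper applies to height and width separately when an input is not square.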

Pooling Layers in CNN

In Convolutional Neural Networks (CNNs), pooling layers typically follow convolutional layers to reduce spatial dimensions. This cuts the computation needed by later layers and can help limit overfitting. Max pooling, a common technique, slides a small window over the feature map and keeps the maximum value in each region, retaining strong activations while discarding weaker ones.

import torch
import torch.nn as nn

# Define a simple model with a single MaxPool2d layer
class MaxPoolModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(x)

# Create the model
model = MaxPoolModel()
# Example input: batch size 1, 1 channel, 28x28
x = torch.randn(1, 1, 28, 28)
output = model(x)
print("Input shape:", x.shape)
print("Output shape after max pooling:", output.shape)  # torch.Size([1, 1, 14, 14]): height and width are halved
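To make the windowing concrete, here is a rough plain-Python sketch of 2×2 max pooling with stride 2 on a nested list. The helper max_pool_2x2 is hypothetical and exists only for illustration; in practice nn.MaxPool2d does this work on tensors.

```python
def max_pool_2x2(image):
    """Take the maximum of each non-overlapping 2x2 window (stride 2)."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


image = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2x2(image)
print(pooled)  # [[6, 2], [2, 9]]
```

Each output value is the strongest activation in its window, so the 4×4 input shrinks to 2×2.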

Python Convolutional Layers

Convolutional layers slide filters over an input image to detect local patterns, such as edges. Their parameters include kernel size, number of filters, stride, and padding, which determine the size and depth of the resulting feature maps. Adjust these to suit the task at hand.

import torch
import torch.nn as nn
# Define a convolutional layer
conv_layer = nn.Conv2d(in_channels=1,
                       out_channels=16,
                       kernel_size=(3, 3),
                       stride=1,
                       padding=1)
# Input tensor: a batch of one single-channel 8x8 image
input_image = torch.randn(1, 1, 8, 8)
# Process image through convolutional layer
output = conv_layer(input_image)
print(f"Output Tensor Shape: {output.shape}")  # torch.Size([1, 16, 8, 8])
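To see how stride and padding change the feature map, the sketch below runs the same 8×8 input through several layer configurations. The particular settings compared are illustrative choices, not recommendations.

```python
import torch
import torch.nn as nn

# One single-channel 8x8 image, matching the example above
x = torch.randn(1, 1, 8, 8)

# (stride, padding) pairs to compare; the kernel stays 3x3
settings = [(1, 1), (1, 0), (2, 1), (2, 0)]

shapes = {}
for stride, padding in settings:
    conv = nn.Conv2d(in_channels=1, out_channels=16,
                     kernel_size=3, stride=stride, padding=padding)
    shapes[(stride, padding)] = tuple(conv(x).shape)
    print(f"stride={stride}, padding={padding} -> {shapes[(stride, padding)]}")
```

With padding 1 and stride 1 the spatial size is preserved; removing the padding or raising the stride shrinks the output.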

PyTorch Image Models

PyTorch provides comprehensive tools for vision tasks through libraries like torchvision:

  • Classification: assigning labels to entire images
  • Detection: locating and identifying objects with bounding boxes
  • Segmentation: pixel-level classification of image regions

from torchvision import datasets, transforms, models
# Load a pre-trained model for classification
# (torchvision 0.13+ uses the weights argument; pretrained=True is deprecated)
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Load a dataset
cifar10 = datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())

PyTorch DataLoaders

DataLoaders in PyTorch are essential for managing image data. They handle batching and shuffling during training, and any transformations attached to the dataset are applied as each sample is loaded. Shuffling changes the order of samples every epoch, so the model does not learn from the order of the data.

from torch.utils.data import DataLoader
# Assumes dataset, model, and criterion are already defined
# Create a DataLoader with batch size of 64
# Shuffle training data so each epoch sees the samples in a different order
dataloader = DataLoader(dataset,
                        batch_size=64,
                        shuffle=True)
# Usage in training loop
for images, labels in dataloader:
    # Each iteration loads a batch of up to 64 images
    outputs = model(images)
    loss = criterion(outputs, labels)
    # ...continue with backpropagation
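The loop above can be made runnable end to end with stand-in pieces. In this sketch the random tensors, the single linear layer, and the SGD optimizer are all placeholder assumptions chosen only so the code executes; a real project would substitute its own dataset and model.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 100 random 3x32x32 "images" with labels in 0..9
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Placeholder model, loss, and optimizer (assumptions for illustration)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = []
for batch_images, batch_labels in dataloader:
    optimizer.zero_grad()                     # clear gradients from the previous step
    outputs = model(batch_images)             # forward pass
    loss = criterion(outputs, batch_labels)   # compute the loss
    loss.backward()                           # backpropagation
    optimizer.step()                          # update the weights
    batch_sizes.append(batch_images.size(0))

print(batch_sizes)  # [64, 36]: the last batch holds the remaining samples
```

Because 100 is not a multiple of 64, the final batch is smaller; passing drop_last=True to DataLoader would discard it instead.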

Image Transformation

Image transformations standardize data for model input:

  • Resize(): converts images to uniform dimensions
  • ToTensor(): converts images to PyTorch tensors with values in [0.0, 1.0]
  • Normalize(): standardizes each channel using a given mean and standard deviation

Transformations are applied sequentially, so Normalize() must come after ToTensor(). Apart from augmentations, they should be identical for training and testing sets.

from torchvision import datasets, transforms
# Create transformation pipeline
transform = transforms.Compose([
    transforms.Resize((64, 64)),  # Resize to 64x64 pixels
    transforms.ToTensor(),  # Convert to tensor, scale to [0.0, 1.0]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize RGB channels to [-1.0, 1.0]
])
# Apply transformations when loading dataset
dataset = datasets.CIFAR10(root='./data',
                           train=True,
                           transform=transform,
                           download=True)
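Normalize() computes (value - mean) / std for each channel. The plain-Python sketch below, which uses a hypothetical helper named normalize_value, shows why a mean and standard deviation of 0.5 map the [0.0, 1.0] range produced by ToTensor() onto [-1.0, 1.0].

```python
def normalize_value(value: float, mean: float, std: float) -> float:
    """Per-channel formula used by Normalize: (value - mean) / std."""
    return (value - mean) / std


# Pixel values after ToTensor() lie in [0.0, 1.0]
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize_value(pixel, mean=0.5, std=0.5))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```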

Image Augmentations

Image augmentations such as flipping, rotating, and color jittering create diverse variants of training images. Because the model sees a slightly different version of each image every epoch, it is less likely to overfit and tends to generalize better to new data. Augmentations are applied only to training data, not to testing or validation data.

  • Flipping: mirrors images horizontally/vertically
  • Rotation: changes image orientation
  • Color jittering: adjusts brightness, contrast, saturation

from torchvision import transforms
# Training transforms with augmentations
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # 50% chance of flipping horizontally
    transforms.RandomRotation(15),  # Rotate by a random angle in [-15, 15] degrees
    transforms.ColorJitter(brightness=0.2),  # Adjust brightness by up to ±20%
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # ImageNet channel means and standard deviations
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
# Testing transforms without augmentations
test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
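As a rough illustration of what a random augmentation does, the plain-Python sketch below mirrors a tiny nested-list "image" with probability p. The function random_horizontal_flip here is a hypothetical stand-in written for this example, not torchvision's implementation.

```python
import random


def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror each row left-to-right with probability p; otherwise return the image unchanged."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


image = [
    [1, 2, 3],
    [4, 5, 6],
]

print(random_horizontal_flip(image, p=1.0))  # always flipped: [[3, 2, 1], [6, 5, 4]]
print(random_horizontal_flip(image, p=0.0))  # never flipped:  [[1, 2, 3], [4, 5, 6]]
```

With the default p of 0.5, the same training image can come out flipped in one epoch and unflipped in the next, which is why such transforms are left out of the test pipeline.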

Python CNN Basics

Convolutional Neural Networks (CNNs) excel at image tasks through specialized layers:

  • Convolutional layers: extract spatial features using filters
  • Pooling layers: reduce spatial dimensions and, with them, the parameter count of later layers
  • Fully connected layers: perform classification based on extracted features

Compared to fully connected networks, CNNs require fewer parameters and capture spatial relationships between pixels. They are the backbone of many vision applications, such as image classification.

import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layer: 3 input channels, 12 filters, 3x3 kernel
        self.conv1 = nn.Conv2d(3, 12, kernel_size=3, padding=1)
        # Fully connected layers (12 * 16 * 16 assumes 32x32 input images)
        self.fc1 = nn.Linear(12 * 16 * 16, 64)
        self.fc2 = nn.Linear(64, 10)  # 10 output classes

    def forward(self, x):
        # Apply convolution and ReLU activation
        x = F.relu(self.conv1(x))
        # Apply max pooling (2x2)
        x = F.max_pool2d(x, 2)
        # Flatten for fully connected layer
        x = x.view(x.size(0), -1)
        # Pass through fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
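The 12 * 16 * 16 input size of fc1 follows from the layer shapes, assuming 32×32 RGB inputs such as CIFAR-10 images. The input size is an assumption here, since the class itself does not state it. A quick arithmetic check in plain Python:

```python
# Assumed input: 3-channel 32x32 images
height = width = 32

# conv1: 3x3 kernel, padding 1, stride 1 keeps the spatial size
conv_size = (height - 3 + 2 * 1) // 1 + 1   # 32
# max_pool2d with a 2x2 window (stride defaults to the window size) halves it
pooled_size = (conv_size - 2) // 2 + 1       # 16

out_channels = 12
flattened = out_channels * pooled_size * pooled_size
print(flattened)  # 3072, which equals 12 * 16 * 16
```

Feeding the network images of a different size would change this number, and fc1 would need to be resized to match.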

PyTorch Conv2d Basics

A convolutional layer is the core building block of a Convolutional Neural Network (CNN). In PyTorch, you create one with nn.Conv2d, specifying the number of input channels, the number of filters (output channels), the kernel size, and the padding to fit your network's needs.

import torch
import torch.nn as nn
conv_layer = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(3, 3), padding=1)
# Example input with dimensions (batch_size=1, channels=3, height=32, width=32)
input_tensor = torch.randn(1, 3, 32, 32)
# Forward pass
output_tensor = conv_layer(input_tensor)
print(output_tensor.shape) # Expected: [1, 16, 32, 32]
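These arguments also determine how many learnable parameters the layer has: one weight per input channel, per kernel position, per filter, plus one bias per filter. A small arithmetic sketch for the layer above (the variable names are illustrative):

```python
in_channels, out_channels = 3, 16
kernel_height, kernel_width = 3, 3

weights = out_channels * in_channels * kernel_height * kernel_width  # 16 * 3 * 3 * 3 = 432
biases = out_channels                                                # one bias per filter
total = weights + biases
print(total)  # 448
```

Padding and stride change the output size but not the parameter count.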
