
Advanced Neural Network Architectures

Related learning

  • AI Engineers build complex systems using foundation models, LLMs, and AI agents. You will learn how to design, build, and deploy AI systems.
    • Includes 16 Courses
    • With Certificate
    • Intermediate.
      20 hours

Self-Attention Mechanism in Transformers

Self-attention in transformers uses key, query, and value projections along with positional embeddings. This combination creates context-aware representations that capture both the relationships between tokens and the overall structure of the sequence.

import torch
import torch.nn.functional as F

B, T, C = 2, 5, 4  # Batch size, Sequence length, Embedding dimension
x = torch.randn(B, T, C)  # Random input embeddings

# Positional embeddings: simulating positional information added to x
positional_embeddings = torch.randn(B, T, C)
x = x + positional_embeddings  # Adding positional information to embeddings

# Using x as the query, key, and value for simplicity
query, key, value = x, x, x

# Step 1: Scaled dot-product scores between every pair of tokens, shape (B, T, T)
scores = query @ key.transpose(-2, -1) / torch.sqrt(torch.tensor(C, dtype=torch.float32))
# Step 2: Apply softmax over the last dimension to get attention weights
attention_weights = F.softmax(scores, dim=-1)
# Step 3: Each output token is a weighted sum of the value vectors, shape (B, T, C)
output = attention_weights @ value

# Output the results
print("Attention Weights:\n", attention_weights)
print("Self-Attention Output:\n", output)
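The same computation is available as a fused built-in, F.scaled_dot_product_attention, in PyTorch 2.0 and later. As a minimal sanity check, the manual version can be compared against it (the tensor sizes here are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C = 2, 5, 4
x = torch.randn(B, T, C)
q, k, v = x, x, x

# Manual scaled dot-product attention
scores = q @ k.transpose(-2, -1) / (C ** 0.5)
manual = F.softmax(scores, dim=-1) @ v

# PyTorch's fused implementation (PyTorch 2.0+)
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(manual, fused, atol=1e-6))  # True
```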

Transformer Architecture

  • The original transformer architecture, as proposed in 2017, contains two blocks: an encoder and a decoder. Since then, encoder-only and decoder-only transformer models have also been built.
  • Encoders and decoders are both neural networks with special layers known as attention layers. These layers use a “self-attention mechanism” to capture contextual information in sequences. The primary difference between encoders and decoders is in how they implement the attention mechanism.
Image showing the primary components of a transformer - the encoder and decoder.
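The encoder-decoder structure described above can be sketched with PyTorch's built-in torch.nn.Transformer module; all the dimensions below are illustrative choices, not fixed values.

```python
import torch
import torch.nn as nn

# A small encoder-decoder transformer; the sizes here are illustrative
model = nn.Transformer(
    d_model=32,            # embedding dimension
    nhead=4,               # number of attention heads
    num_encoder_layers=2,
    num_decoder_layers=2,
    batch_first=True,
)

src = torch.randn(2, 10, 32)  # source sequence: (batch, src_len, d_model)
tgt = torch.randn(2, 7, 32)   # target sequence: (batch, tgt_len, d_model)

out = model(src, tgt)         # decoder output, one vector per target position
print(out.shape)              # torch.Size([2, 7, 32])
```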

Hugging Face’s Transformers Library

  • The goal of the Hugging Face Transformers library is to provide a single Python API through which any transformer model can be loaded, trained, fine-tuned and saved.
  • The Hugging Face Transformers library provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. It’s backed by the three most popular deep learning libraries – JAX, PyTorch and TensorFlow.
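As a quick illustration of that single API, the library's pipeline helper wraps model loading, tokenization, and inference in one call. Note that calling it with only a task name downloads a default checkpoint on first use:

```python
from transformers import pipeline

# Downloads a default sentiment-analysis checkpoint on first use
classifier = pipeline("sentiment-analysis")

result = classifier("Transformers make NLP remarkably accessible.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```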

Types of Transformers

Transformer models can be grouped into three main categories based on their architecture:

  • Auto-encoding (or encoder-only) models like BERT that are great at sentence classification, named entity recognition and extractive question answering.
  • Auto-regressive (or decoder-only) models like GPT that are great at text generation.
  • Sequence-to-sequence (or encoder-decoder) models like BART and T5 that are suitable for summarization and translation.
Image describing the three different transformer types — auto-encoding, auto-regressive and sequence-to-sequence models.
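In the Transformers library, each of these three categories has a corresponding Auto class that loads a model with the matching head. The checkpoint names in the comments are common examples, not the only options:

```python
from transformers import (
    AutoModelForMaskedLM,   # auto-encoding models, e.g. 'bert-base-uncased'
    AutoModelForCausalLM,   # auto-regressive models, e.g. 'gpt2'
    AutoModelForSeq2SeqLM,  # sequence-to-sequence models, e.g. 't5-small'
)

# Illustrative mapping of architecture type to its loader class
loaders = {
    "auto-encoding": AutoModelForMaskedLM,
    "auto-regressive": AutoModelForCausalLM,
    "sequence-to-sequence": AutoModelForSeq2SeqLM,
}
for kind, cls in loaders.items():
    print(kind, "->", cls.__name__)
```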

BERT Transformer Model

BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only transformer model. It excels at interpreting a token’s meaning by considering its context based on surrounding tokens, looking at both directions — left and right. This bidirectional attention allows BERT to understand the nuanced meanings of words and phrases within a sequence.

# Load a Pre-trained BERT
from transformers import BertTokenizer, BertForSequenceClassification
model_name = 'bert-base-uncased'
bert_tokenizer = BertTokenizer.from_pretrained(model_name)
pretrained_bert = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

The .from_pretrained() method

The .from_pretrained() method loads a pretrained transformer model; its counterpart, .save_pretrained(), saves one to disk. The AutoTokenizer, AutoProcessor, and AutoModel classes allow one to load tokenizers, processors, and models, respectively, for any model architecture.

from transformers import AutoModel, AutoTokenizer
checkpoint = 'pretrained-model-you-want'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
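A save-and-reload round trip with .save_pretrained() might look like this sketch, which uses a tokenizer to keep the download small; the temporary directory stands in for any local path:

```python
import tempfile
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Save to a local directory, then load the same tokenizer back from disk
with tempfile.TemporaryDirectory() as save_dir:
    tokenizer.save_pretrained(save_dir)
    reloaded = AutoTokenizer.from_pretrained(save_dir)
    print(reloaded.tokenize("Hello, transformers!"))
```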

Understanding Autoencoders

An autoencoder is a type of neural network used to learn efficient data codings. It has two main parts: the encoder compresses the input into a smaller latent representation, while the decoder reconstructs the original data. This process often reveals underlying structures within the data.

import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder compresses data
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim)
        )
        # Decoder reconstructs data
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()
        )

    def forward(self, x):
        latent = self.encoder(x)              # Compress
        reconstructed = self.decoder(latent)  # Reconstruct
        return reconstructed
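A minimal training loop for an autoencoder of this shape might look like the sketch below; the batch of random data stands in for real inputs such as flattened 28x28 images, and the loop length and learning rate are arbitrary:

```python
import torch
import torch.nn as nn

# A compact autoencoder matching the encoder/decoder structure described above
input_dim, latent_dim = 784, 32
model = nn.Sequential(
    nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim),  # encoder
    nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, input_dim),  # decoder
    nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # reconstruction error

x = torch.rand(64, input_dim)  # stand-in batch of flattened images in [0, 1]
for step in range(5):
    reconstructed = model(x)
    loss = loss_fn(reconstructed, x)  # compare reconstruction to the input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```

Training minimizes the difference between the input and its reconstruction, which forces the latent representation to keep the most useful structure of the data.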

CLIP Model Basics

CLIP (Contrastive Language-Image Pretraining) is a versatile model that integrates visual and textual data into a unified representation. Perfect for zero-shot image classification, it works across different tasks without needing additional training. This unique feature makes it an effective tool for various AI projects, allowing rapid adaptation to new tasks.

from transformers import CLIPProcessor, CLIPModel
from PIL import Image

# Load a pre-trained CLIP model and processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Prepare inputs; replace 'image.jpg' with your image and edit the candidate labels
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=Image.open("image.jpg"), return_tensors="pt", padding=True)

# Get predictions
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # shape: (num_images, num_labels)
probs = logits_per_image.softmax(dim=1)      # probability of each label per image
print(probs)  # Output prediction probabilities
