Self-attention in transformers builds context-aware representations from query, key, and value projections of the input. Positional embeddings, added to the token embeddings, supply information about sequence order, so the resulting representations capture both the relationships between tokens and the overall structure of the sequence.
import torch
import torch.nn.functional as F

B, T, C = 2, 5, 4  # Batch size, sequence length, embedding dimension
x = torch.randn(B, T, C)  # Random input embeddings

# Positional embeddings: simulating positional information added to x
positional_embeddings = torch.randn(B, T, C)
x = x + positional_embeddings  # Adding positional information to embeddings

# Using x as the query, key, and value for simplicity
query = x
key = x
value = x

# Step 1: Calculate scaled dot-product attention scores, shape (B, T, T)
scores = query @ key.transpose(-2, -1) / torch.sqrt(torch.tensor(C, dtype=torch.float32))

# Step 2: Apply softmax over the key positions to get attention weights
attention_weights = F.softmax(scores, dim=-1)

# Step 3: Compute the output by combining the value embeddings with the attention weights
output = attention_weights @ value  # Shape (B, T, C)

# Output the results
print("Attention Weights:\n", attention_weights)
print("Self-Attention Output:\n", output)
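The snippet above reuses x directly as the query, key, and value. In a real transformer layer, these come from learned linear projections of the input; a minimal sketch of that variant (the projection sizes here are illustrative, not taken from any particular model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, C = 2, 5, 4  # Batch size, sequence length, embedding dimension
x = torch.randn(B, T, C)

# Learned projections map the same input into distinct query/key/value spaces
q_proj = nn.Linear(C, C)
k_proj = nn.Linear(C, C)
v_proj = nn.Linear(C, C)

query, key, value = q_proj(x), k_proj(x), v_proj(x)

# Scaled dot-product attention over the projected tensors
scores = query @ key.transpose(-2, -1) / (C ** 0.5)  # (B, T, T)
attention_weights = F.softmax(scores, dim=-1)
output = attention_weights @ value                   # (B, T, C)
print(output.shape)  # torch.Size([2, 5, 4])
```

The projections are what the model actually trains: they let it learn which aspects of a token to match on (query/key) versus which to pass along (value).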
Transformer models can be grouped into three main categories based on their architecture: encoder-only, decoder-only, and encoder-decoder models.

BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only transformer model. It interprets a token's meaning by attending to the surrounding tokens on both the left and the right. This bidirectional attention allows BERT to capture the nuanced meanings of words and phrases within a sequence.
# Load a pre-trained BERT
from transformers import BertTokenizer, BertForSequenceClassification

model_name = 'bert-base-uncased'
bert_tokenizer = BertTokenizer.from_pretrained(model_name)
pretrained_bert = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
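BERT's bidirectional attention contrasts with the causal attention used in decoder-only models, where each token may only attend to earlier positions. A minimal sketch of that difference in plain PyTorch (this illustrates the masking idea only, not BERT's actual internals):

```python
import torch

T = 4  # Sequence length

# Bidirectional (encoder-style) attention: every token may attend to every token
bidirectional_mask = torch.ones(T, T, dtype=torch.bool)

# Causal (decoder-style) attention: token i may only attend to positions <= i
causal_mask = torch.tril(torch.ones(T, T, dtype=torch.bool))

scores = torch.randn(T, T)
# Positions a token may not attend to are set to -inf before the softmax
masked_scores = scores.masked_fill(~causal_mask, float('-inf'))
causal_weights = torch.softmax(masked_scores, dim=-1)
print(causal_weights)  # Upper triangle is zero: no attention to future tokens
```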
.from_pretrained() method
The .from_pretrained() method loads a pretrained transformer model or tokenizer (its counterpart, .save_pretrained(), saves one). The AutoTokenizer, AutoProcessor, and AutoModel classes allow one to load tokenizers, processors, and models, respectively, for any model architecture.
from transformers import AutoModel, AutoTokenizer

checkpoint = 'pretrained-model-you-want'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)
Autoencoders
An autoencoder is a type of neural network used to learn efficient data codings. It has two main parts: the encoder compresses the input into a smaller latent representation, while the decoder reconstructs the original data. This process often reveals underlying structures within the data.
import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder compresses data
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim)
        )
        # Decoder reconstructs data
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()
        )

    def forward(self, x):
        latent = self.encoder(x)              # Compress
        reconstructed = self.decoder(latent)  # Reconstruct
        return reconstructed
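A quick usage sketch: one forward pass and a single reconstruction-loss training step on random data. The dimensions follow the class defaults above (784 suits flattened 28x28 images); the model is rebuilt inline with nn.Sequential so the snippet runs on its own:

```python
import torch
import torch.nn as nn

# Inline stand-in for SimpleAutoencoder(input_dim=784, latent_dim=32)
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
model = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

batch = torch.rand(16, 784)  # e.g. 16 flattened 28x28 images scaled to [0, 1]

reconstructed = model(batch)
loss = criterion(reconstructed, batch)  # Reconstruction error vs. the input itself

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(reconstructed.shape, loss.item())
```

Note that the target of the loss is the input itself; that is what makes training unsupervised.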
CLIP (Contrastive Language-Image Pretraining) is a versatile model that integrates visual and textual data into a unified representation. Because it embeds images and text in a shared space, it can perform zero-shot image classification and adapt to new tasks without additional training, making it an effective tool for a wide range of AI projects.
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

# Load a pre-trained CLIP model and processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Prepare inputs; replace 'image.jpg' and the prompt with your own image and text
inputs = processor(text=["a photo of a cat"], images=Image.open("image.jpg"),
                   return_tensors="pt", padding=True)

# Get predictions
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # Shape (num_images, num_texts)
probs = logits_per_image.softmax(dim=1)
print(probs)  # Prediction probabilities over the text prompts