
Introduction to Neural Network Architectures

Related learning

  • Learn neural network architectures with PyTorch to build deep learning models for image, text, and sequential data tasks.
    • With Certificate
    • Intermediate
      2 hours

Multi-Layer Neural Networks

Multi-layer Neural Networks consist of an input layer, several hidden layers, and an output layer.

Each node in a hidden layer is essentially a perceptron. Each one:

  • computes a weighted sum using the inputs and weights from the nodes in the prior layer
  • (optionally) applies an activation function to the weighted sum
  • sends the activated weighted sum as an input to the nodes of the next layer
[Figure: A multi-layer neural network. The input layer has two nodes (size in square feet and a constant 1), connected to a hidden ReLU layer of three nodes, then a second hidden ReLU layer of two nodes, and finally a single node in a linear output layer.]
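The architecture in the figure can be sketched in PyTorch. The layer sizes below mirror the figure; the constant-1 input corresponds to the bias term each `Linear` layer adds automatically:

```python
import torch
import torch.nn as nn

# One input feature (size in square feet); bias terms replace the
# figure's explicit constant-1 input node.
model = nn.Sequential(
    nn.Linear(1, 3),  # first hidden ReLU layer: 3 nodes
    nn.ReLU(),
    nn.Linear(3, 2),  # second hidden ReLU layer: 2 nodes
    nn.ReLU(),
    nn.Linear(2, 1),  # linear output layer: 1 node
)

price = model(torch.tensor([[1500.0]]))  # predict from a 1500 sq ft input
print(price.shape)  # torch.Size([1, 1])
```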

Python CNN Basics

Convolutional Neural Networks (CNNs) excel at image tasks through specialized layers:

  • Convolutional layers: extract spatial features using filters
  • Pooling layers: reduce dimensionality and parameter count
  • Fully connected layers: perform classification based on extracted features

Compared to standard neural networks, CNNs require fewer parameters and capture spatial relationships between pixels.

CNNs are the backbone for many vision applications like image classification.

import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Convolutional layer: 3 input channels, 12 filters, 3x3 kernel
        self.conv1 = nn.Conv2d(3, 12, kernel_size=3, padding=1)
        # Fully connected layers (12 * 16 * 16 assumes 32x32 input images,
        # halved to 16x16 by the 2x2 max pooling in forward())
        self.fc1 = nn.Linear(12 * 16 * 16, 64)
        self.fc2 = nn.Linear(64, 10)  # 10 output classes

    def forward(self, x):
        # Apply convolution and ReLU activation
        x = F.relu(self.conv1(x))
        # Apply max pooling (2x2)
        x = F.max_pool2d(x, 2)
        # Flatten for fully connected layer
        x = x.view(x.size(0), -1)
        # Pass through fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are deep learning models for sequential data. A recurrent connection links each unit to the next, allowing information to flow across the full sequence.

The key component of an RNN is a hidden state, which attempts to retain information from previous steps in the sequence and is used to predict the next output.
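This recurrence can be sketched with PyTorch's nn.RNN; the sizes below are illustrative:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
sequence = torch.randn(1, 5, 4)  # batch of 1, 5 time steps, 4 features each

# outputs holds the hidden state at every step; hidden is the final state,
# which summarizes information carried across the whole sequence
outputs, hidden = rnn(sequence)
print(outputs.shape)  # torch.Size([1, 5, 8])
print(hidden.shape)   # torch.Size([1, 1, 8])
```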

Model Profiling Essentials

Optimizing a model’s deployment involves measuring metrics such as parameter count, inference latency, and GPU memory consumption. Understanding these metrics helps tailor model performance to the requirements of a specific deployment environment.

# Simulated example of model profiling in Python
model_params = 123456 # Hypothetical number of parameters
inference_latency = 0.005 # Latency in seconds
gpu_memory = 2048 # Memory usage in MB
print(f"Model Parameters: {model_params}")
print(f"Inference Latency: {inference_latency} seconds")
print(f"GPU Memory Usage: {gpu_memory} MB")

Preprocessing in Python

Understand the specific preprocessing steps for different neural network architectures. CNNs require pixel normalization, transformers use tokenization with special tokens, and RNNs need sequence formatting. Each method prepares data uniquely for optimal model performance.

import torch
import torch.nn as nn

# CNN preprocessing: Normalize pixel values
image = ... # Some image data (e.g., an array of 0-255 pixel values)
normalized_image = image / 255.0

# Transformer preprocessing: Tokenization
from transformers import BertTokenizer
text = "A sample text for input."
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
input_tokens = tokenizer.encode_plus(text, add_special_tokens=True)

# RNN preprocessing: Pad variable-length sequences to a common length
sequences = [[1, 2, 3, 4], [5, 6]]
formatted_sequences = nn.utils.rnn.pad_sequence(
    [torch.tensor(seq) for seq in sequences], batch_first=True)

ReLU and Friends

Activation functions, like ReLU, GELU, and Swish, are key to making a neural network model complex behaviors. They add non-linearity, enabling networks to learn intricate patterns. Try experimenting with them to see different learning outcomes.

import numpy as np

# Define ReLU function
def relu(x):
    return np.maximum(0, x)

# Define GELU function (tanh approximation)
def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2/np.pi) * (x + 0.044715 * x**3)))

# Define Swish function (equivalent to x * sigmoid(x))
def swish(x):
    return x / (1 + np.exp(-x))

# Sample input
data = np.array([-1, 0, 1, 2])
print("ReLU:", relu(data))
print("GELU:", gelu(data))
print("Swish:", swish(data))

Batch Normalization in Python

Normalization layers like Batch Normalization and Layer Normalization help stabilize neural network training. They scale and shift activations, making gradients flow more smoothly and reducing internal covariate shift. This process often results in faster convergence and improved model generalization.

import torch
import torch.nn as nn
# Define a batch normalization layer with specific dimensions
batch_norm_layer = nn.BatchNorm1d(num_features=10)
# Random input tensor: a batch of 2 samples with 10 features each
x = torch.randn(2, 10)
# Apply batch normalization
normalized_output = batch_norm_layer(x)
print(normalized_output)
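Layer Normalization, mentioned above, normalizes across the features of each sample rather than across the batch, which makes it independent of batch size:

```python
import torch
import torch.nn as nn

# Define a layer normalization layer over 10 features
layer_norm = nn.LayerNorm(normalized_shape=10)
x = torch.randn(2, 10)
output = layer_norm(x)

# Each sample is normalized independently: per-row mean ~0, std ~1
print(output.shape)        # torch.Size([2, 10])
print(output.mean(dim=1))  # values near 0 for each sample
```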

Tokenization

Tokenization is the process of breaking down a text into individual units called tokens.

Tokenization strategies include:

  • Word-based tokenization breaks down a text into individual word-based tokens.
  • Subword-based tokenization breaks down a word into individual subword-based tokens.
  • Character-based tokenization breaks down a word into individual character-based tokens.
text = '''Vanity and pride are different things'''
# word-based tokenization
words = ['Vanity', 'and', 'pride', 'are', 'different', 'things']
# subword-based tokenization
subwords = ['Van', 'ity', 'and', 'pri', 'de', 'are', 'differ', 'ent', 'thing', 's']
# character-based tokenization
characters = ['V', 'a', 'n', 'i', 't', 'y', ' ', 'a', 'n', 'd', ' ', 'p', 'r', 'i', 'd', 'e', ' ', 'a', 'r', 'e', ' ', 'd', 'i', 'f', 'f', 'e', 'r', 'e', 'n', 't', ' ', 't', 'h', 'i', 'n', 'g', 's']

Word Embeddings

Word embeddings are key to natural language processing. Each is a real number vector representation of a specific word. Contextual information about that word is encoded within the vector numbers.

A basic English word embedding model can be loaded in Python using the spaCy library. The 'en' shortcut is deprecated in recent spaCy versions; load a pipeline that ships with word vectors, such as en_core_web_md:

import spacy

nlp = spacy.load('en_core_web_md')

Call the model with the desired word as an argument and access the .vector attribute:

nlp('peace').vector

The result would be:

[5.2907305, -4.20267, 1.6989858, -1.422668, -1.500128, ...]
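Because embeddings encode meaning as numbers, the similarity of two words can be measured with the cosine similarity of their vectors. The short vectors below are illustrative stand-ins for real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative stand-ins for word vectors of two related words
peace = np.array([5.29, -4.20, 1.70, -1.42])
calm = np.array([5.10, -3.90, 1.60, -1.30])

# Similar vectors yield a similarity close to 1.0
print(cosine_similarity(peace, calm))
```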
