
RNN PyTorch Time Series Tutorial: Complete Guide to Implementation


What are Recurrent Neural Networks (RNN)?

Recurrent Neural Networks (RNNs) are specialized classes of neural networks designed to handle sequential or time-dependent data. Unlike traditional feedforward neural networks that process inputs independently, RNNs introduce a concept of memory by using loops that allow information to persist across time steps. This helps them learn from previous inputs in a sequence, making them ideal for problems where context and order matter.

In a feedforward network, inputs and outputs are considered independent. However, many real-world scenarios, like predicting the next word in a sentence or forecasting tomorrow’s temperature, depend on understanding what came before. RNNs solve this by maintaining a hidden state, which is updated at every step based on both the current input and the previous hidden state.

A diagram of a Recurrent Neural Network showing labeled input, hidden (h₁, h₂, h₃), and output layers with recurrence arrows

This makes RNNs especially effective for:

  • Stock price prediction based on historical data
  • Speech recognition, where each sound depends on the preceding ones
  • Weather forecasting using sequences of past climate data
  • Natural Language Processing (NLP) tasks like language modeling and translation

RNNs are powerful because they can model patterns across time, which traditional feedforward networks can’t achieve effectively.

Now that we understand RNNs and what makes them special, let’s look at the kind of data they’re designed to work with: time-based data.

What is time-based data?

Time-based (or sequential) data refers to values collected over time where the order is crucial, and each data point can be influenced by those before it. This structure is common in many real-world applications where context builds over time, and understanding past trends is essential for predicting future behavior.

The image here showcases typical use cases of time-based data.

An image illustration showing weather forecasting, stock price analysis, and IoT sensor readings as examples of time-based data

This type of data poses challenges such as temporal dependencies, autocorrelation, trends, and irregular intervals. Traditional neural networks can’t handle these patterns well, which is why RNNs are designed to process and learn from sequences effectively.

Now that we understand the nature and complexity of time-based data, let’s build an RNN that learns from this kind of data using PyTorch.

Implementing a basic RNN in PyTorch

We’ll use a synthetic sine wave dataset to keep things easy to visualize. The goal is to predict future points in the sequence based on past values.

Step 1: Import the required libraries

Start by importing all necessary libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim

The libraries used here are:

  • numpy and pandas help with data handling.

  • matplotlib is used for visualization.

  • MinMaxScaler scales data between 0 and 1, which helps with training.

  • train_test_split splits the data into training and test sets.

  • torch, along with torch.nn and torch.optim, provides the core PyTorch tools for building and training models.

Step 2: Load the dataset

For simplicity, we’ll create a synthetic sine wave dataset to simulate time-based data such as temperature, stock prices, or sensor readings.

# Generate a sine wave
time_steps = np.linspace(0, 100, 500) # 500 points between 0 and 100
data = np.sin(time_steps) # Create a sine wave
# Convert to DataFrame
df = pd.DataFrame(data, columns=['value'])

This synthetic dataset mimics a real-world periodic signal.

Step 3: Plot the raw data

Before modeling, it’s good practice to visualize the data and observe any patterns or trends.

plt.figure(figsize=(10, 4))
plt.plot(df['value'])
plt.title("Sine Wave")
plt.xlabel("Time Step")
plt.ylabel("Value")
plt.show()

The output of the code so far will be:

An image illustrating a repeating sine wave pattern over time

The sine wave shows a repeating pattern over time. This pattern is what our RNN will attempt to learn and predict.

Step 4: Preprocess the data

Neural networks work best with normalized inputs, so we scale the values between 0 and 1 using MinMaxScaler.

scaler = MinMaxScaler()
df['value'] = scaler.fit_transform(df[['value']]) # Normalize values
data = df['value'].values # Convert to numpy array for sequence creation

Scaling prevents large values from dominating the learning process and helps the model converge faster during training.

Step 5: Create sequences and labels

RNNs need sequential input. We split the time series into overlapping sequences (inputs) and the next value (label) to predict.

def create_sequences(data, seq_length):
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        x = data[i:i+seq_length]  # Sequence of length `seq_length`
        y = data[i+seq_length]    # Label is the next value
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

SEQ_LENGTH = 20  # Number of past time steps to look at
X, y = create_sequences(data, SEQ_LENGTH)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Convert arrays to PyTorch tensors
X_train = torch.Tensor(X_train).unsqueeze(-1)  # Shape: (batch, seq, input_size)
y_train = torch.Tensor(y_train)
X_test = torch.Tensor(X_test).unsqueeze(-1)
y_test = torch.Tensor(y_test)

This step prepares the data in the format an RNN expects: sequences of inputs and their corresponding next-step outputs.
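As an optional sanity check (not a step from the original tutorial), printing the tensor shapes confirms the (batch, seq_len, input_size) layout the RNN expects; with 500 data points and SEQ_LENGTH = 20, the 80/20 split gives 384 training and 96 test sequences.

print(X_train.shape)  # torch.Size([384, 20, 1]) -> (batch, seq_len, input_size)
print(X_test.shape)   # torch.Size([96, 20, 1])
print(y_train.shape)  # torch.Size([384])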

Step 6: Define the RNN model using nn.RNN

We define a basic RNN using PyTorch’s built-in nn.RNN module. The final output is passed through a fully connected (Linear) layer to produce a prediction.

class BasicRNN(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=1):
        super(BasicRNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.rnn(x)   # RNN output for all time steps
        out = out[:, -1, :]    # Take output from the last time step
        return self.fc(out)    # Pass through linear layer

# Instantiate the model
model = BasicRNN()

Here:

  • input_size = 1 because we have one feature (value per time step).

  • hidden_size is the size of the hidden state.

  • batch_first=True ensures input is in the format (batch, seq_len, input_size).

Step 7: Train the model

We train the model using Mean Squared Error (MSE) loss and the Adam optimizer. This loop runs over multiple epochs, updating the weights to reduce prediction error.

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

EPOCHS = 100
losses = []

for epoch in range(EPOCHS):
    model.train()
    output = model(X_train)
    loss = criterion(output.squeeze(), y_train)  # Compute loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{EPOCHS}], Loss: {loss.item():.4f}")

The model learns by comparing its output to the actual value (loss), adjusting its weights, and minimizing errors over time.

Step 8: Evaluate and forecast

We use the trained model to make predictions on the test data, then reverse the scaling to visualize the results.

model.eval()
with torch.no_grad():
    predictions = model(X_test).squeeze().numpy()  # Predict on test set

# Inverse transform to original scale
y_test_inv = scaler.inverse_transform(y_test.numpy().reshape(-1, 1))
predictions_inv = scaler.inverse_transform(predictions.reshape(-1, 1))

# Plot actual vs predicted
plt.figure(figsize=(10, 4))
plt.plot(y_test_inv, label='Actual')
plt.plot(predictions_inv, label='Predicted')
plt.title("RNN Prediction vs Actual")
plt.xlabel("Time Step")
plt.ylabel("Value")
plt.legend()
plt.show()

We compare the predicted values to the actual values. A good model will have predictions that closely follow the actual curve.

Running the code above produces sample output like the following:

Epoch [10/100], Loss: 0.0417
Epoch [20/100], Loss: 0.0979
Epoch [30/100], Loss: 0.0161
Epoch [40/100], Loss: 0.0079
Epoch [50/100], Loss: 0.0049
Epoch [60/100], Loss: 0.0020
Epoch [70/100], Loss: 0.0011
Epoch [80/100], Loss: 0.0005
Epoch [90/100], Loss: 0.0003
Epoch [100/100], Loss: 0.0002

Line plot comparing actual sine wave values to predicted values using a trained RNN model in PyTorch. The two lines closely overlap, indicating strong prediction accuracy

These logs show the training progress of the RNN model over 100 epochs. The Loss value represents the Mean Squared Error (MSE) between the predicted and actual values on the training set.

Let’s analyze the pattern:

  • Epoch 10, Loss: 0.0417 – At this point, the model is still learning basic patterns.

  • Epoch 30, Loss: 0.0161 – The model begins to capture trends better, with a significant drop in error.

  • Epoch 50, Loss: 0.0049 – The learning accelerates, and the model starts producing closer predictions.

  • Epoch 70, Loss: 0.0011 – Error is now very low, indicating accurate predictions.

  • Epoch 100, Loss: 0.0002 – The model has nearly mastered the training data, showing excellent fit without signs of overfitting.

The steadily decreasing loss shows that the RNN effectively learned the patterns in the time-based sine wave data.

But what’s actually happening inside an RNN when it processes a sequence? Let’s break down how it works behind the scenes.

How do RNNs work?

Traditional neural networks process inputs in isolation, but RNNs process data step by step, retaining memory of what came before. This makes them ideal for tasks where context matters, such as language translation or stock forecasting.

The hidden state is what allows the RNN to carry information from one step to the next.

RNN architecture:

At the core of an RNN is the concept of a loop. Here’s how data flows:

  • Input: At each time step, the network receives a new input (e.g., a temperature reading or a word).

  • Hidden state: The input is combined with the hidden state from the previous time step. This hidden state acts like a memory that carries information.

  • Output: The updated hidden state is then used to generate the output for that time step.

This loop structure can be visualized like this:

Xt → ht → Ot
      ↑
   ht-1 (the hidden state from the previous time step feeds back into ht)

Where:

  • Xt is the input at time t

  • ht is the hidden state at time t

  • Ot is the output at time t

The recurrent connection ensures that information from previous steps influences the current step output, unlike feedforward networks that treat all inputs as unrelated.
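To make the recurrence concrete, here is a minimal hand-written sketch of a single step; the weights W_xh, W_hh, and b_h are illustrative placeholders (not variables from the model above), and they implement the standard update h_t = tanh(W_xh·x_t + W_hh·h_(t-1) + b_h).

import torch

input_size, hidden_size = 1, 4                # illustrative sizes

W_xh = torch.randn(hidden_size, input_size)   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden weights (the recurrence)
b_h = torch.zeros(hidden_size)

x_t = torch.randn(input_size)                 # input at time t
h_prev = torch.zeros(hidden_size)             # hidden state carried over from time t-1

# The new hidden state mixes the current input with the previous hidden state
h_t = torch.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
print(h_t.shape)                              # torch.Size([4])

PyTorch's nn.RNN applies this kind of update at every time step, reusing the same weights across steps.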

Training with backpropagation through time (BPTT)

During training, RNNs use Backpropagation Through Time (BPTT) to update weights. This involves unrolling the network across time steps and calculating gradients for each one. These gradients help the model learn how much past information to remember or forget.

However, this mechanism introduces some serious limitations:

  • Vanishing gradients: When backpropagating across many time steps, gradients can shrink too much to make useful updates. The network “forgets” early inputs.

  • Exploding gradients: Conversely, gradients can also grow too large, making training unstable. A common mitigation, gradient clipping, is sketched after this list.

  • Short memory: RNNs typically struggle to retain information from far back in the sequence. This makes them less suitable for tasks requiring long-term context.
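As a rough sketch of one common remedy for exploding gradients, gradient clipping can be added to the training loop from Step 7 with a single extra line; the max_norm value of 1.0 below is an arbitrary example, not a tuned setting.

for epoch in range(EPOCHS):
    model.train()
    output = model(X_train)
    loss = criterion(output.squeeze(), y_train)
    optimizer.zero_grad()
    loss.backward()
    # Rescale gradients whose overall norm exceeds max_norm before updating weights
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()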

Once an RNN model is trained, visualizing its performance is key to understanding how well it captures patterns in the data.

Visualizing the results of RNN

Training a model isn’t just about running the code; it’s about understanding how well it’s learning. Visualization helps us spot trends, identify errors, and know whether our RNN is actually learning the patterns in the data. Two plots are especially useful here:

  • Training loss vs epochs: To see how well the model minimizes error during training.

  • Predicted vs actual values: To evaluate the model’s accuracy on unseen (test) data.

Each visualization gives us a unique lens to evaluate the strengths and limitations of our RNN.

Plot training loss vs. epochs

This plot helps us evaluate whether the model is learning effectively. Ideally, the loss should decrease steadily as the model improves during training.

# Plot the loss at each epoch to visualize model convergence
plt.figure(figsize=(8, 4))
plt.plot(range(1, EPOCHS + 1), losses, label='Training Loss', color='purple')
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.title('Training Loss vs Epochs')
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()

Here:

  • EPOCHS = 100: We trained our model for 100 epochs.

  • losses: This list holds the loss after each epoch. Plotting this shows whether our model is learning effectively.

  • A smooth downward curve is a sign that the model is converging.

  • If the curve plateaus early or fluctuates too much, it may indicate:

    • The learning rate is too high
    • The model is too simple for the problem
    • The data isn’t properly preprocessed

Line graph showing model training loss decreasing steadily over 100 epochs

Plot predicted values vs. actual values

This comparison shows how close the model’s predictions are to real data.

# Plot predictions vs actuals
plt.figure(figsize=(10, 4))
plt.plot(y_test_inv, label='Actual', color='blue')
plt.plot(predictions_inv, label='Predicted', color='orange')
plt.xlabel('Time Steps')
plt.ylabel('Value')
plt.title('Predicted vs Actual Values')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

Here:

  • We’re plotting test data (y_test_inv) against the model’s forecast (predictions_inv).

  • If the orange line (predicted) closely follows the blue line (actual), the model has learned the patterns well.

  • Divergence between the two indicates underfitting or noise in the dataset.

Overlayed line graphs showing actual vs predicted values for time series forecasting using RNN

But what if your data has long-term dependencies, delays, or patterns that span far back in time? This is where advanced architectures like LSTM and GRU outperform basic RNNs. Let’s find out how and when to make that switch.

RNN vs LSTM/GRU

While RNNs are great for capturing short-term patterns, they often struggle with longer sequences due to memory limitations. This is where LSTM and GRU networks come in, offering better control over what information is remembered or forgotten.

To help you decide, here’s a quick comparison:

| Feature | RNN | LSTM | GRU |
| --- | --- | --- | --- |
| Complexity | Simple architecture | More complex (more gates) | Slightly simpler than LSTM |
| Training time | Faster | Slower than RNN | Faster than LSTM |
| Memory handling | Short-term only | Handles long-term dependencies | Also handles long-term |
| Vanishing gradient | Prone | Less likely | Less likely |
| Number of gates | None | 3 (input, forget, output) | 2 (reset, update) |
| Performance | Basic | High for complex patterns | Competitive with LSTM |

When to choose what?

  • Use RNNs when your dataset contains short, simple sequences, and speed is a higher priority than long-range accuracy.

  • Use LSTM or GRU when you’re dealing with longer sequences, such as language modeling, speech recognition, or any task where context over time is essential. Swapping one into the model above is a small change, as sketched below.
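As a minimal sketch of that switch (assuming the same data pipeline as above, and not presented as a tuned model), the BasicRNN class from Step 6 only needs its nn.RNN layer replaced with nn.LSTM or nn.GRU, since all three modules accept the same batch_first input format.

class BasicLSTM(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, num_layers=1):
        super(BasicLSTM, self).__init__()
        # nn.LSTM (or nn.GRU) is a drop-in replacement for nn.RNN in this setup
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)   # outputs for all time steps
        out = out[:, -1, :]     # take the last time step
        return self.fc(out)

model = BasicLSTM()  # trains with the same loop used for the basic RNN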

Conclusion

Recurrent Neural Networks (RNNs) are ideal for modeling sequential or time-based data. In this article, we explored what RNNs are, how they function, and why they’re useful for tasks like stock prediction and weather forecasting. You also learned how to build an RNN in PyTorch, visualize its results, and decide when to switch to more advanced models like LSTM or GRU.

If you’d like to continue learning about neural networks and PyTorch, check out Codecademy’s Build Deep Learning Models with PyTorch course.

Frequently asked questions

1. Which layer type is commonly used in RNNs for time series prediction tasks?

The nn.RNN layer in PyTorch is commonly used, along with alternatives like nn.LSTM and nn.GRU for handling longer dependencies in time series data.

2. What is the best optimizer for RNNs?

Adam is often considered the best default optimizer for RNNs due to its adaptive learning rate and good performance across tasks. SGD can also be used, especially with learning rate scheduling.
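For illustration only (the learning rate, momentum, and schedule below are arbitrary examples, not tuned settings), pairing SGD with a step learning-rate scheduler on the model from this tutorial might look like this:

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)  # shrink lr 10x every 30 epochs

for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = criterion(model(X_train).squeeze(), y_train)
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the learning-rate schedule once per epoch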

3. Are RNNs more powerful than CNNs?

Not necessarily. RNNs are better for sequential or time-dependent data, while CNNs excel at spatial data like images. Each has strengths depending on the problem.

4. What is the difference between RNN and CNN for time series?

RNNs model time step by time step, preserving order and temporal dependencies. CNNs can process time series by learning local patterns, but they don’t inherently remember past data like RNNs.

5. Why does LSTM perform better than RNN?

LSTMs have gating mechanisms that allow them to retain and forget information selectively, helping them capture long-term dependencies better than standard RNNs, which struggle with vanishing gradients.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.

