Visualizing Time Series Data With Python
Introduction
Data collected at a single point in time is known as cross-sectional data. As a Data Scientist or Analyst, you will sometimes encounter data that is instead collected over a period of time, known as time series data.
Time series data shows up in the real world quite often. For example, weather readings, company stock prices, and sales data are all examples of data that can be tracked over time. Therefore, it’s important that you are able to explore and visualize data with a time component.
In this article, you will learn how to explore time series data with Python using the following:
- Line plots
- Box plots
- Heatmaps
- Lag plots
- Autocorrelation plots
Let’s get started!
Line plot
A line plot is commonly used for visualizing time series data.
In a line plot, time is usually on the x-axis and the observation values are on the y-axis. Let’s show an example of this plot using a CSV file of sales data for a small business over a five-year period.
First, let’s import several useful Python libraries and load in our data:
    # import libraries
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # load in data
    sales_data = pd.read_csv("sales_data.csv")

    # peek at first few rows of data
    sales_data.head()
Here are the first few rows of the sales data:
|   | date | sales |
| --- | --- | --- |
| 0 | 2016-01-01 | 2000.0 |
| 1 | 2016-01-02 | 1700.0 |
| 2 | 2016-01-03 | 1800.0 |
| 3 | 2016-01-04 | 1400.0 |
| 4 | 2016-01-05 | 1500.0 |
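If you don’t have a `sales_data.csv` file on hand, a stand-in with the same two columns can be generated. The sketch below is only an assumption about what such data might look like, not the dataset used in this article: the function name `make_synthetic_sales`, the end date, the baseline of 1500, the seasonal amplitude, and the noise level are all made up for illustration.

```python
import numpy as np
import pandas as pd


def make_synthetic_sales(start="2016-01-01", end="2020-12-31", seed=0):
    """Build a hypothetical daily sales table with 'date' and 'sales' columns."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range(start=start, end=end, freq="D")
    day_of_year = dates.dayofyear.to_numpy()
    # Assumed seasonal shape: a cosine that peaks around January 1 and dips mid-year.
    seasonal = 400 * np.cos(2 * np.pi * day_of_year / 365.25)
    # Assumed day-to-day noise around the seasonal curve.
    noise = rng.normal(loc=0, scale=100, size=len(dates))
    sales = np.round(1500 + seasonal + noise, 0)
    return pd.DataFrame({"date": dates, "sales": sales})


synthetic_sales = make_synthetic_sales()
print(synthetic_sales.head())
```

To follow along with the rest of the article using this stand-in, assign `sales_data = make_synthetic_sales()` instead of calling `pd.read_csv`. The plots will resemble, but not match, the ones described here.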
Let’s create a line plot of the data, with date on the x-axis and sales on the y-axis:
    # convert the date strings to datetime64
    sales_data["date"] = pd.to_datetime(sales_data["date"])

    # create line plot of sales data
    # (date stays a regular column so it can be passed to plt.plot and reused later)
    plt.plot(sales_data["date"], sales_data["sales"])
    plt.xlabel("Date")
    plt.ylabel("Sales (USD)")
    plt.show()
Notice how we can see the trend of the data over time. Looking at the chart, it seems that:
- Sales are seasonal, peaking at the beginning and end of each year, and slowing down in the middle of each year.
- Sales don’t seem to show signs of growth over time. This appears to be a stagnant business.
Box plot
When working with time series data, box plots can be useful to see the distribution of values grouped by time interval.
For example, let’s create a box plot for each year of sales and put them side-to-side for comparison:
    # extract year from date column
    sales_data["year"] = sales_data["date"].dt.year

    # box plot grouped by year
    sns.boxplot(data=sales_data, x="year", y="sales")
    plt.show()
![Side-by-side box plots for each year of sales showing that median sales are flat over time]
For each year of the sales data, we can easily see useful information such as median sales, the highest and lowest sales, the interquartile range of our data, and any outliers.
Median sales for each year (represented by the horizontal line in each box) are quite stable, suggesting that sales are not growing over time.
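The numbers a box plot summarizes can also be computed directly, which helps when you want exact values rather than a visual estimate. Here is a minimal sketch using a small made-up table (the `toy` data and the column names `q1`, `median`, `q3`, and `iqr` are assumptions for illustration, not values from the sales dataset):

```python
import pandas as pd

# Hypothetical mini dataset: five sales values in each of two years.
toy = pd.DataFrame({
    "year": [2016] * 5 + [2017] * 5,
    "sales": [1400.0, 1500.0, 1700.0, 1800.0, 2000.0,
              1300.0, 1500.0, 1700.0, 1900.0, 2100.0],
})

# Quartiles per year: Q1 and Q3 bound the box, the median is the line inside it.
quartiles = toy.groupby("year")["sales"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["q1", "median", "q3"]

# The interquartile range is the height of the box.
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
print(quartiles)
```

In this toy table both years share the same median, while the second year has a wider interquartile range, which would appear as a taller box at the same center line.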
Heatmap
We can also use a heatmap to compare observations between time intervals in time series data.
For example, let’s create a heatmap of total sales with year on the y-axis and month on the x-axis. This can be done with the heatmap() function from Seaborn, which we imported earlier as sns:
    # extract month from date column (year was extracted for the box plot)
    sales_data["month"] = sales_data["date"].dt.month

    # calculate total sales for each month
    sales = sales_data.groupby(["year", "month"])["sales"].sum()

    # re-format the data for the heatmap
    sales_month_year = sales.reset_index().pivot(index="year", columns="month", values="sales")

    # create heatmap
    sns.heatmap(sales_month_year, cbar_kws={"label": "Total Sales"})
    plt.title("Sales Over Time")
    plt.xlabel("Month")
    plt.ylabel("Year")
    plt.show()
![heat map of total sales with month on the x-axis and year on the y-axis]
Recall that in a heatmap, color encodes the value of each cell. Here, as the color gets brighter and moves from dark purple to yellow, the total sales in the corresponding cell are higher.
Here, we see once again that the sales are pretty consistent year after year and also exhibit seasonality.
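The reshaping step behind the heatmap is easier to follow on a tiny table. The sketch below uses made-up records (the `toy` data is an assumption) and swaps in `pivot_table`, which performs the grouping and the pivot in one call; it is an alternative to the separate group-then-pivot steps, not a replacement the article prescribes.

```python
import pandas as pd

# Hypothetical daily records spanning two months in each of two years.
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-01", "2016-01-02", "2016-02-01",
        "2017-01-01", "2017-02-01", "2017-02-02",
    ]),
    "sales": [2000.0, 1700.0, 1500.0, 1900.0, 1400.0, 1600.0],
})
toy["year"] = toy["date"].dt.year
toy["month"] = toy["date"].dt.month

# Sum sales per (year, month), with years as rows and months as columns.
monthly_grid = toy.pivot_table(index="year", columns="month",
                               values="sales", aggfunc="sum")
print(monthly_grid)
```

Each cell of `monthly_grid` is one colored square in the heatmap: one row per year, one column per month.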
Lag scatter plot
We can use a lag scatter plot to explore the relationship between an observation and a lag of that observation.
In a time series, a lag is a previous observation:
- The observation at a previous time step (the smallest time interval for which we have distinct measurements) is called lag 1.
- The observation two time steps back is called lag 2, and so on.
In the sales dataset, we have a different sales value for each day. Therefore, the lag 1 value for any particular day is equal to the sales on the previous day. The lag 2 value is the sales two days ago, etc.
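Lagged values can be built explicitly with the `shift` method of a pandas Series. The following sketch uses the first five sales values from the table shown earlier; the column names `lag_1` and `lag_2` are chosen here for illustration.

```python
import pandas as pd

# First five daily sales values from the table shown earlier in the article.
sales = pd.Series(
    [2000.0, 1700.0, 1800.0, 1400.0, 1500.0],
    index=pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03",
                          "2016-01-04", "2016-01-05"]),
    name="sales",
)

lags = pd.DataFrame({
    "sales": sales,
    "lag_1": sales.shift(1),  # value from one day earlier
    "lag_2": sales.shift(2),  # value from two days earlier
})
print(lags)
```

The first row has no lag 1 value and the first two rows have no lag 2 value, so those cells are `NaN`: there is no earlier observation to look back to.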
The plotting module of the pandas library has a built-in lag_plot function. By default, it plots the observation at time t on the x-axis and the observation one time step later (t+1) on the y-axis, so each point pairs a value with its lag 1 value:
    # import lag_plot function
    from pandas.plotting import lag_plot

    # lag scatter plot of the sales column
    lag_plot(sales_data["sales"])
    plt.show()
![Example of a lag scatter plot using sales data]
How can we interpret a lag scatter plot?
- If the points move from the bottom left to the top right, this indicates a positive correlation between observations and their lag 1 values. For example, high sales on one day are associated with high sales on the previous day.
- If the points move from the top left to the bottom right, this indicates a negative correlation between observations and their lag 1 values. For example, high sales on one day are associated with low sales on the previous day and vice versa.
- If there is no identifiable structure in the lag plot, this indicates the data is random, and there is no association between values at consecutive time points. For example, sales on one day give you no information about expected sales on the following day.
Exploring the relationship between an observation and a lag of that observation is useful for helping us determine whether a dataset is random.
Since the points in the sales data move along a diagonal line from the bottom left to the top right, this indicates that our data is not random and there is a positive correlation between observations and their lag 1 values.
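The visual reading of a lag plot can be backed up with a number by correlating a series with a copy of itself shifted by one step. The sketch below compares two made-up series (both are assumptions for illustration, not the sales data): one that drifts steadily, so neighboring values are similar, and one that is pure noise.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical series: a slow upward drift plus small noise.
trending = pd.Series(np.linspace(1000, 2000, 200) + rng.normal(0, 50, 200))
# Hypothetical series: pure noise with no link between neighbors.
random_noise = pd.Series(rng.normal(1500, 50, 200))

# Correlate each series with its own lag 1 values (missing pairs are skipped).
r_trending = trending.corr(trending.shift(1))
r_noise = random_noise.corr(random_noise.shift(1))

print(f"lag 1 correlation, trending series: {r_trending:.2f}")
print(f"lag 1 correlation, noise series: {r_noise:.2f}")
```

A value close to 1 corresponds to points hugging the bottom-left to top-right diagonal in the lag plot, while a value close to 0 corresponds to a shapeless cloud.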
Autocorrelation plot
An autocorrelation plot is used to show whether the elements of a time series are positively correlated, negatively correlated, or independent of each other.
This can be plotted with the autocorrelation_plot() function of the pandas.plotting module:
    # import autocorrelation function
    from pandas.plotting import autocorrelation_plot

    # autocorrelation plot of the sales column
    autocorrelation_plot(sales_data["sales"])
    plt.show()
![Autocorrelation plot for sales data with lag on the x-axis and the value of the autocorrelation on the y-axis]
In the autocorrelation plot above, lag is on the x-axis and the value of the autocorrelation, which ranges from -1 to 1, is on the y-axis. A value near 0 indicates a weak correlation while values closer to -1 and 1 indicate a strong correlation.
Notice how the autocorrelation plot for the sales data forms waves, oscillating between strong negative and positive correlation. These waves suggest that our dataset exhibits seasonality.
Also, notice how the autocorrelation decreases as the lag increases. This indicates that sales tend to be similar on consecutive days, but sales from three years ago are less associated with today’s sales than sales from one year ago.
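Both patterns, the waves and the shrinking peaks, can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions: the series is made up (three years of a yearly cosine cycle plus noise), and the helper `sample_autocorrelation` is a hypothetical name implementing the standard sample autocorrelation estimator, which normalizes by the full series length. It is assumed, not confirmed here, to match what the plot draws.

```python
import numpy as np

# Hypothetical daily series: three years of a yearly cycle plus noise.
rng = np.random.default_rng(0)
days = np.arange(3 * 365)
seasonal_sales = (1500 + 400 * np.cos(2 * np.pi * days / 365)
                  + rng.normal(0, 100, days.size))


def sample_autocorrelation(values, lag):
    """Sample autocorrelation at a given lag, normalized by the full series length."""
    values = np.asarray(values, dtype=float)
    n = values.size
    centered = values - values.mean()
    variance = np.sum(centered ** 2) / n
    # Fewer pairs overlap at larger lags, which pulls the estimate toward 0.
    return np.sum(centered[: n - lag] * centered[lag:]) / n / variance


# Half a year apart, one year apart, and two years apart.
for lag in (182, 365, 730):
    print(f"lag {lag:>3}: {sample_autocorrelation(seasonal_sales, lag):+.2f}")
```

With this series, the half-year lag comes out negative (peaks line up with troughs), the one-year lag comes out positive, and the two-year lag is positive but smaller, mirroring the damped waves described above.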
Review
In this article, you got a brief introduction to exploring and visualizing time series data using:
- Line plots
- Box plots
- Heatmaps
- Lag plots
- Autocorrelation plots
As a Data Scientist or Analyst, you will often work with data that changes over time. Moving forward, you are now better equipped to explore and visualize data with a time component.
Author
'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'