Bar plots helped us view aggregated numeric data for different groups. What if we want to explore numeric data that are not aggregated? A scatter plot will help view data points for two numeric columns at the same time.
The sns.scatterplot() function creates a scatter plot of two variables. As usual, we can set data to the DataFrame name and x and y to the columns we want along the x- and y-axis, respectively.
For example, using our restaurant dataset df from the previous exercise, we can plot sales_totals against daily_customers using the following code.
sns.scatterplot(data=df, x='daily_customers', y='sales_totals')
Scatter plots are a helpful chart type for statistical analysis. Patterns in the points can quickly tell us whether a relationship is worth exploring with further statistical analysis. These patterns let us visualize the strength and direction of a relationship between two variables. Visual patterns we look for in scatter plots include:
- Spacing: Points that are close together in a line or curve pattern show a stronger relationship. Points that are spaced out or more cloud-like show a weaker relationship.
- Orientation: A pattern of points starting in the lower left corner and following up to the upper right corner shows a positive relationship. A negative relationship might appear as a pattern of points starting in the upper left corner and following down to the lower right corner.
A tightly spaced line of points suggests a strong correlation between the variables, which may be positive or negative. Curved patterns or changes in spacing may point to a more complex relationship between the variables.
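To make spacing and orientation concrete, here is a minimal sketch that plots two made-up relationships side by side and reports their Pearson correlation. The DataFrame demo, its column names, and the noise levels are all assumptions chosen for illustration; none of this comes from the lesson's datasets.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Fixed seed so the synthetic (invented) data is reproducible
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)

# Same x values, but different direction and different amounts of noise
demo = pd.DataFrame({
    "x": x,
    "strong_positive": 2 * x + rng.normal(0, 1, 200),   # tight, upward pattern
    "weak_negative": -0.5 * x + rng.normal(0, 3, 200),  # loose, downward cloud
})

# Draw the two scatter plots next to each other
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=demo, x="x", y="strong_positive", ax=axes[0])
sns.scatterplot(data=demo, x="x", y="weak_negative", ax=axes[1])

# Pearson correlation: the sign reflects direction, the magnitude reflects strength
r_strong = demo["x"].corr(demo["strong_positive"])
r_weak = demo["x"].corr(demo["weak_negative"])
print(round(r_strong, 2), round(r_weak, 2))
```

In a notebook the figure renders automatically; in a script, call plt.show() afterwards. The left plot should look like a narrow upward band and the right one like a diffuse downward cloud, matching a correlation near 1 and a smaller negative one.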
Run the first two cells of the notebook to view the head of the dataset named waste. Create a scatter plot of per capita municipal solid waste (msw) on the y-axis and per capita GDP (gdp) on the x-axis.
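One way this step might look is sketched below. Because the notebook's waste DataFrame is not available outside the exercise, the snippet builds a tiny stand-in called waste_demo; that name and every value in it are invented placeholders, not real waste or GDP figures.

```python
import pandas as pd
import seaborn as sns

# Placeholder stand-in for the notebook's waste DataFrame.
# These values are invented purely so the snippet runs; they are not real data.
waste_demo = pd.DataFrame({
    "gdp": [1000, 5000, 12000, 30000, 45000],
    "msw": [0.2, 0.4, 0.6, 1.0, 1.3],
})

# gdp goes on the x-axis and msw on the y-axis, as the exercise asks
ax = sns.scatterplot(data=waste_demo, x="gdp", y="msw")
```

In the notebook itself, the same call would presumably take data=waste instead of the stand-in. The later plots follow the same shape with different column names passed to x and y.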
The pattern in the plot seems to show that solid waste is positively correlated with GDP, at least until a GDP of 60,000. Let’s see if per capita electronic waste (e_waste) shows a similar pattern with gdp. Create a scatter plot with e_waste on the y-axis and gdp on the x-axis.
There might be some correlation in the electronic waste plot, but the points become much more scattered around a gdp value of 40,000.
Now let’s look at a plot that doesn’t show much of a linear pattern in the points. Create a scatter plot that has the percentage of plastic waste (plastic) on the y-axis and the percentage of metal waste (metal) on the x-axis.