We know how to set colors in matplotlib using the color parameter for each graph, but how do we use color to advance an argument?

The right color choices will:

  • make the visualization easier to understand
  • draw attention to the graph’s key takeaway(s)
  • add emphasis, not distraction

While matplotlib has limited built-in ability to assign colors based on the data, there’s a relatively simple workaround for applying color in a data-driven way. By adding a color_label column to the dataframe, we can programmatically and flexibly add color to a matplotlib graph. Adding a column to a dataframe is outside the scope of this course, but it can be done:

  • manually by directly editing a spreadsheet in Excel, Google Sheets, etc.
  • programmatically using pandas

(If you’re curious to see how we set up the dataframe for this exercise using pandas, check out the commented code in our data_manipulation Jupyter notebook.)
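For a rough idea of the pandas route, here is a minimal sketch. The dataframe, its genus and count columns, and the color choices are all invented for illustration; the course's own setup code may look different.

```python
import pandas as pd

# Hypothetical data: these genus names and counts are made up for the example.
trees = pd.DataFrame({
    "genus": ["Acer", "Quercus", "Ulmus", "Tilia"],
    "count": [120, 95, 12, 9],
})

# Hypothetical palette: only the genera we want to emphasize get an entry.
emphasis_colors = {"Acer": "tab:green", "Quercus": "tab:orange"}

# map() looks each genus up in the palette; genera without an entry come back
# as missing values, which fillna() replaces with the neutral lightgray.
trees["color_label"] = trees["genus"].map(emphasis_colors).fillna("lightgray")

print(trees)
```

Keeping the palette in a dictionary means a color can be changed in one place without touching the plotting code.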

As we’ll see in the Jupyter notebook, some tree genera (the top 5 by count in each kind of forest) will have a corresponding “emphasis” color assigned to them. All the others will be colored lightgray.

Why? Using a different color for every tree genus would be just as hard to understand as leaving them all gray. Instead, we’ll use color strategically.

Once this is done, we can simply set plt.bar()’s color parameter equal to the color_label column of our dataframe. On to the notebook to try it out!
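As a small sketch of that idea, assuming a toy dataframe that already has a color_label column (the data and the genus and count column names are hypothetical, not the course's dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataframe with one color name per row in color_label.
df = pd.DataFrame({
    "genus": ["Acer", "Ulmus", "Tilia"],
    "count": [120, 12, 9],
    "color_label": ["tab:green", "lightgray", "lightgray"],
})

# Passing the column to color gives each bar the color stored in its own row.
bars = plt.bar(df["genus"], df["count"], color=df["color_label"])
plt.ylabel("count")
```

In a notebook the figure is displayed when the cell runs; in a script, add a call to plt.show(). Only the Acer bar stands out here, and the gray bars remain as context.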



To start, run the Setup cells. You’ll notice a new column, color_label, in PF_data. Any genus that appears in the top 5 by count for any forest type is assigned a color. Then, run the cell below to load the graph we made in the last exercise.
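The "top 5 for any forest type" rule could be sketched along these lines. This is an assumption about the logic, not the notebook's actual Setup code: toy_data, its column names, the forest type labels, and the palette are all hypothetical, and the cutoff is 2 rather than 5 to keep the example small.

```python
import pandas as pd

# Hypothetical stand-in for the course data; names and values are invented.
toy_data = pd.DataFrame({
    "forest_type": ["forest_a", "forest_a", "forest_a",
                    "forest_b", "forest_b", "forest_b", "forest_b"],
    "genus": ["Acer", "Quercus", "Ulmus",
              "Betula", "Acer", "Ulmus", "Quercus"],
    "count": [50, 30, 10, 40, 20, 5, 3],
})

TOP_N = 2  # the lesson uses 5; 2 keeps the toy example readable

# Rank genera within each forest type, largest count first.
rank = toy_data.groupby("forest_type")["count"].rank(method="first", ascending=False)

# A genus is emphasized if it ranks in the top N for at least one forest type.
emphasized = set(toy_data.loc[rank <= TOP_N, "genus"])

# Hypothetical palette covering the emphasized genera.
palette = {"Acer": "tab:red", "Quercus": "tab:green", "Betula": "tab:blue"}

# Emphasized genera keep their palette color in every row; the rest go gray.
toy_data["color_label"] = [
    palette[g] if g in emphasized else "lightgray" for g in toy_data["genus"]
]

print(toy_data)
```

Note that Quercus stays green in forest_b even though it ranks last there, so the same genus has the same color in every graph.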


Assign the color parameter for each bar graph using the color_label column of its respective dataframe.
