In this lesson, you learned common ways to summarize and visualize quantitative and categorical variables for exploratory data analysis (EDA).

  • We can use .describe(include='all') to quickly display common summary statistics for all columns in a pandas DataFrame.
  • For quantitative variables, measures of central tendency (e.g., mean, median, mode) and spread (e.g., range, variance, standard deviation) are good ways to summarize the data. Boxplots and histograms are often used for visualization.
  • For categorical variables, the frequency of each category can be summarized using a table of counts or proportions. Bar charts and pie charts are often used for visualization.
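
As a quick recap of the points above, here is a minimal sketch in pandas. The DataFrame, its column names (price, color), and its values are hypothetical and exist only for illustration; they are not taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are made up for illustration.
df = pd.DataFrame({
    "price": [10.0, 12.0, 12.0, 15.0, 21.0],
    "color": ["red", "blue", "red", "green", "red"],
})

# Summary statistics for every column, numeric and non-numeric alike.
summary = df.describe(include="all")

# Quantitative variable: measures of central tendency and spread.
price = df["price"]
center = {
    "mean": price.mean(),
    "median": price.median(),
    "mode": price.mode().tolist(),  # mode() returns a Series because ties are possible
}
spread = {
    "range": price.max() - price.min(),
    "variance": price.var(),  # sample variance (ddof=1) is the pandas default
    "std": price.std(),       # sample standard deviation (ddof=1)
}

# Categorical variable: table of counts and table of proportions.
counts = df["color"].value_counts()
proportions = df["color"].value_counts(normalize=True)

print(summary)
print(center)
print(spread)
print(counts)
print(proportions)
```

If matplotlib is installed, price.plot.hist() or price.plot.box() would draw the quantitative plots, and counts.plot.bar() or counts.plot.pie() the categorical ones.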

Being able to use the appropriate metrics and visuals to explore the variables in your dataset can help you to draw insights from your data and prepare for more rigorous analysis and modeling down the road.
