Pandas .describe() method

df.describe(include='all')

The pandas method .describe() returns summary statistics for the columns of a DataFrame. By default, it summarizes only the numeric (quantitative) columns; setting include='all' summarizes both quantitative and categorical features in a single table.
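A minimal sketch of the difference, assuming a small hypothetical DataFrame with one quantitative column (rent) and one categorical column (borough); the data is invented for illustration. With the default call only rent appears, while include='all' adds count, unique, top, and freq rows for borough and fills non-applicable cells with NaN.

```python
import pandas as pd

# Hypothetical toy dataset: rent is quantitative, borough is categorical
df = pd.DataFrame({
    "rent": [2500, 3100, 1800, 4200, 2750],
    "borough": ["Brooklyn", "Manhattan", "Queens", "Manhattan", "Manhattan"],
})

# Default: only numeric columns are summarized (count, mean, std, min, quartiles, max)
numeric_summary = df.describe()
print(numeric_summary)

# include='all': numeric and categorical columns are summarized together.
# Categorical columns get count/unique/top/freq; statistics that do not
# apply to a column are shown as NaN.
full_summary = df.describe(include="all")
print(full_summary)
```

Here top is the most frequent category and freq is how often it occurs.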

Data Summaries
  1. Before diving into formal analysis with a dataset, it is often helpful to perform some initial investigations of the data through exploratory data analysis (EDA) to get a better sense of what you w…
  2. For quantitative variables, we often want to describe the central tendency, or the “typical” value of a variable. For example, what is the typical cost of rent in New York City? There are severa…
  3. The spread of a quantitative variable describes the amount of variability. This is important because it provides context for measures of central tendency. For example, if there is a lot of variab…
  4. While summary statistics are certainly helpful for exploring and quantifying a feature, we might find it hard to wrap our minds around a bunch of numbers. This is why data visualization is such a p…
  5. When it comes to categorical variables, the measures of central tendency and spread that worked for describing numeric variables, like mean and standard deviation, generally become unsuitable when…
  6. A counts table is one approach for exploring categorical variables, but sometimes it is useful to also look at the proportion of values in each category. For example, knowing that there are 3,539 r…
  7. For categorical variables, bar charts and pie charts are common options for visualizing the count (or proportion) of values in each category. They can also convey the relative frequencies of each c…
  8. In this lesson, you’ve learned about the common ways to summarize and visualize quantitative and categorical variables for the purpose of EDA.
     - We can use .describe(include='all') to quickly dis…
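The summaries listed above (central tendency, spread, counts, and proportions) can be computed one at a time with standard pandas Series methods. This is a sketch on hypothetical rent and borough data invented for illustration, not the lesson's dataset; std() uses the pandas default of the sample standard deviation (ddof=1), and the IQR here uses quantile()'s default linear interpolation.

```python
import pandas as pd

# Hypothetical toy data (not from the lesson): monthly rent and borough
df = pd.DataFrame({
    "rent": [2500, 3100, 1800, 4200, 2750, 2900],
    "borough": ["Brooklyn", "Manhattan", "Queens", "Manhattan", "Manhattan", "Brooklyn"],
})

# Central tendency of a quantitative variable
mean_rent = df["rent"].mean()
median_rent = df["rent"].median()

# Spread: range, sample standard deviation, and interquartile range (IQR)
rent_range = df["rent"].max() - df["rent"].min()
std_rent = df["rent"].std()  # pandas default ddof=1 (sample std)
iqr_rent = df["rent"].quantile(0.75) - df["rent"].quantile(0.25)

# Categorical variable: counts table and proportion of values per category
counts = df["borough"].value_counts()
proportions = df["borough"].value_counts(normalize=True)

print(f"mean={mean_rent}, median={median_rent}")
print(f"range={rent_range}, std={std_rent:.1f}, IQR={iqr_rent}")
print(counts)
print(proportions)
```

For the visualizations mentioned above, the counts Series can be passed to a plotting call such as counts.plot(kind="bar"), which requires matplotlib to be installed.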

