For quantitative variables, we often want to describe the central tendency, or the “typical” value of a variable. For example, what is the typical cost of rent in New York City?
There are several common measures of central tendency:
- Mean: The average value of the variable, calculated as the sum of all values divided by the number of values.
- Median: The middle value of the variable when the values are sorted. With an even number of values, it is the average of the two middle values.
- Mode: The most frequent value of the variable.
- Trimmed mean: The mean calculated after excluding the lowest x percent and the highest x percent of the data points.
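To make these definitions concrete, here is a minimal sketch that computes each measure with only the Python standard library. The rents list is made-up data, and trimmed_mean is a hypothetical helper written for illustration; it assumes the proportion is cut from each end and is less than 0.5.

```python
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end.

    Hypothetical helper for illustration; assumes proportion < 0.5.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Made-up monthly rents, including one very expensive listing
rents = [1500, 1800, 2000, 2000, 2200, 2500, 2700, 3000, 3300, 12000]

mean_rent = sum(rents) / len(rents)      # sum of values / number of values
median_rent = statistics.median(rents)   # average of the two middle values here
mode_rent = statistics.mode(rents)       # most frequent value
trimmed_rent = trimmed_mean(rents, 0.1)  # drops the 1500 and 12000 listings

print(mean_rent, median_rent, mode_rent, trimmed_rent)
```

In this made-up sample, the single 12000 listing pulls the mean well above the median, while the trimmed mean lands much closer to the median.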
Given a rentals DataFrame with a column named rent that contains rental prices, we can calculate the central tendency statistics listed above as follows:
    from scipy.stats import trim_mean

    # Mean
    rentals.rent.mean()

    # Median
    rentals.rent.median()

    # Mode
    rentals.rent.mode()

    # Trimmed mean (trim the lowest 10% and the highest 10%)
    trim_mean(rentals.rent, proportiontocut=0.1)
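The calls above can be tried on a small stand-in dataset. The sketch below builds a hypothetical rentals DataFrame from made-up values (the real dataset is not shown here) and highlights one detail that is easy to miss: mode() returns a pandas Series rather than a single number, because several values can tie for most frequent.

```python
import pandas as pd
from scipy.stats import trim_mean

# Made-up rents standing in for the real rentals data (an assumption)
rentals = pd.DataFrame(
    {"rent": [1500, 2000, 2000, 2500, 2500, 3000, 3200, 3500, 4000, 15000]}
)

mean_rent = rentals.rent.mean()
median_rent = rentals.rent.median()

# mode() returns a Series; here 2000 and 2500 both appear twice
mode_rent = rentals.rent.mode()

# proportiontocut is applied to each end: 10% off the bottom and 10% off the top
trimmed_rent = trim_mean(rentals.rent, proportiontocut=0.1)

print(mean_rent)
print(median_rent)
print(mode_rent.tolist())
print(trimmed_rent)
```

If a single mode is needed, one option is to take the first entry with mode_rent.iloc[0], keeping in mind that this silently ignores any ties.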
Using the same movies DataFrame from the last exercise, find the mean production_budget for all movies and save it to a variable called mean_budget. Print mean_budget to see the result.
Save the median budget to a variable called med_budget and print the result.
Save the mode to a variable called mode_budget and print the result.
How do the mean, median, and mode of movie budgets compare to each other?
Find the mean of the budget after removing the lowest 20% and the highest 20% of the data points. Save the trimmed mean to a variable called trmean_budget and print the result.
How does trimming the most extreme data points affect the mean budget?