For quantitative variables, we often want to describe the central tendency, or the “typical” value of a variable. For example, what is the typical cost of rent in New York City?
There are several common measures of central tendency:
- Mean: The average value of the variable, calculated as the sum of all values divided by the number of values.
- Median: The middle value of the variable when the values are sorted (for an even number of values, the average of the two middle values).
- Mode: The most frequent value of the variable.
- Trimmed mean: The mean calculated after excluding the lowest x percent and the highest x percent of data points.
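The four definitions above can be checked by hand on a small list. The following is a minimal sketch using only Python's standard library; the rent values are made up for illustration, and trimmed_mean is a hypothetical helper written for this example, not a library function.

```python
import statistics

# Hypothetical monthly rents (made-up values, for illustration only)
rents = [1200, 1500, 1500, 1800, 2100, 2400, 2600, 3000, 3500, 9000]

# Mean: sum of all values divided by the number of values
mean_rent = sum(rents) / len(rents)

# Median: middle of the sorted values (here, the average of 2100 and 2400)
median_rent = statistics.median(rents)

# Mode: the most frequent value
mode_rent = statistics.mode(rents)

def trimmed_mean(values, proportion):
    """Hypothetical helper: drop `proportion` of the points from each end, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of points to drop per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

trimmed_rent = trimmed_mean(rents, 0.1)

print(mean_rent)     # 2860.0
print(median_rent)   # 2250.0
print(mode_rent)     # 1500
print(trimmed_rent)  # 2300.0
```

In this made-up sample, the single 9000 value pulls the mean well above the median, while dropping one point from each end brings the trimmed mean close to the median.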
For our rentals DataFrame with a column named rent that contains rental prices, we can calculate the central tendency statistics listed above as follows:
    # Mean
    rentals.rent.mean()

    # Median
    rentals.rent.median()

    # Mode
    rentals.rent.mode()

    # Trimmed mean
    from scipy.stats import trim_mean
    trim_mean(rentals.rent, proportiontocut=0.1)  # trim the lowest 10% and highest 10%
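Two details of these calls are easy to miss: mode() returns a pandas Series rather than a single number, because several values can tie for most frequent, and trim_mean rounds the number of trimmed points down at each end. The sketch below illustrates both on a small, made-up rentals DataFrame (the data is an assumption for this example, not the lesson's dataset).

```python
import pandas as pd
from scipy.stats import trim_mean

# Hypothetical data (made up for illustration): 1500 and 2000 tie for most frequent
rentals = pd.DataFrame({"rent": [1500, 1500, 2000, 2000, 9000]})

# mode() returns a Series holding every tied value, sorted ascending
modes = rentals.rent.mode()
first_mode = modes.iloc[0]  # pick one value if a single number is needed

# 10% of 5 rows is 0.5 points per end, which rounds down to 0: nothing is trimmed
trimmed_10 = trim_mean(rentals.rent, proportiontocut=0.1)

# 20% of 5 rows is exactly 1 point per end: 1500 and 9000 are dropped
trimmed_20 = trim_mean(rentals.rent, proportiontocut=0.2)

print(modes.tolist())
print(trimmed_10, rentals.rent.mean())
print(trimmed_20)
```

On small samples it is worth checking that the chosen proportion actually removes at least one point from each end; otherwise the trimmed mean equals the ordinary mean.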
Instructions
Using the same movies DataFrame from the last exercise, find the mean production_budget for all movies and save it to a variable called mean_budget. Print mean_budget to see the result.
Save the median budget to a variable called med_budget and print the result.
Save the mode to a variable called mode_budget and print the result.
How do the mean, median, and mode of movie budgets compare to each other?
Find the mean of the budget after removing the lowest 20% and the highest 20% of data points. Save the trimmed mean to a variable called trmean_budget and print the result.
How does trimming the most extreme data points affect the mean budget?