The spread of a quantitative variable describes how much its values vary. This matters because it provides context for measures of central tendency. For example, if New York City rent prices vary widely, we can be less certain that the mean or median price represents a typical rent.
There are several common measures of spread:
- Range: The difference between the maximum and minimum values of a variable.
- Interquartile range (IQR): The difference between the 75th and 25th percentile values.
- Variance: The average of the squared distances from each data point to the mean.
- Standard deviation (SD): The square root of the variance.
- Mean absolute deviation (MAD): The mean of the absolute distances between each data point and the mean.
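To make these definitions concrete, here is a minimal sketch that computes each measure by hand using only the Python standard library. The `values` list is made up for illustration, and the quartiles use linear interpolation between data points, which is one of several percentile conventions.

```python
import math
import statistics

# Made-up sample values, chosen so the arithmetic is easy to check by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean = sum(values) / n

# Range: maximum minus minimum.
value_range = max(values) - min(values)

# IQR: 75th percentile minus 25th percentile.
# method="inclusive" interpolates linearly between data points;
# other percentile conventions can give slightly different quartiles.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Variance: average of the squared distances to the mean (divides by n).
variance = sum((x - mean) ** 2 for x in values) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Mean absolute deviation: average of the absolute distances to the mean.
mad = sum(abs(x - mean) for x in values) / n

print(value_range)  # 7
print(iqr)          # 1.5
print(variance)     # 4.0
print(std_dev)      # 2.0
print(mad)          # 1.5
```

Note that the standard deviation and the mean absolute deviation are both in the same units as the data, while the variance is in squared units.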
Given a rentals DataFrame, we can calculate each measure of spread for the rent column as follows:
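    # Range
    rentals.rent.max() - rentals.rent.min()

    # Interquartile range
    rentals.rent.quantile(0.75) - rentals.rent.quantile(0.25)

    from scipy.stats import iqr
    iqr(rentals.rent)  # alternative way

    # Variance (pandas defaults to the sample variance, ddof=1)
    rentals.rent.var()

    # Standard deviation (also uses ddof=1 by default)
    rentals.rent.std()

    # Mean absolute deviation
    # Series.mad() was removed in pandas 2.0, so compute it directly
    (rentals.rent - rentals.rent.mean()).abs().mean()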
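One detail worth checking when using pandas: `var()` and `std()` default to the sample versions (`ddof=1`, dividing by n - 1), whereas the definition of variance given earlier divides by n. The sketch below shows the difference, along with a way to compute the mean absolute deviation without `Series.mad()`, which was removed in pandas 2.0. The small `rentals` DataFrame here is a hypothetical stand-in with made-up rent values.

```python
import pandas as pd

# Hypothetical stand-in for the rentals DataFrame; the rent values are made up.
rentals = pd.DataFrame({"rent": [2000, 4000, 4000, 4000, 5000, 5000, 7000, 9000]})
rent = rentals["rent"]

# Sample variance (pandas default, ddof=1) divides by n - 1.
sample_var = rent.var()

# Population variance (ddof=0) divides by n, i.e. the plain average
# of the squared distances to the mean.
population_var = rent.var(ddof=0)

# The standard deviation follows the same convention.
sample_std = rent.std()
population_std = rent.std(ddof=0)

# Mean absolute deviation, computed directly from its definition.
mean_abs_dev = (rent - rent.mean()).abs().mean()

print(sample_var, population_var)
print(sample_std, population_std)
print(mean_abs_dev)
```

For large datasets the two conventions give nearly identical results; the gap matters mainly for small samples.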
Using the movies DataFrame, find the range for production_budget and save it to a variable called range_budget. Print range_budget to see the result.
Save the interquartile range for production_budget to a variable called iqr_budget and print the result.
Save the variance to a variable called var_budget and print the result.
Save the standard deviation to a variable called std_budget and print the result.
Save the mean absolute deviation to a variable called mad_budget and print the result.