One of the most common ways to summarize a dataset is to communicate its center. In this lesson, we will use average and median as our measures of centrality. Take the Codecademy lessons on average and median if you’re interested in how to calculate them by hand or using NumPy functions.
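If you want a quick picture of what "by hand" means here, the sketch below computes both measures directly from their definitions. The average is the sum divided by the count. The median is the middle value of the sorted data, or the mean of the two middle values when the count is even. The helper names `average` and `median` and the list of ages are hypothetical, chosen only for illustration. The results are checked against Python's standard `statistics` module. NumPy's `np.mean` and `np.median` return the same values for a 1-D array.

```python
import statistics

def average(values):
    # Sum of all values divided by how many values there are
    return sum(values) / len(values)

def median(values):
    # Middle value of the sorted data; with an even count,
    # take the mean of the two middle values
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical ages, for illustration only (not the authors dataset)
ages = [31, 45, 52, 38, 67, 29]

print(round(average(ages), 2))  # 43.67
print(median(ages))             # 41.5

# Cross-check against the standard library
print(statistics.mean(ages), statistics.median(ages))
```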
The figure below shows the average and median ages of a dataset of 100 authors. As expected, the average and median values are near the center of the distribution.
While it’s good practice to communicate both the average and the median, the average is the one reported more often.
Using the following lines of code, we found the average and median of our dataset. Replace the relevant ????????? with these values in summary.txt.
The average is 16,948:
cp_data[' Average Covered Charges '].mean()
The median is 14,659:
cp_data[' Average Covered Charges '].median()
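To try the same calls outside the lesson, here is a minimal sketch on a small hypothetical DataFrame. The charge values are invented for illustration and are not taken from cp_data. The column name keeps its leading and trailing spaces so it matches the one used above. The `:,.0f` format specifier rounds to a whole number and inserts thousands separators, which is one way to produce values written like 16,948.

```python
import pandas as pd

# Hypothetical charges for illustration only; not the lesson's dataset
df = pd.DataFrame({" Average Covered Charges ": [8000, 12000, 14000, 18000, 25000]})

charges = df[" Average Covered Charges "]  # note the spaces inside the quotes
mean_value = charges.mean()
median_value = charges.median()

# Round to whole numbers and insert thousands separators
mean_text = f"{mean_value:,.0f}"
median_text = f"{median_value:,.0f}"

print(mean_text)    # 15,400
print(median_text)  # 14,000
```

Writing the column name without its surrounding spaces raises a KeyError. If you prefer clean names, `df.columns = df.columns.str.strip()` removes the extra whitespace from every column label.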
In the following exercise, we will discuss why the median is smaller than the average. Before we move ahead, do you have any guesses?