One way to quickly identify outliers is to sort our data. Once the data is sorted, we can glance at the beginning and end of the array to see whether any values lie far beyond the expected range. We can use the NumPy function np.sort to sort our data.
Let’s go back to our third-grade height example and imagine that an 8th grader walked into our experiment:
>>> heights = np.array([49.7, 46.9, 62, 47.2, 47, 48.3, 48.7])
If we use np.sort, we can immediately identify the taller student, since their height (62”) is noticeably outside the range of the rest of the dataset:
>>> np.sort(heights)
array([46.9, 47. , 47.2, 48.3, 48.7, 49.7, 62. ])
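To make the "glance at the ends" idea a little more concrete, here is a minimal sketch that sorts the heights and then looks at both ends of the result. The variable names (sorted_heights, smallest, largest, gaps) and the use of np.diff to measure the distance between neighboring values are our own additions for illustration, not part of the lesson itself.

```python
import numpy as np

# Heights (in inches) from the example above
heights = np.array([49.7, 46.9, 62, 47.2, 47, 48.3, 48.7])

# np.sort returns a sorted copy; the original array is left unchanged
sorted_heights = np.sort(heights)

# The extremes sit at the two ends of the sorted array
smallest = sorted_heights[0]
largest = sorted_heights[-1]

# Differences between neighboring sorted values (an illustrative extra step);
# an unusually large gap at either end hints at a possible outlier
gaps = np.diff(sorted_heights)

print(sorted_heights)
print(smallest, largest)
print(gaps)
```

In this dataset the last gap is much larger than the others, which matches what we see by eye: the 62-inch student stands apart from the rest of the class. A large gap is only a hint, so it is still worth checking whether the value is a genuine measurement before dropping it.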
Instructions
You’ve been tracking temperature data over the summer on your back porch, but realized that you placed your sensor right over a grill! Before you can use your data, you need to check to see if the heat from the grill caused any weird readings that could skew your data.
First, sort the temps data array and save the sorted data to a sorted_temps variable.
Now, print the sorted_temps array. What do we see? Did the grill, in fact, create outliers in our data?