One way to quickly identify outliers is by sorting our data. Once our data is sorted, we can quickly glance at the beginning or end of an array to see if some values lie far beyond the expected range. We can use the NumPy function np.sort to sort our data.
Let’s go back to our 3rd-grade height example, and imagine an 8th grader walked into our experiment:
>>> heights = np.array([49.7, 46.9, 62, 47.2, 47, 48.3, 48.7])
If we use np.sort, we can immediately identify the taller student, since their height (62”) is noticeably outside the range of the dataset:
>>> np.sort(heights)
array([46.9, 47. , 47.2, 48.3, 48.7, 49.7, 62. ])
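As a minimal sketch of the "glance at both ends" idea, the snippet below sorts the same heights and then pulls out the smallest and largest values. The names sorted_heights, lowest, highest, and top_gap are illustrative choices, not part of the lesson, and the gap calculation is just one informal way to put a number on "far beyond the expected range".

```python
import numpy as np

# Heights (in inches) from the example above.
heights = np.array([49.7, 46.9, 62, 47.2, 47, 48.3, 48.7])

# np.sort returns a sorted copy; the original array is left unchanged.
sorted_heights = np.sort(heights)

# Look at the two smallest and the two largest values.
lowest = sorted_heights[:2]
highest = sorted_heights[-2:]
print("lowest:", lowest)
print("highest:", highest)

# Illustrative check (an assumption, not the lesson's method):
# how far is the largest value from its nearest neighbor?
top_gap = sorted_heights[-1] - sorted_heights[-2]
print("gap at the top:", round(float(top_gap), 1))
```

The gap at the top of the sorted array is much larger than the spacing between the other values, which is what makes the 62” reading stand out. Because np.sort works on a copy, heights still holds the values in their original order afterward.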
You’ve been tracking temperature data over the summer on your back porch, but realized that you placed your sensor right over a grill! Before you can use your data, you need to check to see if the heat from the grill caused any weird readings that could skew your data.
First, sort the temps data array and save the sorted data to a new variable.
Now, print the sorted_temps array. What do we see? Did the grill, in fact, create outliers in our data?