Real-world data often arrives without labels: nothing flags which category an observation belongs to or supplies the answer to your question. Finding patterns in this kind of unlabeled data is a common task in machine learning, and unsupervised learning is the family of techniques used to find those patterns and structure.
Clustering is the best-known unsupervised learning technique. It finds structure in unlabeled data by grouping similar data points into clusters. Examples of clustering applications include:
- Recommendation engines: group products to personalize the user experience
- Search engines: group news topics and search results
- Market segmentation: group customers based on geography, demography, and behaviors
- Image segmentation: group pixels into regions, as in medical imaging or road-scene segmentation for self-driving cars
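To make "grouping similar data points" concrete, here is a minimal sketch in plain Python. It uses k-means, one common clustering algorithm, chosen only for illustration since the text above does not name a specific method. The data points, the `kmeans` and `squared_distance` helpers, and the simple farthest-point initialization are all assumptions made up for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic k-means loop."""
    # Initialization (an illustrative choice): start from the first point,
    # then keep adding the point farthest from the centroids picked so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Made-up, unlabeled data: two visibly separate groups of points.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels were supplied, yet the first three points end up in one cluster and the last three in another. Note that this sketch has to be told how many clusters to look for (`k=2`).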
Let’s get started!
In the visualization on the right, how many clusters (groups) do you see?
To find out the answer, add the following code at the bottom of script.py: