
Ask a Data Scientist: What is Machine Learning?

02/16/2018

As popular as the term “machine learning” has become, it’s surprising how often it’s equated with robots taking over the world.

Phrases like “neural nets” and “deep learning” tap into our sense of fantasy, but when we jump from new tech to robot takeover, we miss the beauty and power of what machine learning actually is, and the groundbreaking new developments that are pushing industries forward.

Below, Codecademy Senior Curriculum Developer Nitya Mandyam sheds light on what machine learning really is, how it works, and how we use it today.

What is machine learning?

“Machine learning is an umbrella term for a set of algorithmic tools that analyze and extract patterns from data,” Nitya says. “It’s most commonly used for decision-making.”

Unlike traditional software development, which involves giving a computer a set of rules that it uses to execute tasks, machine learning involves teaching the computer to recognize patterns in a set of data. This means you can give your computer huge data sets, and it can make predictions for you.
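To make that contrast concrete, here is a minimal Python sketch. The spam-filter scenario, the function names, and the tiny data set are all invented for illustration. The "learning" step here just picks the cutoff that misclassifies the fewest labeled examples, which is far simpler than what real machine learning libraries do.

```python
# Traditional approach: a person writes the rule by hand.
def rule_based_is_spam(exclamation_marks):
    return exclamation_marks > 3  # cutoff guessed by the programmer


# Machine learning approach: the cutoff is derived from labeled examples.
def learn_threshold(examples):
    """Return the cutoff that misclassifies the fewest (count, is_spam) pairs."""
    candidates = sorted({count for count, _ in examples})
    best_threshold, fewest_errors = None, len(examples) + 1
    for threshold in candidates:
        errors = sum((count > threshold) != is_spam for count, is_spam in examples)
        if errors < fewest_errors:
            best_threshold, fewest_errors = threshold, errors
    return best_threshold


# Hypothetical training data: (number of exclamation marks, was it spam?)
labeled_messages = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]

learned = learn_threshold(labeled_messages)


def learned_is_spam(exclamation_marks):
    return exclamation_marks > learned
```

With this made-up data, the learned cutoff comes from the examples rather than from a programmer's guess, so giving the function different examples would produce a different rule.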

For instance, we’ve trained computers to accurately recognize handwritten letters and numbers, the basis for the handwriting recognition used by the postal service. The same logic is used in the development of self-driving cars and social media algorithms that surface content related to your interests.

Machine learning is a subfield of artificial intelligence and computer science, and it’s also closely related to statistics. “Machine learning does what statistics can do much faster,” Nitya says. “It leverages advanced computational technology to come to faster conclusions about data.”

The origins of the field date back to the 1950s, when computer scientist Arthur Samuel popularized the term “machine learning” after teaching a computer how to play checkers. But looking at Google Trends data, you can see that machine learning has become much more popular over the past decade.

Figure: Google Trends data depicting the rise of machine learning’s popularity

But why now? We’ve always had sci-fi movies that reference machine learning, from 2001: A Space Odyssey to Ex Machina. What’s the reason for the recent spike in popularity, and is this just a fad or the future?

The simple answer is: We can finally actually do many of the cool things that were once relegated to sci-fi movies. And we’ll only continue to build on these technological advancements for the foreseeable future.

Machine learning has become much more popular over the last few years because of the huge amounts of data we generate every day and the fact that we can now process this data more quickly and easily than ever.

“We’re at this cusp of human history where almost every behavioral pattern is documented, depending on your online habits,” Nitya says. “This opens up huge possibilities for leveraging this data.”

What is machine learning used for?

“Machine learning is excellent for making quick, predictive, analytic judgments about data,” Nitya says.

Thanks to the large amounts of data we create every day, machine learning has a wide range of applications. It’s used across various industries, from customer service and marketing to finance, entertainment, healthcare, and transportation. You’ve probably already seen machine learning in action when using popular technologies like:

  • Recommender systems: Streaming services like Spotify and Netflix use machine learning to analyze your tastes and preferences and provide personalized suggestions.
  • Virtual assistants: Virtual assistants like Siri and Alexa use machine learning (along with other technologies like natural language processing) to interpret and execute your voice commands.
  • Self-driving cars: Machine learning is used to teach autonomous vehicles to recognize objects (like traffic signs, people, and other vehicles) and make safe decisions while driving.

But while these technologies feel like the future arriving, remember that machine learning is really just powerful math and prediction. Almost as soon as someone realizes what machine learning can do, they want to ask the crystal ball a question:

  • What’s going to be the next big programming language?
  • Who’s going to win the next election?
  • Can you accurately predict our revenue if we create this new product?

But crowding around your data scientist’s desk isn’t going to help you. Applying machine learning to your business requires huge data sets that aren’t always accessible, and even when they are, the data has to be in a format that a machine can read.

“Raw data always has issues,” Nitya says. “Sometimes there’s unwanted information or too much information or things aren’t encoded properly, so you need to standardize your data before it can be inputted into your model.”
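As a rough sketch of what that cleanup can look like, the Python below strips formatting from a few raw records, drops an unneeded field and an incomplete row, and then rescales one column. The listings, field names, and helper functions are hypothetical, and z-score scaling is only one of several ways to standardize data.

```python
import statistics

# Hypothetical raw listings: stray whitespace, currency symbols,
# a missing value, and a "notes" column the model does not need.
raw_listings = [
    {"rent": "$2,400", "sqft": "650", "notes": "call after 5pm"},
    {"rent": " 1800 ", "sqft": "500", "notes": ""},
    {"rent": "", "sqft": "700", "notes": "no pets"},
    {"rent": "$3,000", "sqft": "900", "notes": ""},
]


def clean(listing):
    """Return a numeric record, or None if a required value is missing."""
    rent_text = listing["rent"].strip().replace("$", "").replace(",", "")
    sqft_text = listing["sqft"].strip()
    if not rent_text or not sqft_text:
        return None
    return {"rent": float(rent_text), "sqft": float(sqft_text)}


cleaned = [record for record in map(clean, raw_listings) if record is not None]


def standardize(values):
    """Rescale values to zero mean and unit standard deviation (z-scores)."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [(value - mean) / spread for value in values]


sqft_scaled = standardize([record["sqft"] for record in cleaned])
```

Dropping the incomplete row is the simplest option here. Depending on the data, filling in a missing value may be a better choice than discarding the record.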

How machine learning works

Machine learning typically relies on three sets of data: a training set, a test set, and a validation set.

  1. A training set consists of baseline data that’s used to teach a computer to identify patterns, calculate parameters, and construct machine learning models.
  2. A test set is a dataset that’s used to evaluate and fine-tune the model.
  3. A validation set is a dataset that’s used to check your hypothesis.

To help illustrate the differences between each set, Nitya offers this example:

Say you wanted to teach a program to predict rental prices for apartments in Brooklyn. Your training set would include rental prices from different neighborhoods in Brooklyn and the various factors that influence them, like size, location, amenities, and proximity to public transportation.

But say you left data about Bay Ridge, a neighborhood in Brooklyn, out of the training set. This data could be your testing set, and you’d use it to gauge the accuracy of your model’s predictions. Then, you could use data about Queens as a validation set to assess how well your model handles new and different data.
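The split in this example can be sketched in a few lines of Python. The listings and prices below are invented, and the "model" is a deliberately simple one (average rent per square foot) chosen only to show how each set is used. The set names follow this article's usage. Some textbooks and libraries use "validation" and "test" the other way around.

```python
# Invented listings: (borough, neighborhood, square feet, monthly rent in dollars)
listings = [
    ("Brooklyn", "Williamsburg", 600, 3000),
    ("Brooklyn", "Park Slope", 800, 4000),
    ("Brooklyn", "Bushwick", 500, 2500),
    ("Brooklyn", "Bay Ridge", 700, 2800),
    ("Brooklyn", "Bay Ridge", 900, 3600),
    ("Queens", "Astoria", 650, 2600),
    ("Queens", "Flushing", 750, 2250),
]

# Brooklyn without Bay Ridge trains the model; Bay Ridge and Queens are held out.
training_set = [row for row in listings if row[0] == "Brooklyn" and row[1] != "Bay Ridge"]
test_set = [row for row in listings if row[1] == "Bay Ridge"]
validation_set = [row for row in listings if row[0] == "Queens"]


def fit_rent_per_sqft(rows):
    """'Train' a one-parameter model: average rent per square foot."""
    total_rent = sum(rent for _borough, _hood, _sqft, rent in rows)
    total_sqft = sum(sqft for _borough, _hood, sqft, _rent in rows)
    return total_rent / total_sqft


def mean_absolute_error(rate, rows):
    """Average gap, in dollars, between predicted and actual rent."""
    gaps = [abs(rate * sqft - rent) for _borough, _hood, sqft, rent in rows]
    return sum(gaps) / len(gaps)


rate = fit_rent_per_sqft(training_set)
test_error = mean_absolute_error(rate, test_set)
validation_error = mean_absolute_error(rate, validation_set)
```

With these made-up numbers, the error is larger on the Queens data than on the Bay Ridge data, which is the kind of gap you would expect when a model meets data that differs from what it was trained on.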

Machine learning methods

Generally, there are four types of machine learning:

  1. Supervised learning: Supervised learning involves training a machine learning model with labeled datasets. Say you have a collection of cat and dog images. By labeling these images and feeding them to a machine learning model, you could teach it to distinguish between the two in new images.
  2. Unsupervised learning: Unsupervised learning involves training a machine learning model with unlabeled data. Techniques like principal component analysis (PCA) and k-means clustering are then used to find patterns within it. So if you input pictures of different kinds of animals, the program might group cats and lions into one category and dogs and wolves into another based on their similarities.
  3. Semi-supervised learning: Semi-supervised learning blends supervised and unsupervised learning. It involves training a model with a small amount of labeled data and a larger amount of unlabeled data.
  4. Reinforcement learning: Reinforcement learning involves training a machine learning model to make autonomous decisions through trial and error — rewarding it when it performs a desirable behavior and punishing it when it performs an undesirable behavior. These rewards and punishments often come in the form of a point system, where the model either earns or loses points depending on its behavior.
    Reinforcement learning is commonly used in the development of self-driving cars. By rewarding desirable behaviors like taking a proper left turn and punishing undesirable behaviors like running a red light, you can teach the vehicle to respond properly to its environment.
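The differences between three of these methods can be sketched in plain Python. Everything below is a toy under stated assumptions: the animal measurements, point values, and function names are invented, the supervised part uses a 1-nearest-neighbor rule, the unsupervised part is k-means with two clusters on one-dimensional data, and the reinforcement part tries each action once and then greedily repeats whichever has earned the most points. Real systems are far more involved, and semi-supervised learning is not shown.

```python
# --- Supervised: labeled examples, predict the label of a new example ---
# Invented features: (weight in kg, height in cm), each paired with a label.
labeled_animals = [((4.0, 25.0), "cat"), ((5.0, 23.0), "cat"),
                   ((20.0, 55.0), "dog"), ((30.0, 60.0), "dog")]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_label(features):
    """1-nearest-neighbor: copy the label of the closest labeled example."""
    _, label = min(labeled_animals, key=lambda pair: squared_distance(pair[0], features))
    return label


# --- Unsupervised: no labels, group similar examples together ---
def two_means(values, rounds=10):
    """k-means with k=2 on one-dimensional data; returns the two cluster centers.

    Assumes the values are not all identical, so neither cluster ends up empty.
    """
    centers = [min(values), max(values)]
    for _ in range(rounds):
        groups = ([], [])
        for value in values:
            nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            groups[nearest].append(value)
        centers = [sum(group) / len(group) for group in groups]
    return centers


# --- Reinforcement: learn from points earned or lost through trial and error ---
def learn_by_trial(rewards, trials=20):
    """Try every action once, then keep choosing the best average score so far."""
    totals = {action: 0.0 for action in rewards}
    counts = {action: 0 for action in rewards}

    def average(action):
        return totals[action] / counts[action]

    for _ in range(trials):
        untried = [action for action in rewards if counts[action] == 0]
        action = untried[0] if untried else max(rewards, key=average)
        totals[action] += rewards[action]
        counts[action] += 1
    return max(rewards, key=average)


# Hypothetical point system for a driving agent.
points = {"run red light": -10, "stop at red light": 1}
```

Calling `predict_label((4.5, 24.0))` uses the labels it was given, `two_means([4.0, 5.0, 20.0, 30.0])` finds two groups of weights without any labels, and `learn_by_trial(points)` settles on the action that earns points instead of losing them.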

The need for diversity in machine learning

According to Nitya, it’s critical to have diverse teams working on machine learning algorithms.

You need to feed the computer a full range of features and possibilities for your algorithm to work in the real world. Otherwise, you risk allowing biases to creep into your programs, which can harm the people who use them.

As an example, Nitya points to computer scientist Timnit Gebru, who wrote a paper highlighting the issues that arise when facial recognition software is trained primarily with Caucasian faces.

“The need for people of different communities — especially historically marginalized or oppressed communities — to be a part of this process is extremely important because technology has a huge impact on how we shape our future,” Nitya says.

How to get started with machine learning

So should you learn machine learning?

According to Nitya, if it’s interesting to you, then it can only be beneficial to learn how to leverage machine learning. “Learning to code and how to think about algorithms is super useful,” she says. “It helps you fight misinformation and the misuse of technology — and it’ll no longer seem as opaque and magical as it might seem to you now.”

Nitya predicts that machine learning will become a crucial skill for technical roles over the next decade — and research suggests the same, with machine learning engineers ranking highly in top jobs lists from LinkedIn and Glassdoor.

Ready to get started with machine learning? Your first step is learning Python. The programming language is used as the basis for many machine learning algorithms, and it’s powerful, beginner-friendly, and well documented.

If you’re exploring machine learning, curious about your own ability to code, or even prepping for a course, you can take our free Python course. It will give you baseline skills while opening up the magical world of code. There are also Python books that will help teach you how to use the language for machine learning.

After mastering the basics of Python, check out our machine learning courses.

You can also try our Data Scientist: Machine Learning Specialist and Machine Learning/AI Engineer career paths. Both paths will equip you with all the skills you’ll need to get started with machine learning and show you how to use them to build a portfolio that can help you land a job in the field.


