Given that we’re a “learn to code” website, this one may seem fairly obvious. But in all seriousness, programming is an essential part of data science. It’s also what sets data science apart from similar fields, like data analytics.
Programming is the practice of writing commands for a computer to execute. Computer science is the broader discipline that studies computation and how computational systems are designed and used.
A computer program is a series of instructions that tells the computer to perform a certain task. This could range from simply asking a computer to print “Kirby has the best superpower!” to asking a computer to create a model that recommends movies based on your previous interests.
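As a minimal sketch of the simpler end of that range, here is what such a program could look like in Python. The language choice and the variable name `message` are assumptions made for illustration.

```python
# A complete program: one instruction that tells the computer to display a message.
# The variable name `message` is only an illustrative choice.
message = "Kirby has the best superpower!"
print(message)
```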
In data science, programming allows us to hand the processing work over to computers. Given the right commands, computers can process millions of data points in a matter of seconds. In further Codecademy content, you will learn to write code that organizes and analyzes data. Programs also make experiments reproducible: to repeat an analysis, you simply run the program again.
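The sketch below illustrates both ideas, under some assumptions: the data is synthetic, and the function name `run_experiment`, the seed value, and the summary fields are all hypothetical. It generates a million data points, summarizes them, and then runs the same experiment a second time to show that the result can be reproduced.

```python
import random
import statistics

def run_experiment(seed=42, n_points=1_000_000):
    """Generate synthetic measurements and summarize them (hypothetical example)."""
    rng = random.Random(seed)  # a fixed seed yields the same data on every run
    data = [rng.gauss(50.0, 10.0) for _ in range(n_points)]
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "max": max(data),
    }

first = run_experiment()
second = run_experiment()

print(first)
print(first == second)  # True: rerunning the program reproduces the result
```

Fixing the seed is one common way to make a program that relies on random numbers repeatable. Analyses that read from a fixed data file are repeatable in the same sense.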
You will also learn how to program models that can make predictions based on data points. These models are the basis of machine learning, a field of computer science in which computers learn patterns from data in order to make predictions.
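To make "a model that makes predictions" concrete, here is a small sketch that fits a straight line to a handful of points using ordinary least squares and then predicts a new value. The data (hours studied and quiz scores) is invented for illustration, and `fit_line` is a hypothetical helper, not something taken from the lesson.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented example data: hours studied and the resulting quiz score.
hours_studied = [1, 2, 3, 4, 5]
quiz_scores = [52, 61, 70, 79, 88]

slope, intercept = fit_line(hours_studied, quiz_scores)

# Use the fitted line to predict the score for 6 hours of study.
predicted = slope * 6 + intercept
print(predicted)  # 97.0
```

Real machine learning models are usually more elaborate, but the pattern is the same: learn parameters from known data points, then apply them to new ones.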
The program in the terminal clusters the data points into groups, such as zombies, based on where they fall in the data.
Clustering is a technique in data science that groups similar data points together. It matters because, with massive amounts of data, grouping points by hand can take an extremely long time. This program can cluster all the data points within seconds.
Add the following line to the bottom of the program
Run the code to see the clusters!
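For a sense of what a clustering program can do under the hood, here is a short sketch of k-means, one common clustering algorithm. It is separate from the program in the terminal and is not necessarily the method that program uses. The points are made up, and the function name `k_means` and the choice to start from the first `k` points are assumptions made to keep the example small and repeatable.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters using a basic k-means loop."""
    # Assumption: use the first k points as the starting centers.
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 2: move each center to the average of its assigned points.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]  # keep the old center if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Made-up 2D data with two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]

centers, clusters = k_means(points, k=2)
for center, cluster in zip(centers, clusters):
    print(center, cluster)
```

With only six points the grouping is easy to see by eye. The same loop applies unchanged to far larger data sets, which is where doing it by hand stops being practical.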