If your happy place is getting lost inside the pages of a Microsoft Excel workbook, there’s a programming language that you’ll probably get a kick out of: Python. Considered one of the most popular programming languages out there, Python is used for everything from web development to machine learning, and of course, data science.
While there are advantages to using both Excel and Python, “Python is just a little more robust,” says Ada Morse, Codecademy Curriculum Developer in Data Science. The new free Codecademy course Getting Started with Python for Data Science will walk you through how to use Python and Pandas, a library specifically for data manipulation and analysis, to explore, clean, and transform real datasets.
Never coded before? Don’t be intimidated by Python. This course is designed with beginners in mind, and Python has a concise, English-like syntax that reads much like a natural (or spoken) language. Here are a few scenarios where you’d want to use Python over a no-code tool like Microsoft Excel, and exactly what you need to start learning the popular programming language.
You’re working with a lot of data.
It might seem like you could add an infinite number of cells to an Excel spreadsheet, but there is actually a limit to the number of rows and columns a single worksheet can hold: 1,048,576 rows and 16,384 columns, to be exact. “Once you’ve got a bigger data set, the advantage of being able to scroll through your data in Excel no longer really makes sense,” Ada says. “The speed of Excel becomes a problem.”
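To make those limits concrete, here is a minimal Python sketch that checks whether a table of a given size would fit on one Excel worksheet. The function name `fits_in_excel` and the sample sizes are made up for illustration; only the two limits come from the paragraph above.

```python
# Excel worksheet limits cited above.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel(n_rows: int, n_cols: int) -> bool:
    """Return True if a table of this shape fits on a single Excel worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


# Hypothetical dataset sizes, purely for illustration.
print(fits_in_excel(500_000, 20))    # True
print(fits_in_excel(2_000_000, 20))  # False
```

If the check comes back `False`, that is a good sign it's time to reach for a tool that isn't bound by a worksheet's grid.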
With Python, you can work with datasets far larger than a spreadsheet can hold. PySpark, the Python interface to Apache Spark, is designed specifically for working with “big data,” which generally means data that is too big for a typical modern computer to process and analyze on its own. You can learn more about how to use PySpark in our course Introduction to Big Data with PySpark.
Data scientists often work with lots of different types of data from different sources. While Excel can manage data from multiple sources, Python has libraries that allow you to easily access and process data from lots of other sources. “In a modern data landscape at a company where you’ve got cloud databases, data lakes, and all this sort of stuff, the packages with Python are just a little bit more robust,” Ada says.
The Python library BeautifulSoup, for example, is used to extract data from a web page’s HTML so you can put it into a DataFrame, the table-like structure that Pandas uses to hold data. We’ll show you how to do this in our course Learn Web Scraping with BeautifulSoup.
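Here is a rough sketch of what that workflow can look like. It assumes the `beautifulsoup4` and `pandas` packages are installed, and it parses a small made-up HTML snippet instead of fetching a live website, so the city names and numbers are placeholders rather than real data.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up HTML standing in for a page you would normally download.
html = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
  <tr><td>Shelbyville</td><td>25000</td></tr>
</table>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# The first row holds the column names; the rest hold the data.
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in rows[1:]
]

# Load the scraped values into a DataFrame and convert the numbers.
df = pd.DataFrame(records, columns=header)
df["population"] = df["population"].astype(int)
print(df)
```

Scraped values always arrive as text, which is why the last step converts the `population` column to integers before any analysis.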
You’re doing advanced data analysis.
As you move toward more advanced data analytics, you need a tool that can execute sophisticated functions, Ada explains. Excel is a solid entry-level choice for crunching numbers and managing data, but there are hundreds of thousands of Python libraries and packages that can level up how you analyze, visualize, and understand data. For example, the Python library NumPy can perform numerical operations on large quantities of data. Another library, Matplotlib, can be used to generate elegant and interactive data visualizations.
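As a small taste of what NumPy's numerical operations look like, the sketch below works on a handful of invented temperature readings. It assumes the `numpy` package is installed; with a real dataset, the same lines would work on millions of values.

```python
import numpy as np

# Invented daily temperatures in Celsius, for illustration only.
temps_c = np.array([18.5, 21.0, 19.5, 23.0, 20.0])

# One expression converts every value at once, with no loop
# and no formula dragged down a column.
temps_f = temps_c * 9 / 5 + 32

# Summary statistics are a single method call.
mean_c = temps_c.mean()

# Boolean filtering keeps only the days warmer than average.
warmer_than_average = temps_c[temps_c > mean_c]

print(temps_f)
print(mean_c)
print(warmer_than_average)
```

Operating on the whole array in one step is the habit that makes NumPy fast on large quantities of data.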
Since Python is so easy to learn and simple to read, you can start mastering more complicated concepts more quickly. In the course Getting Started with Python for Data Science, you’ll get to use Pandas and work with real datasets to sort, clean, and analyze data. You can take a closer look at these libraries with the courses Learn Data Analysis with Pandas and Learn Statistics with NumPy. Be sure to explore all of Codecademy’s Python courses. If you already know how to code, you can jump right in with the free course Python for Programmers.
You’d like to incorporate machine learning.
Machine learning is a subset of data science that’s all about teaching a computer to make predictions on its own by picking up on patterns within data. Everything from your social media feed to your smart home appliance relies on machine learning technology.
It’s possible for an Excel super-user to get good enough at using the software to incorporate machine learning and predictions, but it’s much more straightforward with Python. There are a variety of machine learning libraries for Python that you can use to prepare and clean data, choose models to use on the data, and then generate recommendations based on patterns. Some common Python libraries that Machine Learning Engineers use are TensorFlow, scikit-image, and PyTorch.
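To show the basic idea of learning a pattern and then making a prediction, here is a deliberately tiny sketch. It does not use any of the libraries named above. Instead it fits a straight line with NumPy's `polyfit`, which is about the simplest model there is. The study-hours data is invented for illustration, and the example assumes `numpy` is installed.

```python
import numpy as np

# Invented training data: hours studied and the resulting test score.
hours = np.array([1, 2, 3, 4, 5])
scores = np.array([52, 54, 56, 58, 60])

# "Training": find the straight line that best fits the data.
# polyfit with degree 1 returns the slope and the intercept.
slope, intercept = np.polyfit(hours, scores, 1)

# "Prediction": apply the learned line to an input the model has not seen.
predicted_score = slope * 6 + intercept

print(round(predicted_score, 1))
```

Dedicated machine learning libraries follow the same two-step rhythm of fitting a model to known data and then predicting on new data, just with far more flexible models.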
Curious how you can become a Machine Learning Engineer? The Codecademy career path Data Scientist: Machine Learning Specialist will teach you everything you need to know to be job-ready. In this path, you’ll start by learning the basics of Python (you don’t need any experience to get started) and go deep into building neural networks with the language.
When should you use Microsoft Excel?
To be clear: Microsoft Excel is by no means outdated or obsolete, and there are still times when it’s more convenient to use Excel. For example, Excel is a good fit if you’re working on a very quick project or you need to collaborate on a spreadsheet with several people who may not understand Python or how to code.
The biggest benefit of using Excel is that it’s a “one-stop shop,” Ada says. “All of your data is stored there, and you can create your calculations and visualizations in the same sheet.” If you want to get better at using all of Microsoft Excel’s features, try our free course Analyze Data with Microsoft Excel.
Understanding which data science tools to deploy for a particular project is part of being a Data Analyst. In the new Codecademy career path Business Intelligence Data Analyst, you’ll get comfortable using all the tools of the trade, including Excel, Tableau, and SQL.
Start learning how to use Python for data science
These are just some of the reasons why you should learn Python if you want to work with data. By the end of the free course Getting Started with Python for Data Science, you’ll be able to use Python to explore and summarize a dataset, filter data to find specific categories, and format raw data so you can answer a question. And once you get a taste of what you can do with Python, you’ll want to check out all of our Python courses in machine learning, web development, and lots more.
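For a rough idea of what those skills look like in practice, here is a short Pandas sketch. The product data is invented for illustration and is not taken from the course, and the example assumes the `pandas` package is installed.

```python
import pandas as pd

# Invented raw data with messy text and prices stored as strings.
raw = pd.DataFrame({
    "product": ["Widget", "gadget ", "WIDGET", "Gizmo"],
    "category": ["tools", "toys", "tools", "toys"],
    "price": ["10.50", "4.00", "12.00", "7.25"],
})

# Format the raw data: tidy up the names and convert prices to numbers.
clean = raw.copy()
clean["product"] = clean["product"].str.strip().str.title()
clean["price"] = clean["price"].astype(float)

# Explore and summarize the numeric column.
print(clean["price"].describe())

# Filter the data to one specific category.
tools = clean[clean["category"] == "tools"]
print(tools)

# Answer a question: what is the average price in each category?
avg_price = clean.groupby("category")["price"].mean()
print(avg_price)
```

Each step is a line or two, and because the steps live in a script, you can rerun the whole analysis whenever the data changes.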
Getting Started with Python for Data Science is also a great way to get introduced to coding. Throughout the course, you’ll pick up fundamental coding principles that will come up again as you learn other programming languages. Once you know one programming language, it’s easier to learn another language because there are so many similarities — and the good news is, whichever language you choose, there’s probably a Codecademy course that will guide you.