"People have been using data and trying to learn from data for a really long time," says Sophie, a Curriculum Developer here at Codecademy. Data science is used for predicting outcomes, understanding trends over time, visualizing relationships and patterns, and generally turning data into information of value.
But what exactly is data science? In our interview with Sophie, she sheds some light on this question. Check out the video interview below and keep reading for more insights into the world of data.
What is data science and why does it matter?
Think of data science as a mashup of probability and statistics, software engineering, and domain knowledge. A data scientist's superpower is the ability to take large amounts of information and turn it into something actionable and interpretable.
Over the past decade, as data collection has become ubiquitous and computers have become more powerful, many companies across industries have recognized the importance of leveraging data. While data science departments can look very different from one company to the next, all data scientists need a strong understanding of statistics, coding skills, and communication skills.
If you’ve seen self-driving cars in action or received song suggestions from Spotify, then you have been exposed to the fruits of data science!
Taking a closer look at data science jobs
Depending on the industry, there are a lot of different tasks that a data scientist might work on, though there are similarities in the required skills. Here are just a few of the types of things data scientists do on the job:
- Market research: Some jobs in data science involve market research, which means understanding and responding to consumer behavior. After all, one of the most important parts of a data scientist's job is figuring out the right questions to ask.
- A/B testing: Data scientists may plan and conduct A/B tests, comparing multiple versions of the same web page or application to see which one converts more visitors into customers. A/B testing is an effective way to align business goals with user preferences.
- Prediction: Data scientists may build and tune machine learning models to make predictions or find patterns in data. For example, a data scientist could look at the speed and accuracy of test takers completing an online test to find behavior patterns associated with cheating.
- Managing databases: Data scientists are often responsible for working with databases to ensure that data can be accessed and understood by their colleagues. Many data scientists use SQL to extract data from databases so that they can use it for analyses or model building.
- Turning insights into recommendations: By using data to gain insights into areas of marketing, research, and development, data scientists often inform other departments on what they need to know and how to perform their jobs more effectively. They can also use data to help companies learn more about their customers and provide personalized recommendations for similar content and products. Check out our Build a Recommender System skill path to learn how.
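To make the A/B testing item above a little more concrete, here is a minimal sketch of one common way to compare two page versions: a two-proportion z-test. The function name and the visitor and conversion counts are made up for illustration, and real teams may prefer a statistics library or a different test entirely.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference (zero if nobody or everybody converts).
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: version A converts 200 of 5000 visitors, version B 250 of 5000.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap in conversion rates is unlikely to be due to chance alone. Deciding whether the gap matters for the business is still a judgment call.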
What you do on the job as a data scientist differs from company to company and industry to industry. Other factors include the team you are assigned to and the age of the company.
People who perform the job functions described above may also have titles other than "data scientist." For example, you might see titles like data analyst, systems analyst, business intelligence analyst, statistician, and even machine learning engineer. Since the role of a data scientist is still evolving, companies may refer to similar positions by very different names. So if you’re applying for jobs in data science, dig into the job description and be sure to ask lots of questions during the interview stage.
Want to know more about what a data scientist does? Check out our interview with data scientist Catherine Zhou. During her career, she has held several titles, including engineer and quantitative researcher. She offers some insight into working with stakeholders and teams using quantitative data to achieve business goals.
What are the most popular languages for data science?
“The tools that people use change really quickly,” says Sophie. “Personally I think the most important thing is to understand the theory behind what you’re doing. If you understand the idea of what you want to do, then you can learn different tools.” It’s important to get familiar with the abstract ideas before you move on to the actual programming.
That said, there are a few programming languages that are currently prevalent in the world of data science:
- Python: A versatile, general-purpose programming language with many libraries for data science, including NumPy, pandas, SciPy, and matplotlib. This language is a classic favorite among many programmers because its syntax relies on plain English keywords, which makes code easy to read and maintain.
- R: This programming language was made for statistical computing. Because it was built for this purpose, the data structures and variable types in R are easy to use for data manipulation and analysis. While the syntax is slightly different, R and Python can do many of the same things!
- SQL: Most companies use a database system to organize and store large quantities of data. That’s where SQL comes in: often, a data scientist will extract data from a database using SQL and then import that data into R or Python for analysis.
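The SQL-then-Python workflow described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the in-memory database, the `orders` table, and its rows are hypothetical stand-ins for a real company database, and many data scientists would load the query result into pandas instead of a plain dictionary.

```python
import sqlite3
import statistics

# In-memory database standing in for a company database (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ana", 35.0), ("ben", 15.0), ("ben", 25.0), ("cy", 50.0)],
)

# Step 1: extract with SQL, aggregating total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

# Step 2: analyze the extracted rows in Python.
totals = {customer: total for customer, total in rows}
mean_total = statistics.mean(list(totals.values()))
print(totals)
print(f"average spend per customer: {mean_total:.2f}")
```

Letting the database do the grouping and summing keeps the amount of data pulled into Python small, which matters more as tables grow.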
“Data is everywhere. There are so many ways that we create data every single day without realizing it, and there are so many questions out there,” says Sophie. If the idea of finding and collecting data to answer the questions you care about excites you, then a career in data science may be for you!