Data Science

Published May 13, 2021; updated Apr 2, 2023

Data Science is a multidisciplinary field of study that utilizes data collection, data analysis, and data visualization to extract insights from data. It incorporates skills from a broad range of disciplines that include computer science, statistics, mathematics, and visual design.

Application

Since the beginning of the 21st century, data science has been applied in nearly every industry to extract insights that can inform business decision-making and product development. These applications are largely related to fields of study rooted in data science, including:

Languages and Tools

  • Python
    • Matplotlib
    • Pandas
    • Scikit-learn
    • TensorFlow
    • NLTK
  • R (ggplot2)
  • Excel
  • Tableau
  • SQL
  • Jupyter Notebooks
  • MATLAB

History

Many statisticians have argued that data science is not a new field but rather another name for statistics. From this perspective, the history of data science dates as far back as the 5th century B.C., when the Athenians estimated the height of the ladders needed to scale the siege walls at Plataea by counting the layers of bricks at several places along the wall, then multiplying the most frequent count by the height of one brick.
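The ladder estimate described above is essentially the statistical mode applied to repeated measurements: many noisy counts are taken, and the most common one is trusted. A minimal Python sketch follows; the function name, the brick counts, and the 10 cm brick height are hypothetical values chosen for illustration, not historical data.

```python
from collections import Counter

def estimate_wall_height(brick_counts, brick_height):
    """Estimate a wall's height from repeated brick counts.

    Uses the most frequent (modal) count, then multiplies it
    by the height of a single brick.
    """
    modal_count, _ = Counter(brick_counts).most_common(1)[0]
    return modal_count * brick_height

# Hypothetical counts taken at several places along the wall.
counts = [52, 53, 53, 54, 53, 52]
print(estimate_wall_height(counts, 10))  # 53 bricks * 10 cm = 530 cm
```

Taking the mode rather than the mean keeps a few miscounts from pulling the estimate away from the value most observers agreed on.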

In 1662, John Graunt published Natural and Political Observations Made Upon the Bills of Mortality, in which he estimated the population of London using annual funeral records, familial death rates, and average family size.

Setting aside its ties to statistics, many consider John Tukey the inventor of data science. In March 1962, he published The Future of Data Analysis, in which he described a field he called "data analysis" that resembles modern data science. With advances in data processing and storage, applications of data science have grown in both complexity and popularity.

Concepts

Some of the fundamental concepts and tools of data science are explored below:


Big Data
Big data involves working with and developing insights from large datasets.
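When a dataset is too large to load into memory at once, one common approach is to process it in chunks and combine partial results. The sketch below illustrates that idea with a running mean; the helper names are hypothetical, and a generator stands in for a large file or database. Libraries such as pandas offer a similar pattern through the chunksize parameter of read_csv.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(values)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def streaming_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute a mean one chunk at a time, keeping only running totals in memory."""
    total = 0.0
    count = 0
    for batch in chunks(values, chunk_size):
        total += sum(batch)
        count += len(batch)
    if count == 0:
        raise ValueError("mean of empty data")
    return total / count

# A generator stands in for data too large to hold in memory at once.
print(streaming_mean((x for x in range(1, 1_000_001)), chunk_size=10_000))
```

Only one chunk plus two running totals are held in memory at any time, regardless of how many values the stream contains.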
Data Mining
Data mining is the process of applying algorithms to search for patterns within collections of data.
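As one small example of searching for patterns, the sketch below counts which pairs of items frequently appear together in shopping baskets. This is the counting step behind association-rule methods such as Apriori, shown here in simplified form; the basket data and the min_support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least `min_support` transactions."""
    pair_counts = Counter()
    for items in transactions:
        # Sort and deduplicate so ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}

baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]
print(frequent_pairs(baskets, min_support=2))
# {('bread', 'milk'): 3, ('butter', 'milk'): 2}
```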
Data Warehouse
A data warehouse is a collection of stored data resources that are designed for use in analysis and business intelligence applications.
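Data warehouses are often organized as a "star schema," with a central fact table of measurements joined to descriptive dimension tables. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a real warehouse; the table names, columns, and rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a real warehouse here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 2, 30.0), (2, 1, 60.0), (1, 1, 15.0);
""")

# A typical analytical query: join facts to a dimension, then aggregate.
rows = conn.execute("""
    SELECT p.category, SUM(s.quantity) AS units, SUM(s.revenue) AS revenue
    FROM fact_sales AS s
    JOIN dim_product AS p ON p.product_id = s.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('books', 3, 45.0), ('games', 1, 60.0)]
conn.close()
```

Separating facts from dimensions keeps the large table narrow and lets analysts slice the same measurements by any descriptive attribute.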
Jupyter Notebook
Jupyter Notebook (formerly known as IPython Notebook) is a popular web-based environment for writing and running Python, R, or Julia code, especially for data analysis, data science, and machine learning. Jupyter Notebooks are easy to use because they let you run code in small cells and review the output right away. This iterative process is central to data analytics and makes it easy to test hypotheses and record the results (just like a notebook).
One Hot Encoding
One hot encoding is a method of encoding categorical variables as binary vectors that can be more readily used by machine learning algorithms.
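The sketch below shows the idea in plain Python: each distinct category gets its own column, and each value becomes a vector with a single 1 in that category's column. The function name and sample colors are illustrative; in practice, libraries such as pandas (get_dummies) and scikit-learn (OneHotEncoder) provide ready-made versions.

```python
from typing import List, Sequence, Tuple

def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode each categorical value as a binary vector with a single 1."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        vectors.append(row)
    return categories, vectors

categories, vectors = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(vectors)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Binary columns avoid implying an order between categories, which a single integer code (for example blue=0, green=1, red=2) would suggest to many algorithms.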
