# Data Science

**Data Science** is a multidisciplinary field that uses data collection, data analysis, and data visualization to extract insights from data. It draws on skills from a broad range of disciplines, including computer science, statistics, mathematics, and visual design.

## Application

Since the beginning of the 21st century, data science has been used in nearly every industry to extract insights from data that can inform business decision-making and product development. These applications are largely related to fields of study rooted in data science, including:

- Data Visualization
- Data Engineering
- Machine Learning & Deep Learning
- Artificial Intelligence
- Cloud and Distributed Computing
- Business Intelligence and Strategy

## Languages and Tools

- Python
  - Matplotlib
  - Pandas
  - Scikit-learn
  - TensorFlow
  - NLTK
- R (ggplot2)
- Excel
- Tableau
- SQL
- Jupyter Notebooks
- MATLAB

## History

Many statisticians have argued that data science is not a new field but rather another name for statistics. From this perspective, the history of data science dates as far back as the 5th century B.C., when the Athenians estimated the height of the ladders needed to scale the walls of Plataea by counting the bricks of the wall vertically in several areas, then multiplying the most frequent count by the height of a brick.
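The brick-counting estimate amounts to taking the mode of several counts and multiplying it by a unit height. A minimal sketch of that arithmetic follows; the function name, the counts, and the brick height are hypothetical values chosen for illustration, not historical figures.

```python
from statistics import mode


def estimate_wall_height(brick_counts, brick_height):
    """Estimate a wall's height as (most frequent brick count) x (height of one brick)."""
    return mode(brick_counts) * brick_height


# Hypothetical counts taken at several places along the wall (not historical data).
counts = [58, 60, 60, 61, 60, 59]
brick_height_cm = 10  # hypothetical height of one brick, in centimeters

print(estimate_wall_height(counts, brick_height_cm))  # 600
```

Using the most frequent count rather than a single count reduces the effect of individual miscounts.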

In 1662, John Graunt produced *Natural and Political Observations Made Upon the Bills of Mortality* in which he estimated the population of London by using annual funeral records, familial death rates, and average family size.

Setting aside the link to statistics, many consider John Tukey to be the founder of data science: in March 1962 he published *The Future of Data Analysis*, in which he described a field he called “data analysis” that resembles modern data science. With advances in data processing and storage, applications of data science have grown in both complexity and popularity.

## Concepts

Some of the fundamental concepts and tools of data science are explored below:

- **Big Data**: Big data involves working with and developing insights from large datasets.
- **Data Mining**: Data mining is the process of applying algorithms to search for patterns within collections of data.
- **Data Warehouse**: A data warehouse is a collection of stored data resources designed for use in analysis and business intelligence applications.
- **Jupyter Notebook**: Jupyter Notebook (sometimes called IPython Notebook) is a popular way to write and run Python, R, or Julia code, especially for data analysis, data science, and machine learning. Jupyter Notebooks are easy to use because they let you execute code and review the output quickly. This iterative process is central to data analytics and makes it easy to test hypotheses and record the results (just like a notebook).
- **One Hot Encoding**: One hot encoding is a method of encoding categorical variables as binary vectors that can be more readily used by machine learning algorithms.
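
To make the idea of one hot encoding concrete, the sketch below maps each category to a binary vector containing a single 1. It is a minimal, library-free illustration; the `one_hot_encode` function and the sample `colors` list are hypothetical names and values introduced only for this example.

```python
def one_hot_encode(values):
    """Return (categories, vectors), where each vector has a single 1
    at the position of the value's category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


# Hypothetical categorical column, for illustration only.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
for color, vector in zip(colors, encoded):
    print(color, vector)
# red [0, 0, 1]
# green [0, 1, 0]
# blue [1, 0, 0]
# green [0, 1, 0]
```

In practice, a library routine such as `pandas.get_dummies` is typically used instead of hand-written code; the version above only shows the underlying mapping.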
