Data science is a multidisciplinary field that uses data collection, data analysis, and data visualization to extract insights from data. It draws on skills from a broad range of disciplines, including computer science, statistics, mathematics, and visual design.
Since the beginning of the 21st century, data science has been used in nearly every industry to extract insights that can be leveraged in business decision-making and product development. These applications largely relate to fields of study rooted in data science, including:
- Data Visualization
- Data Engineering
- Machine Learning & Deep Learning
- Artificial Intelligence
- Cloud and Distributed Computing
- Business Intelligence and Strategy
Languages and Tools
- R (ggplot2)
- Jupyter Notebooks
Many statisticians have argued that data science is not a new field, but rather another name for statistics. From this perspective, the history of data science dates as far back as the 5th century B.C., when the Athenians estimated the height of the ladders needed to scale the walls of Plataea by counting the bricks of the wall vertically in several areas, then multiplying the most frequent count by the height of a brick.
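The brick-counting estimate is, in modern terms, a mode multiplied by a unit height. A minimal Python sketch of that arithmetic follows; the counts and the brick height are hypothetical values chosen only for illustration, not historical figures.

```python
from statistics import mode

# Hypothetical brick counts taken at several spots along the wall.
brick_counts = [38, 40, 40, 39, 40, 41]

# Hypothetical height of a single brick, in centimeters.
brick_height_cm = 10

# The most frequent count is treated as the most reliable one.
most_frequent_count = mode(brick_counts)

# Estimated wall height = most frequent count * height of one brick.
estimated_wall_height_cm = most_frequent_count * brick_height_cm

print(most_frequent_count, estimated_wall_height_cm)
```

Using the most frequent count rather than the average keeps a few miscounts from shifting the estimate.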
In 1662, John Graunt produced Natural and Political Observations Made Upon the Bills of Mortality in which he estimated the population of London by using annual funeral records, familial death rates, and average family size.
Setting aside the connection to statistics, many consider John Tukey to be the inventor of data science. In March 1962, he published The Future of Data Analysis, in which he described a field he called “data analysis” that resembles modern data science. With advances in data processing and storage, applications of data science have grown in both complexity and popularity.
Some of the fundamental concepts and tools of data science are explored below:
- Big Data
- Big data involves working with and developing insights from large datasets.
- Data Mining
- Data mining is the process of applying algorithms to search for patterns within collections of data.
- Data Warehouse
- A data warehouse is a collection of stored data resources that are designed for use in analysis and business intelligence applications.
- Jupyter Notebook
- Jupyter Notebook (sometimes called IPython Notebook) is a popular way to write and run Python, R, or Julia code, especially for data analysis, data science, and machine learning. Jupyter Notebooks are easy to use because they let you execute code and review the output quickly. This iterative process is central to data analysis and makes it easy to test hypotheses and record the results (just like a notebook).
- One Hot Encoding
- One hot encoding is a method of encoding categorical variables as binary vectors that can be more readily used by machine learning algorithms.
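As an illustration of one hot encoding from the list above, the sketch below turns a small list of categories into binary vectors using only the Python standard library. The `one_hot_encode` helper and the sample `colors` data are hypothetical names introduced for this example. In practice, libraries such as pandas or scikit-learn are commonly used for this step.

```python
def one_hot_encode(values):
    """Return (categories, vectors) for a list of categorical values."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    # Map each category to its column position.
    index = {category: i for i, category in enumerate(categories)}

    vectors = []
    for value in values:
        # Start with all zeros, then set a single 1 for this value's category.
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


# Hypothetical sample data.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, vector in zip(colors, encoded):
    print(color, vector)
```

Each vector contains exactly one 1, in the position of its category, so no artificial ordering between categories is implied.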