Python provides a module named `datetime` to deal with dates and times.

It allows you to create a `date`, a `time`, or both a `date` and a `time` together using the `date()`, `time()`, and `datetime()` constructors respectively, after importing the `datetime` module.

    import datetime

    # a date, with keyword arguments or positional arguments
    feb_16_2019 = datetime.date(year=2019, month=2, day=16)
    feb_16_2019 = datetime.date(2019, 2, 16)
    print(feb_16_2019)  # 2019-02-16

    # a time of day
    time_13_48min_5sec = datetime.time(hour=13, minute=48, second=5)
    time_13_48min_5sec = datetime.time(13, 48, 5)
    print(time_13_48min_5sec)  # 13:48:05

    # a date and a time together
    timestamp = datetime.datetime(year=2019, month=2, day=16, hour=13, minute=48, second=5)
    timestamp = datetime.datetime(2019, 2, 16, 13, 48, 5)
    print(timestamp)  # 2019-02-16 13:48:05
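As a small sketch of how the three types relate, a `date` and a `time` can also be joined into a single `datetime` with `datetime.datetime.combine()`, and the parts can be read back from the result. The variable names here are only illustrative.

```python
import datetime

d = datetime.date(2019, 2, 16)
t = datetime.time(13, 48, 5)

# join a date and a time into one datetime object
combined = datetime.datetime.combine(d, t)
print(combined)  # 2019-02-16 13:48:05

# the date and time parts can be recovered from the datetime
print(combined.date() == d)  # True
print(combined.time() == t)  # True

# individual fields are available as attributes
print(combined.year, combined.hour)  # 2019 13
```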

In seaborn, distributions can be visualized using `.histplot()`, `.kdeplot()`, and `.boxplot()`, among other visualization functions.

The main parameters are `data` and `x`.

`data` is an optional parameter for the pandas DataFrame. `x` is the column name for the variable of interest.

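The y-axis shows the frequency (count) for histograms and the probability density for KDE plots. For box plots, the values run along whichever axis the variable is assigned to, which is the x-axis when the variable is passed as `x`.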

For box plots, setting the `y` parameter to a grouping variable will show a box plot for each group on the same plotting grid.

    import seaborn as sns

    # histogram of heights
    sns.histplot(data=df, x='height')

    # KDE plot of heights
    sns.kdeplot(data=df, x='height')

    # box plot of heights
    sns.boxplot(data=df, x='height')

    # box plots of heights by age group
    sns.boxplot(data=df, x='height', y='age_range')

In seaborn, a scatter plot can be created with `.scatterplot()`. The main parameters are `data`, `x`, and `y`.

`data` is an optional parameter for the pandas DataFrame. `x` is the column name for the x-axis of the plot. `y` is the column name for the y-axis of the plot.

A scatter plot with a regression line can be created with `.regplot()`. This function takes the same main parameters as `.scatterplot()` and produces the same plot, but with a regression line drawn on the scatter plot. By default, a 95% confidence interval is included as a shaded region around the line.

    import seaborn as sns

    # scatter plot of bird count by temperature
    sns.scatterplot(data=df, x='bird_count', y='temperature')

    # same plot with regression line
    sns.regplot(data=df, x='bird_count', y='temperature')
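As a hedged sketch of the confidence-interval behavior, the example below draws the same regression plot twice on made-up data: once with the default shaded interval and once with `ci=None`, which turns the shaded region off. The DataFrame values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# hypothetical data, invented for this example
df = pd.DataFrame({
    "bird_count": [3, 5, 8, 9, 12, 15, 18, 21],
    "temperature": [41, 45, 52, 50, 58, 63, 66, 72],
})

fig, (ax_ci, ax_no_ci) = plt.subplots(1, 2, figsize=(9, 3))

# default: regression line with a shaded 95% confidence interval
sns.regplot(data=df, x="bird_count", y="temperature", ax=ax_ci)

# ci=None: regression line only, no shaded region
sns.regplot(data=df, x="bird_count", y="temperature", ci=None, ax=ax_no_ci)

fig.tight_layout()
```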

Correlation ranges from negative one to positive one and is used to measure the strength of a linear association between two quantitative variables. A correlation close to negative one indicates a strong negative linear relationship, where large values of one variable are associated with small values of the other. A correlation close to positive one indicates a strong positive linear relationship, where large values of one variable are associated with large values of the other. A correlation of 0 indicates there is no linear relationship. The figure shows pairs of variables with correlations ranging from negative one to one.
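To make the range concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand for three small, made-up data sets. The helper name `pearson_r` is hypothetical and written only for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]

# y rises in a perfect line with x: correlation of 1
r_pos = pearson_r(x, [2, 4, 6, 8, 10])

# y falls in a perfect line as x rises: correlation of -1
r_neg = pearson_r(x, [10, 8, 6, 4, 2])

# y rises then falls symmetrically: correlation of (approximately) 0
r_zero = pearson_r(x, [1, 2, 3, 2, 1])

print(round(r_pos, 6), round(r_neg, 6), round(abs(r_zero), 6))
```

The third case is a reminder that a correlation near 0 rules out a linear relationship only: the two variables there are clearly related, just not along a line.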

In Python, we can use scikit-learn to run a simple linear regression.

The sample code shows each step for running a simple linear regression with a pandas DataFrame called `df`:

- Format the variables to be readable for sklearn.
- Instantiate and run the regression model.
- View the intercept and slope coefficient.

    from sklearn.linear_model import LinearRegression

    # format variables as 2-D arrays of shape (n_samples, 1)
    X = df['input_var'].to_numpy().reshape(-1, 1)
    y = df['output_var'].to_numpy().reshape(-1, 1)

    # run regression
    lm = LinearRegression()
    model = lm.fit(X, y)

    # view intercept
    print(lm.intercept_)

    # view slope coefficient
    print(lm.coef_)
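The sample code relies on an existing `df`. The sketch below runs the same steps end to end on a tiny, made-up DataFrame in which the output is exactly `2 * input + 1`, so the fitted values are easy to check. The data is invented for illustration; the column names follow the sample code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# hypothetical data following output_var = 2 * input_var + 1
df = pd.DataFrame({
    "input_var": [1, 2, 3, 4, 5],
    "output_var": [3, 5, 7, 9, 11],
})

# format variables as 2-D arrays of shape (n_samples, 1)
X = df["input_var"].to_numpy().reshape(-1, 1)
y = df["output_var"].to_numpy().reshape(-1, 1)

# instantiate and fit the model
lm = LinearRegression()
lm.fit(X, y)

# because y is 2-D, intercept_ has shape (1,) and coef_ has shape (1, 1)
intercept = lm.intercept_[0]
slope = lm.coef_[0][0]

print(round(intercept, 6), round(slope, 6))
```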

To use a simple linear regression model to make a prediction, we plug the slope and intercept into the equation for a line (y = mx + b). For example, suppose we fit a linear model to predict weight based on height and calculate an intercept of -200 and a slope of 5. The equation is:

weight = 5*height - 200

Therefore, a person who is 60 inches tall would be expected to weigh 100 pounds:

weight = 5*60-200 = 100
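The same arithmetic can be written as a short function. This is a minimal sketch; the function name `predict_weight` is hypothetical, and the slope and intercept are the example values from the text.

```python
def predict_weight(height, slope=5, intercept=-200):
    """Apply the line equation y = m*x + b to a height in inches."""
    return slope * height + intercept

weight = predict_weight(60)
print(weight)  # 100
```

With a fitted scikit-learn model, calling `predict()` on the model performs the same calculation using the stored intercept and slope.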