Traffic Safety Case Study

Date and Time in Python

Python provides a module named datetime to deal with dates and times.

After importing the datetime module, you can create a date, a time, or a combined date and time with datetime.date(), datetime.time(), and datetime.datetime(), respectively.

import datetime

# a date, using keyword or positional arguments
feb_16_2019 = datetime.date(year=2019, month=2, day=16)
feb_16_2019 = datetime.date(2019, 2, 16)
print(feb_16_2019)  # 2019-02-16

# a time, using keyword or positional arguments
time_13_48min_5sec = datetime.time(hour=13, minute=48, second=5)
time_13_48min_5sec = datetime.time(13, 48, 5)
print(time_13_48min_5sec)  # 13:48:05

# a combined date and time
timestamp = datetime.datetime(year=2019, month=2, day=16, hour=13, minute=48, second=5)
timestamp = datetime.datetime(2019, 2, 16, 13, 48, 5)
print(timestamp)  # 2019-02-16 13:48:05
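
Once a date and a time exist as separate objects, they can also be joined, and the parts of a timestamp can be read back as attributes. The following is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original example.

```python
import datetime

crash_date = datetime.date(2019, 2, 16)
crash_time = datetime.time(13, 48, 5)

# combine an existing date and time into one datetime object
timestamp = datetime.datetime.combine(crash_date, crash_time)
print(timestamp)  # 2019-02-16 13:48:05

# individual components are available as attributes
print(timestamp.year, timestamp.month, timestamp.hour)  # 2019 2 13

# weekday() counts from Monday = 0, so 5 is a Saturday
print(timestamp.weekday())  # 5
```

Reading components such as the hour or weekday is often the first step when grouping records by time period.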


Distribution Plots with Seaborn

In seaborn, distributions can be visualized using .histplot(), .kdeplot(), and .boxplot(), among other visualization functions.

The main parameters are data and x.

• data is an optional parameter for the name of the pandas DataFrame.
• x is the column name for the variable of interest.

The y-axis shows the frequency for histograms, the probability density for KDE plots, and the values for box plots.

For box plots, setting the y parameter to a grouping variable will show a box plot for each group on the same plotting grid.

import seaborn as sns

# histogram of heights
sns.histplot(data=df, x='height')

# KDE plot of heights
sns.kdeplot(data=df, x='height')

# box plot of heights
sns.boxplot(data=df, x='height')

# box plots of heights by age group
sns.boxplot(data=df, x='height', y='age_range')

Scatter Plots with Seaborn

In seaborn, a scatter plot can be created with .scatterplot(). The main parameters are data, x, and y.

• data is an optional parameter for the name of the pandas DataFrame.
• x is the column name for the x-axis of the plot.
• y is the column name for the y-axis of the plot.

A scatter plot with a regression line can be created with .regplot(). This function takes the same parameters as .scatterplot() and produces the same plot, but with a regression line drawn on the scatter plot. By default, a 95% confidence interval is included as a shaded region around the line.

import seaborn as sns

# scatter plot of bird count by temperature
sns.scatterplot(data=df, x='bird_count', y='temperature')

# same plot with regression line
sns.regplot(data=df, x='bird_count', y='temperature')

Correlation

Correlation ranges from negative one to positive one and measures the strength of a linear association between two quantitative variables. A correlation close to negative one indicates a strong negative linear relationship, where large values of one variable are associated with small values of the other. A correlation close to positive one indicates a strong positive linear relationship, where large values of one variable are associated with large values of the other. A correlation of 0 indicates there is no linear relationship. The figure shows pairs of variables with correlations ranging from negative one to one.
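
To make the endpoints of that range concrete, the sketch below computes the Pearson correlation coefficient by hand for three small made-up data sets. The function name pearson_correlation and the data are illustrative assumptions, not part of the case study.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-2, -1, 0, 1, 2]

r_pos = pearson_correlation(xs, [1, 2, 3, 4, 5])   # y rises with x
r_neg = pearson_correlation(xs, [5, 4, 3, 2, 1])   # y falls as x rises
r_none = pearson_correlation(xs, [4, 1, 0, 1, 4])  # y = x**2, no linear trend

print(round(r_pos, 3), round(r_neg, 3), round(r_none, 3))
```

The third data set follows a perfect curve, yet its correlation is zero, which is a reminder that correlation only measures linear association. With pandas, df['a'].corr(df['b']) should give the same quantity for two numeric columns.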

Linear Regression with Sklearn

In Python, we can use scikit-learn to run a simple linear regression.

The sample code shows each step for running a simple linear regression with a pandas DataFrame called df:

1. Format the variables to be readable for sklearn.
2. Instantiate and run the regression model.
3. View the intercept and slope coefficient.

from sklearn.linear_model import LinearRegression

# format variables: sklearn expects 2-D arrays (one column per variable)
X = df['input_var'].to_numpy().reshape(-1, 1)
y = df['output_var'].to_numpy().reshape(-1, 1)

# run regression
lm = LinearRegression()
model = lm.fit(X, y)

# view intercept
print(lm.intercept_)

# view slope coefficient
print(lm.coef_)
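
To show what the intercept and slope represent, here is a sketch that computes them by hand for a single input variable using the closed-form least-squares formulas. It deliberately does not use scikit-learn; the function name simple_ols and the data are hypothetical, chosen so the points lie exactly on a line.

```python
def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # the fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # exactly y = 2x + 1

intercept, slope = simple_ols(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

For one predictor, an ordinary least-squares fit such as LinearRegression should report the same two numbers, up to floating-point rounding.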

Prediction using a Simple Linear Model

To make a prediction with a simple linear regression model, we plug the slope and intercept into the equation for a line (y=mx+b), where m is the slope and b is the intercept. For example, suppose we fit a linear model to predict weight (in pounds) from height (in inches) and obtain an intercept of -200 and a slope of 5. The equation is:

$weight = 5*height - 200$

Therefore, a person who is 60 inches tall would be expected to weigh 100 pounds:

$weight = 5*60-200 = 100$
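
The same calculation can be written as a small function. This is a sketch using the slope and intercept from the example above; the function name predict_weight is illustrative.

```python
def predict_weight(height_in, slope=5, intercept=-200):
    """Predicted weight in pounds for a height in inches, using y = mx + b."""
    return slope * height_in + intercept

print(predict_weight(60))  # 100
```

With a fitted scikit-learn model, lm.predict([[60]]) would perform the equivalent calculation using the estimated slope and intercept; note that it expects a 2-D input.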