
Traffic Safety Case Study

Date and Time in Python

Python provides a module named datetime to deal with dates and times.

After importing the datetime module, you can create a date, a time, or a combined date and time using the date(), time(), and datetime() constructors, respectively. Each accepts either keyword or positional arguments.

import datetime
# a date, using keyword arguments and then positional arguments
feb_16_2019 = datetime.date(year=2019, month=2, day=16)
feb_16_2019 = datetime.date(2019, 2, 16)
print(feb_16_2019) #2019-02-16
# a time, using keyword arguments and then positional arguments
time_13_48min_5sec = datetime.time(hour=13, minute=48, second=5)
time_13_48min_5sec = datetime.time(13, 48, 5)
print(time_13_48min_5sec) #13:48:05
# a combined date and time
timestamp = datetime.datetime(year=2019, month=2, day=16, hour=13, minute=48, second=5)
timestamp = datetime.datetime(2019, 2, 16, 13, 48, 5)
print(timestamp) #2019-02-16 13:48:05
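
As a small follow-up sketch, a separate date and time can be joined with datetime.datetime.combine(), and the resulting object exposes its components as attributes. The variable names day and clock are illustrative only.

```python
import datetime

day = datetime.date(2019, 2, 16)
clock = datetime.time(13, 48, 5)

# join a date and a time into a single datetime object
timestamp = datetime.datetime.combine(day, clock)
print(timestamp)  # 2019-02-16 13:48:05

# individual components are available as attributes
print(timestamp.year, timestamp.month, timestamp.day)  # 2019 2 16
print(timestamp.hour, timestamp.minute, timestamp.second)  # 13 48 5

# split the datetime back into its date and time parts
print(timestamp.date() == day)  # True
print(timestamp.time() == clock)  # True
```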

Distribution Plots with Seaborn

In seaborn, distributions can be visualized using .histplot(), .kdeplot(), and .boxplot(), among other visualization functions.

The main parameters are data and x.

  • data is an optional parameter for the name of the pandas DataFrame.
  • x is the column name for the variable of interest.

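When the variable of interest is passed as x, the y-axis shows the count of observations for histograms and the probability density for KDE plots. Box plots drawn this way are horizontal, with the variable's values along the x-axis.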

For box plots, setting the y parameter to a grouping variable will show a box plot for each group on the same plotting grid.

import seaborn as sns
# histogram of heights
sns.histplot(data=df, x='height')
# KDE plot of heights
sns.kdeplot(data=df, x='height')
# box plot of heights
sns.boxplot(data=df, x='height')
# box plots of heights by age group
sns.boxplot(data=df, x='height', y='age_range')

Scatter Plots with Seaborn

In seaborn, a scatter plot can be created with .scatterplot(). The main parameters are data, x, and y.

  • data is an optional parameter for the name of the pandas DataFrame.
  • x is the column name for the x-axis of the plot.
  • y is the column name for the y-axis of the plot.

A scatter plot with a regression line can be created with .regplot(). This function accepts the same data, x, and y parameters as .scatterplot() and produces the same scatter plot with a fitted regression line drawn over it. By default, a 95% confidence interval is included as a shaded region around the line.

import seaborn as sns
# scatter plot of bird count by temperature
sns.scatterplot(data=df, x='bird_count', y='temperature')
# same plot with regression line
sns.regplot(data=df, x='bird_count', y='temperature')

Correlation

Correlation ranges from negative one to positive one and measures the strength of the linear association between two quantitative variables. A correlation close to negative one indicates a strong negative linear relationship, where large values of one variable are associated with small values of the other. A correlation close to positive one indicates a strong positive linear relationship, where large values of one variable are associated with large values of the other. A correlation of 0 indicates there is no linear relationship. The figure shows pairs of variables with correlations ranging from negative one to positive one.

This figure shows 5 different plots. From left to right, the plots show a correlation of 1, a large positive correlation, no correlation, a large negative correlation, and a correlation of -1.
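
To make the range concrete, here is a minimal sketch that computes the Pearson correlation by hand for three made-up datasets. The function name pearson_correlation and all of the data are hypothetical, chosen only for illustration. In practice a library routine such as numpy.corrcoef or the pandas DataFrame.corr method would normally be used instead.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of the products of each pair's deviations from the means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # square roots of the summed squared deviations of each variable
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

# made-up data for illustration
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # increases in step with x
falling = [10, 8, 6, 4, 2]    # decreases in step with x
symmetric = [1, 4, 5, 4, 1]   # rises then falls, no linear trend

r_rising = pearson_correlation(x, rising)        # approximately 1
r_falling = pearson_correlation(x, falling)      # approximately -1
r_symmetric = pearson_correlation(x, symmetric)  # 0
print(round(r_rising, 2), round(r_falling, 2), round(r_symmetric, 2))
```

The third dataset has a clear pattern but a correlation of 0, which is a reminder that correlation only captures linear association.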

Linear Regression with Sklearn

In Python, we can use scikit-learn to run a simple linear regression.

The sample code shows each step for running a simple linear regression with a pandas DataFrame called df:

  1. Format the variables to be readable for sklearn.
  2. Instantiate and run the regression model.
  3. View the intercept and slope coefficient.

from sklearn.linear_model import LinearRegression
# format variables
X = df['input_var'].to_numpy().reshape(-1, 1)
y = df['output_var'].to_numpy().reshape(-1, 1)
# run regression
lm = LinearRegression()
model = lm.fit(X, y)
# view intercept
lm.intercept_
# view slope coefficient
lm.coef_
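
For a single input variable, the least-squares line has a simple closed form: the slope is the summed co-deviation of x and y divided by the summed squared deviation of x, and the intercept makes the line pass through the point of means. The sketch below illustrates those formulas in plain Python. It is not scikit-learn's implementation, and the function name fit_simple_line and the data are hypothetical, with the data made up to lie exactly on a line.

```python
def fit_simple_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: co-deviation of x and y divided by squared deviation of x
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    # intercept: forces the line through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# made-up heights (inches) and weights (pounds) that lie exactly on a line
heights = [60, 62, 64, 66, 68]
weights = [100, 110, 120, 130, 140]

intercept, slope = fit_simple_line(heights, weights)
print(intercept)  # -200.0
print(slope)      # 5.0
```

On data like this, lm.intercept_ and lm.coef_ from the sklearn code above should report the same values up to floating-point rounding.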

Prediction using a Simple Linear Model

To make a prediction with a simple linear regression model, we plug the slope and intercept into the equation for a line (y=mx+b), where m is the slope and b is the intercept. For example, suppose we fit a linear model to predict weight based on height and calculate an intercept of -200 and a slope of 5. The equation is:

$$\text{weight} = 5 \cdot \text{height} - 200$$

Therefore, a person who is 60 inches tall would be expected to weigh 100 pounds:

$$\text{weight} = 5 \cdot 60 - 200 = 100$$
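
The same arithmetic can be written as a short function. This is a minimal sketch: the function name predict is hypothetical, and the slope and intercept are the example values from the text rather than fitted results.

```python
def predict(x, slope, intercept):
    """Return the value on the line y = slope * x + intercept."""
    return slope * x + intercept

# slope of 5 and intercept of -200, as in the example above
predicted_weight = predict(60, slope=5, intercept=-200)
print(predicted_weight)  # 100
```

With a fitted LinearRegression object, the predict() method performs the same calculation on a 2D array of inputs such as [[60]].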
