The standard deviation is a measure of a dataset’s spread. It is calculated by taking the square root of the dataset’s variance. The resulting value has the same units as the original data.
In Python, we can calculate the variance of an array using the NumPy var() function.
import numpy as np

values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

# calculate variance of values
variance = np.var(values)
Because standard deviation is in the same units as the original dataset, it is often used to provide context for the mean of the dataset. For example, if the dataset is [3, 5, 10, 14], the standard deviation is 4.301 units, and the mean is 8.0 units. The data point 14 lies 6.0 units from the mean, so we can fairly easily see that it is more than one standard deviation away from the mean.
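The worked example above can be checked with a short sketch. It assumes the population standard deviation (NumPy's default, ddof=0), which is the convention that gives 4.301 for this dataset; the variable names are illustrative only.

```python
import numpy as np

data = np.array([3, 5, 10, 14])

mean = np.mean(data)      # 8.0
std_dev = np.std(data)    # population standard deviation, about 4.301

# Distance of each point from the mean, measured in standard deviations
distances = np.abs(data - mean) / std_dev

print(mean, round(std_dev, 3))
print(distances[-1] > 1)  # the point 14 is more than one standard deviation away
```

Dividing each distance by the standard deviation expresses it in "number of standard deviations", which makes points from datasets with different units comparable.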
We can calculate standard deviation in Python using the NumPy std() function.
import numpy as np

values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

# calculate standard deviation of values
std_dev = np.std(values)
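As a quick check of how the two functions relate, the following sketch confirms that the result of std() is the square root of the result of var() for the same array. Both calls use NumPy's default of ddof=0 (the population formulas); passing ddof=1 to both would give the sample versions instead.

```python
import numpy as np

values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

variance = np.var(values)   # 2.25, in squared units
std_dev = np.std(values)    # 1.5, in the original units

# The standard deviation equals the square root of the variance
print(np.isclose(std_dev, np.sqrt(variance)))  # True
```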
A larger variance means the data is more spread out and values tend to be far away from the mean. A variance of 0 means all values in the dataset are the same.
Variance is a measure of spread. It is calculated by finding the average of the squared differences between every observation and the mean. The resulting value is in units squared.
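The definition above can be followed step by step in code. This is a minimal sketch that computes the population variance by hand and compares it with np.var(); the intermediate names (squared_diffs, manual_variance, constant) are illustrative only.

```python
import numpy as np

values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

# Step 1: find the mean of the observations
mean = values.mean()                    # 3.5

# Step 2: square the difference between each observation and the mean
squared_diffs = (values - mean) ** 2

# Step 3: average the squared differences
manual_variance = squared_diffs.mean()  # 2.25

print(manual_variance)
print(np.isclose(manual_variance, np.var(values)))  # True

# When every value is the same, the variance is 0
constant = np.array([7, 7, 7, 7])
print(np.var(constant))                 # 0.0
```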