Statistical Distributions

Anonymous contributor
Published Feb 5, 2025

The statsmodels library provides tools for working with empirical distributions, which makes it useful for non-parametric data analysis. Its Empirical Cumulative Distribution Function (ECDF) implementation estimates the cumulative distribution directly from the data, without assuming a specific theoretical distribution. This is valuable for exploratory data analysis and for assessing goodness of fit.
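
As a rough illustration of what an ECDF estimates (the fraction of observations less than or equal to a given value), here is a minimal pure-Python sketch. The helper name `empirical_cdf` is hypothetical and is not part of statsmodels:

```python
def empirical_cdf(data, x):
    """Return the fraction of observations in `data` that are <= x."""
    if len(data) == 0:
        raise ValueError("data must contain at least one observation")
    return sum(1 for value in data if value <= x) / len(data)


sample = [3, 1, 2, 2]
print(empirical_cdf(sample, 2))  # 0.75: three of the four observations are <= 2
print(empirical_cdf(sample, 0))  # 0.0: no observation is <= 0
```

This version scans the whole sample on every call, which is fine for illustration but slower than a sorted lookup on large data.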

Syntax

from statsmodels.distributions.empirical_distribution import ECDF
ecdf = ECDF(data, side='right')
  • data: The array-like input data points for which the empirical distribution is calculated.
  • side (optional): Determines which end of each step interval is closed. The options are:
    • 'right': Creates a right-continuous step function, so ecdf(x) is the fraction of observations less than or equal to x (default behavior).
    • 'left': Creates a left-continuous step function, so ecdf(x) is the fraction of observations strictly less than x.
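
The two side options only differ when the function is evaluated exactly at an observed data point. The sketch below mimics that convention with the standard library's `bisect` module. `step_ecdf` is a hypothetical helper written for illustration, not statsmodels' own code:

```python
from bisect import bisect_left, bisect_right


def step_ecdf(data, x, side="right"):
    """Evaluate an empirical CDF at x using the given step convention."""
    ordered = sorted(data)
    if side == "right":
        # Count observations <= x (right-continuous step function).
        count = bisect_right(ordered, x)
    elif side == "left":
        # Count observations < x (left-continuous step function).
        count = bisect_left(ordered, x)
    else:
        raise ValueError("side must be 'right' or 'left'")
    return count / len(ordered)


sample = [1, 2, 2, 3]
print(step_ecdf(sample, 2, side="right"))  # 0.75: includes both observations equal to 2
print(step_ecdf(sample, 2, side="left"))   # 0.25: excludes both observations equal to 2
print(step_ecdf(sample, 2.5, side="right") == step_ecdf(sample, 2.5, side="left"))  # True
```

Between data points, such as at 2.5 here, both conventions return the same value.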

Example

The following example builds an ECDF from exponentially distributed data, plots it with matplotlib, and evaluates it at specific values to obtain non-parametric estimates of the cumulative probabilities:

import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF
import matplotlib.pyplot as plt
# Generate sample data
np.random.seed(42)
data = np.random.exponential(size=200)
# Create ECDF object
ecdf = ECDF(data)
# Generate points for plotting
x = np.linspace(min(data), max(data), 100)
y = ecdf(x)
# Plot ECDF
plt.plot(x, y, 'b-', label='ECDF')
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.title('Empirical Cumulative Distribution Function')
plt.grid(True)
plt.legend()
plt.show()
# Calculate probabilities for specific points
points = [0.5, 1.0, 1.5]
probabilities = ecdf(points)
print("\nCumulative probabilities:")
for point, prob in zip(points, probabilities):
    print(f"P(X ≤ {point:.1f}) = {prob:.3f}")

The code above produces the following output:

Cumulative probabilities:
P(X ≤ 0.5) = 0.450
P(X ≤ 1.0) = 0.635
P(X ≤ 1.5) = 0.780

The code also displays a plot of the ECDF, titled "Empirical Cumulative Distribution Function", with "Value" on the x-axis and "Cumulative Probability" on the y-axis.
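
One way an ECDF can support a goodness-of-fit check is to measure its largest gap from a candidate theoretical CDF, which is the idea behind the Kolmogorov-Smirnov statistic. The following is a minimal sketch under stated assumptions: the helpers `ks_distance` and `exponential_cdf` are hypothetical, and the sample is drawn with Python's `random` module rather than NumPy, so it is not the same data as in the example above:

```python
import math
import random


def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of `data` and a reference `cdf`."""
    ordered = sorted(data)
    n = len(ordered)
    gap = 0.0
    for i, value in enumerate(ordered, start=1):
        reference = cdf(value)
        # The ECDF jumps from (i - 1) / n to i / n at each sorted observation,
        # so check the gap on both sides of the jump.
        gap = max(gap, i / n - reference, reference - (i - 1) / n)
    return gap


def exponential_cdf(x, rate=1.0):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


random.seed(42)
sample = [random.expovariate(1.0) for _ in range(200)]
print(f"Largest ECDF gap: {ks_distance(sample, exponential_cdf):.3f}")
```

A small gap suggests the candidate distribution is plausible for the sample. Turning the gap into a formal test requires comparing it against critical values, which this sketch does not do.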
