Python:NumPy .percentile()
The np.percentile() is a built-in method of Numpy, a statistical measure that indicates the value below which a given percentage of observations in a dataset falls. NumPy’s .percentile() function computes the q-th percentile of data along a specified axis, making it an essential tool for statistical analysis and data exploration.
Syntax of NumPy .percentile()
numpy.percentile(a, q, axis=None, out=None, ...)
Parameters:
a: Input array or object that can be converted to an arrayq: Percentile or sequence of percentiles to compute. Values must be between 0 and 100 inclusiveaxis(optional): Axis or axes along which the percentiles are computed. Default isNone(flattened array)out(optional): Alternative output array in which to place the result
Note: The ellipses in the syntax (
...) indicate that there are more optional parameters available, such asoverwrite_input,method,keepdims, and `weights``. These allow for more control over how the percentile is calculated and returned.
Return value:
Returns the q-th percentile(s) of the array elements. If q is a single percentile, returns a scalar. If multiple percentiles are given, returns an array.
Example 1: Basic Percentile Calculation Using np.percentile() Method
This example demonstrates how to calculate a single percentile from a one-dimensional array:
import numpy as np# Create a sample arraydata = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])# Calculate the 50th percentile (median)result = np.percentile(data, 50)print(f"50th percentile: {result}")# Calculate multiple percentilesquartiles = np.percentile(data, [25, 50, 75])print(f"Quartiles: {quartiles}")
The output of this code is:
50th percentile: 10.0Quartiles: [ 5.5 10. 14.5]
The 50th percentile represents the median value, which is the middle value when the data is sorted. The function returns 10.0 as the median, and [5.5, 10.0, 14.5] for the quartiles.
Example 2: Analyzing Test Score Distribution Using NumPy’s .percentile()
This example shows how to use .percentile() to analyze student test scores and understand performance distribution:
import numpy as np# Student test scores from multiple classestest_scores = np.array([[85, 92, 78, 95, 88], # Class 1[72, 85, 90, 67, 82], # Class 2[95, 89, 91, 87, 93], # Class 3[68, 75, 80, 72, 77] # Class 4])# Calculate percentiles for overall performanceoverall_scores = test_scores.flatten()performance_metrics = np.percentile(overall_scores, [10, 25, 50, 75, 90])print("Performance Distribution:")print(f"10th percentile (bottom 10%): {performance_metrics[0]}")print(f"25th percentile (Q1): {performance_metrics[1]}")print(f"Median (Q2): {performance_metrics[2]}")print(f"75th percentile (Q3): {performance_metrics[3]}")print(f"90th percentile (top 10%): {performance_metrics[4]}")# Calculate median score for each classclass_medians = np.percentile(test_scores, 50, axis=1)print(f"\nMedian scores by class: {class_medians}")
The output of this code is:
Performance Distribution:10th percentile (bottom 10%): 71.625th percentile (Q1): 76.5Median (Q2): 85.075th percentile (Q3): 90.2590th percentile (top 10%): 93.2Median scores by class: [88. 82. 91. 75.]
This analysis helps identify performance ranges and compare different classes. The overall percentiles show the distribution of all scores, while the class-specific medians reveal which classes perform better on average.
Codebyte Example: Using np.percentile() for Financial Data Analysis
This example demonstrates using .percentile() for financial risk analysis with stock price data:
This financial analysis uses percentiles to calculate Value at Risk (VaR), which helps investors understand potential losses, and to identify extreme performance periods in the stock’s history.
Frequently Asked Questions
1. What’s the difference between .percentile() and .quantile()?
.percentile() uses values from 0-100, while .quantile() uses values from 0-1. For example, percentile(data, 25) equals quantile(data, 0.25).
2. How do I handle NaN values in my data?
Use np.nanpercentile() instead of np.percentile() to automatically ignore NaN values in your calculations.
3. Can I calculate percentiles along specific axes in multi-dimensional arrays?
Yes, use the axis parameter. For example, axis=0 calculates percentiles along rows, axis=1 along columns, and axis=None (default) flattens the array first.
Contribute to Docs
- Learn more about how to get involved.
- Edit this page on GitHub to fix an error or make an improvement.
- Submit feedback to let us know how we can improve Docs.
Learn Python:NumPy on Codecademy
- Machine Learning Data Scientists solve problems at scale, make predictions, find patterns, and more! They use Python, SQL, and algorithms.
- Includes 27 Courses
- With Professional Certification
- Beginner Friendly.95 hours
- Learn the basics of Python 3.12, one of the most powerful, versatile, and in-demand programming languages today.
- With Certificate
- Beginner Friendly.24 hours