.mean()
The `.mean()` method calculates and returns the arithmetic mean of elements in a NumPy array. It computes the average by summing all elements along the specified axis and dividing by the number of elements. This method is one of the fundamental statistical functions in NumPy that data scientists and analysts use to understand the central tendency of numerical data.
NumPy’s `.mean()` is highly versatile, allowing calculation of means across entire arrays or along specific dimensions. It’s commonly used in data analysis, scientific computing, and machine learning for tasks such as feature normalization, statistical analysis, and data preprocessing.
Syntax
numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, where=<no value>)
Parameters:
- `a`: The array containing numbers whose mean is to be calculated.
- `axis` (Optional): Axis or axes along which the means are computed. If `None`, the array is flattened before computation.
- `dtype` (Optional): The data type used for calculating the mean. By default, `float64` is used for integer inputs, and the input data type is preserved for floating-point inputs.
- `out` (Optional): Alternative output array to store the result. Must have the same shape as the expected output.
- `keepdims` (Optional): If `True`, retains the reduced dimensions as size one, so the result broadcasts correctly against the input array.
- `where` (Optional): Elements to include in the mean calculation, given as a boolean array that is broadcastable to the shape of `a`.
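The `dtype` and `where` parameters can be sketched as follows. The `values` array and the `values > 2` mask are made up for illustration, and the `where` argument assumes a reasonably recent NumPy release (it was added in version 1.20):

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Integer input: the mean is computed and returned as float64 by default
default_mean = np.mean(values)

# dtype: request float32 for the computation and the result instead
mean_f32 = np.mean(values, dtype=np.float32)

# where: average only the elements greater than 2 (3, 4, 5, and 6)
mask = values > 2
masked_mean = np.mean(values, where=mask)

print(default_mean, default_mean.dtype)
print(mean_f32, mean_f32.dtype)
print(masked_mean)
```

Elements excluded by the mask are left out of both the sum and the count, so the masked mean here is taken over four values rather than six.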
Return value:
The `.mean()` method returns an ndarray containing the mean values. If `axis` is `None`, the result is a scalar value.
Example 1: Basic Mean Calculation
This example demonstrates how to calculate the mean of a one-dimensional NumPy array:
import numpy as np

# Create a 1D array
array1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# Calculate the mean of the array
avg = np.mean(array1)

print("Array:", array1)
print("Mean value:", avg)
This example results in the following output:
Array: [0 1 2 3 4 5 6 7]
Mean value: 3.5
In this example, a 1D array with values from 0 to 7 is created, and the arithmetic mean is calculated, which is 3.5 (the sum of all elements divided by the number of elements).
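As a quick check of that definition, the same value can be reproduced by hand. This is only a sketch of the arithmetic, not a description of how NumPy implements the calculation internally; `manual_mean` is a name introduced here for illustration:

```python
import numpy as np

array1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])

# Sum of the elements (28) divided by the number of elements (8)
manual_mean = array1.sum() / array1.size

print("Manual mean:", manual_mean)
print("np.mean:", np.mean(array1))
```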
Example 2: Calculating Mean Across Different Axes
This example shows how to compute the mean along different axes of a multi-dimensional array, which is useful in many data analysis scenarios:
import numpy as np

# Create a 3D array
array1 = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])

# Print the array shape and the array itself
print("Array shape:", array1.shape)
print("Array:\n", array1)

# Find the mean of entire array
mean1 = np.mean(array1)

# Find the mean across axis 0
mean2 = np.mean(array1, axis=0)

# Find the mean across axis 0 and 1
mean3 = np.mean(array1, (0, 1))

print("\nMean of the entire array:", mean1)
print("Mean across axis 0:\n", mean2)
print("Mean across axis 0 and 1:", mean3)
This example results in the following output:
Array shape: (2, 2, 2)
Array:
 [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

Mean of the entire array: 4.5
Mean across axis 0:
 [[3. 4.]
 [5. 6.]]
Mean across axis 0 and 1: [4. 5.]
When no axis is specified, all elements are averaged into a single value. With `axis=0`, the mean is calculated along the first dimension, resulting in a 2D array. With both axes 0 and 1 given as `(0, 1)`, the first two dimensions are reduced together, leaving a 1D array with one mean for each position along the last axis.
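One way to see what the `(0, 1)` reduction does is to rebuild it by hand: fix an index on the last axis, then average everything that remains. The `manual` array below is a name introduced for this sketch:

```python
import numpy as np

array1 = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])

# Reduce axes 0 and 1 in a single call
mean_01 = np.mean(array1, axis=(0, 1))

# Same result built by hand: for each index k on the last axis,
# average all elements of array1[:, :, k]
manual = np.array([array1[:, :, k].mean() for k in range(array1.shape[2])])

print(mean_01)
print(manual)
```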
Example 3: Data Analysis with Real-world Data
This example demonstrates how to use `.mean()` to analyze temperature data, a common application in environmental science and meteorology:
import numpy as np

# Monthly average temperatures (°C) for a city over 2 years
# Rows: Years (2023, 2024)
# Columns: Months (Jan to Dec)
temperatures = np.array([
    [5.2, 6.8, 9.3, 13.5, 18.2, 22.6, 25.1, 24.3, 19.7, 14.2, 9.1, 6.3],  # 2023
    [4.8, 6.5, 8.9, 14.1, 17.9, 23.2, 26.0, 25.2, 19.5, 13.8, 8.5, 5.9]   # 2024
])

print("Temperature data shape:", temperatures.shape)

# Calculate the average temperature for each year
yearly_avg = np.mean(temperatures, axis=1)
print("\nYearly average temperatures:")
for year, avg in zip([2023, 2024], yearly_avg):
    print(f"{year}: {avg:.2f}°C")

# Calculate the average temperature for each month across years
monthly_avg = np.mean(temperatures, axis=0)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
print("\nMonthly average temperatures across years:")
for month, avg in zip(months, monthly_avg):
    print(f"{month}: {avg:.2f}°C")

# Calculate the overall average temperature
overall_avg = np.mean(temperatures)
print("\nOverall average temperature: {:.2f}°C".format(overall_avg))
This example results in the following output:
Temperature data shape: (2, 12)

Yearly average temperatures:
2023: 14.52°C
2024: 14.53°C

Monthly average temperatures across years:
Jan: 5.00°C
Feb: 6.65°C
Mar: 9.10°C
Apr: 13.80°C
May: 18.05°C
Jun: 22.90°C
Jul: 25.55°C
Aug: 24.75°C
Sep: 19.60°C
Oct: 14.00°C
Nov: 8.80°C
Dec: 6.10°C

Overall average temperature: 14.53°C
This example shows how `.mean()` can be used to analyze temperature data by calculating yearly averages, monthly averages across years, and the overall average temperature.
Codebyte Example: Student Exam Score Analysis
This example demonstrates how to use `.mean()` to analyze student exam scores, a common task in educational assessment:
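The interactive code for this example is not included in this text. A minimal sketch of such an analysis might look like the following, where the `scores` array and all of its values are made up for illustration:

```python
import numpy as np

# Hypothetical exam scores: rows are students, columns are exams
scores = np.array([[80, 90, 70],
                   [60, 75, 90],
                   [100, 85, 95]])

# Average score of each student across all exams
student_avg = np.mean(scores, axis=1)

# Average score of each exam across all students
exam_avg = np.mean(scores, axis=0)

# Average of every score in the class
class_avg = np.mean(scores)

for i, avg in enumerate(student_avg, start=1):
    print(f"Student {i} average: {avg:.2f}")
for j, avg in enumerate(exam_avg, start=1):
    print(f"Exam {j} average: {avg:.2f}")
print(f"Class average: {class_avg:.2f}")
```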
FAQs
1. What's the difference between `np.mean()` and `np.average()`?
While both calculate the arithmetic mean, `np.average()` allows specifying weights for elements, enabling weighted averages, whereas `np.mean()` treats all values equally.
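The difference can be sketched with a small example. The `scores` and `weights` values are assumptions chosen for illustration, with the last score counting twice as much as the others:

```python
import numpy as np

# Hypothetical scores and weights
scores = np.array([80.0, 90.0, 100.0])
weights = np.array([1, 1, 2])

# Every score counts equally: (80 + 90 + 100) / 3
plain = np.mean(scores)

# Weighted: (80*1 + 90*1 + 100*2) / (1 + 1 + 2)
weighted = np.average(scores, weights=weights)

print("np.mean:", plain)
print("np.average with weights:", weighted)
```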
2. How does NumPy's `.mean()` handle `NaN` values?
By default, `.mean()` will return `NaN` if any of the values being averaged are `NaN`. To ignore `NaN` values, use `np.nanmean()` instead.
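A short sketch of that behavior, using a made-up `readings` array with one missing value:

```python
import numpy as np

# Hypothetical readings with one missing value
readings = np.array([1.0, np.nan, 3.0])

# NaN propagates through the sum, so the result is NaN
with_nan = np.mean(readings)

# nanmean skips the NaN and averages the remaining two values
ignoring_nan = np.nanmean(readings)

print("np.mean:", with_nan)
print("np.nanmean:", ignoring_nan)
```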
3. Can `.mean()` calculate the mean of strings or other non-numeric data?
No, `.mean()` works only with numeric data. Attempting to calculate the mean of non-numeric data will result in a `TypeError`.
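The following sketch shows the failure on a small array of strings. The `words` array is made up for illustration, and the exact error message is not shown because it may vary between NumPy versions:

```python
import numpy as np

# Hypothetical array of strings
words = np.array(["a", "b", "c"])

raised = False
try:
    np.mean(words)
except TypeError as exc:
    # The exception raised is TypeError or one of its subclasses
    raised = True
    print("Caught:", type(exc).__name__)

print("TypeError raised:", raised)
```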
4. How can dimensions be preserved when calculating means along an axis?
Set `keepdims=True` to keep each reduced axis in the output as a dimension of size one, so the result has the same number of dimensions as the original array.
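A common reason to keep the dimensions is so the means broadcast back against the input, for example when centering each row. The `data` array below is an assumption made for this sketch:

```python
import numpy as np

# Hypothetical 2x3 data
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Without keepdims the reduced axis disappears: shape (2,)
row_means = np.mean(data, axis=1)

# With keepdims the reduced axis stays with size one: shape (2, 1)
row_means_kept = np.mean(data, axis=1, keepdims=True)

# The (2, 1) result broadcasts against the (2, 3) input
centered = data - row_means_kept

print(row_means.shape, row_means_kept.shape)
print(centered)
```

Subtracting `row_means` directly would fail here, because a shape of `(2,)` does not broadcast against `(2, 3)`.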
5. Is there a performance difference between using `.mean()` method and the `np.mean()` function?
No significant performance difference exists between `arr.mean()` and `np.mean(arr)` as they both call the same underlying implementation. Choose the syntax that makes code more readable.
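The two spellings can be compared directly. This sketch only checks that the results match on a made-up array; it does not measure timing:

```python
import numpy as np

# Hypothetical 3x4 array holding the integers 0 through 11
arr = np.arange(12).reshape(3, 4)

# Method form and function form accept the same axis argument
method_result = arr.mean(axis=0)
function_result = np.mean(arr, axis=0)

print(method_result)
print(np.array_equal(method_result, function_result))
```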