.fillna()
.fillna() is a pandas method that replaces null or missing values in a DataFrame or Series with specified values. In data analysis, missing values (represented as NaN in pandas) are common and can cause errors or skew results if not handled properly. The .fillna() method provides a flexible way to handle these null values by replacing them with meaningful data.
The .fillna() method is widely used in the preprocessing and cleaning stages of a data analysis pipeline. It can replace missing values with a fixed value, forward or backward fill from existing data, or use different values for different columns. This matters for real-world datasets, which are often incomplete because of collection errors, data corruption, or information that was never available.
Syntax
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Parameters:
- value: The value used to fill null values. This can be a scalar (such as 0 or 'Unknown'), a dictionary, a Series, or a DataFrame.
- method: The fill method. Options are 'ffill'/'pad' (forward fill) and 'bfill'/'backfill' (backward fill). The default is None. This parameter is deprecated as of pandas 2.1 in favor of the dedicated .ffill() and .bfill() methods.
- axis: The axis along which to fill missing values (0 or 'index' for rows, 1 or 'columns' for columns).
- inplace: If True, modifies the DataFrame in place (returns None). If False, returns a copy with replacements.
- limit: Maximum number of consecutive NaN values to forward or backward fill (if method is specified).
- downcast: A dictionary or 'infer' to downcast dtypes if possible.
Return value:
The method returns a new DataFrame or Series with filled values unless inplace=True, in which case it returns None and modifies the original object.
Example 1: Replacing NaN with a Static Value
This example demonstrates how to replace all missing values in a DataFrame with a specified value:
# Importing pandas library
import pandas as pd
import numpy as np

# Creating a sample DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, np.nan]
})

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Replacing all NaN values with 0
filled_df = df.fillna(0)

# Display the result
print("\nDataFrame after filling NaN with 0:")
print(filled_df)
The output produced by the above code will be:
Original DataFrame:
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  NaN  11.0
3  4.0  8.0   NaN

DataFrame after filling NaN with 0:
     A    B     C
0  1.0  5.0   9.0
1  2.0  0.0  10.0
2  0.0  0.0  11.0
3  4.0  8.0   0.0
In this example, we created a DataFrame with some NaN values and used .fillna(0) to replace all missing values with zero. This is the simplest way to use .fillna(): a single value replaces all null values across the entire DataFrame.
Example 2: Column-Specific Value Replacement
This example shows how to fill missing values with different values for each column using a dictionary.
# Importing pandas library
import pandas as pd
import numpy as np

# Creating a sample DataFrame with NaN values
sales_data = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Price': [10.5, 8.0, np.nan, 15.5, np.nan],
    'Units_Sold': [100, 150, np.nan, 80, 200],
    'In_Stock': [True, False, np.nan, True, np.nan]
})

# Display the original DataFrame
print("Original Sales Data:")
print(sales_data)

# Creating a dictionary with column-specific fill values
fill_values = {
    'Price': 0.0,
    'Units_Sold': 0,
    'In_Stock': False
}

# Filling NaN values with column-specific values
filled_sales = sales_data.fillna(fill_values)

# Display the result
print("\nSales Data after filling NaN values:")
print(filled_sales)
The output produced by the above code will be:
Original Sales Data:
   Product  Price  Units_Sold  In_Stock
0        A   10.5       100.0      True
1        B    8.0       150.0     False
2        C    NaN         NaN       NaN
3        D   15.5        80.0      True
4        E    NaN       200.0       NaN

Sales Data after filling NaN values:
   Product  Price  Units_Sold  In_Stock
0        A   10.5       100.0      True
1        B    8.0       150.0     False
2        C    0.0         0.0     False
3        D   15.5        80.0      True
4        E    0.0       200.0     False
In this scenario, a sales dataset has missing values in the Price, Units_Sold, and In_Stock columns. Passing a dictionary to .fillna() specifies a different fill value for each column: 0.0 for missing prices, 0 for missing units sold, and False for missing stock information. Each column therefore receives a value that fits its context.
Example 3: Using Forward Fill for Time Series Data
This example demonstrates method-based filling on a time series of temperature measurements in which some days have missing data. Calling .fillna(method='ffill') propagates the last valid observation forward to fill the gaps, which suits time series where carrying the last known value forward often makes the most sense.
The limit parameter restricts propagation to a specified number of consecutive NaN values. With limit=1, the second consecutive missing value remains NaN, as seen on 2023-01-03.
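The interactive code for this example is not reproduced on this page. The following is a minimal sketch of what it describes, under stated assumptions: the temperature values and the variable names are illustrative, not the original data, and it calls .ffill() directly, which behaves like .fillna(method='ffill') and avoids the deprecation warning that the method parameter raises on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperature readings; values are illustrative only
dates = pd.date_range("2023-01-01", periods=6, freq="D")
temps = pd.Series(
    [20.5, np.nan, np.nan, 22.0, np.nan, 23.5],
    index=dates,
    name="Temperature",
)

# Forward fill: every gap takes the last valid observation
filled = temps.ffill()

# Forward fill at most one consecutive NaN per gap
filled_limited = temps.ffill(limit=1)

print("Forward filled:")
print(filled)
print("\nForward filled with limit=1:")
print(filled_limited)
```

With this data, 2023-01-02 and 2023-01-03 are consecutive gaps, so limit=1 fills only the first of them and leaves 2023-01-03 as NaN.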
Frequently Asked Questions
1. What’s the difference between fillna() and replace()?
The .fillna() method specifically targets null (NaN) values, while .replace() can substitute any specified value with another. Use .fillna() when you only need to address missing data and .replace() when you want to replace specific values.
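As a small illustration of that difference, the sketch below applies both methods to the same Series. The data is hypothetical: NaN marks a missing score and -1 stands in for a placeholder code.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: NaN is missing data, -1.0 is a placeholder code
scores = pd.Series([10.0, np.nan, -1.0, 7.0])

# fillna() only touches the NaN entry; the -1.0 placeholder is kept
only_nan_filled = scores.fillna(0)

# replace() swaps the specific value -1.0; the NaN entry is kept
placeholder_replaced = scores.replace(-1.0, 0)

print(only_nan_filled)
print(placeholder_replaced)
```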
2. Does fillna() modify the original DataFrame?
By default, .fillna() returns a new DataFrame with replacements. To modify the original DataFrame, set inplace=True, but note that this returns None.
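The two behaviors can be compared by counting the remaining NaN values, as in this minimal sketch (the DataFrame and variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]})

# Default behavior: a new DataFrame is returned and df keeps its NaN values
filled = df.fillna(0)
missing_in_original = int(df.isna().sum().sum())
missing_in_copy = int(filled.isna().sum().sum())
print(missing_in_original, missing_in_copy)

# inplace=True: df itself is modified, so the return value is not used here
df.fillna(0, inplace=True)
missing_after_inplace = int(df.isna().sum().sum())
print(missing_after_inplace)
```

Reassigning the result, as in df = df.fillna(0), is a common alternative to inplace=True.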
3. Can I use different methods for different columns?
No, the method parameter applies to all columns. For different treatments per column, use separate .fillna() calls or use the value parameter with a dictionary.
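One way to apply a different treatment to each column is to fill the columns one at a time and assign the results back. The sketch below assumes a hypothetical sensor log; the column names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor log; column names and values are illustrative
log = pd.DataFrame({
    "temperature": [20.0, np.nan, 22.0],
    "status": ["ok", None, "ok"],
    "rainfall": [np.nan, 1.5, np.nan],
})

# Separate calls give each column its own treatment
log["temperature"] = log["temperature"].ffill()   # carry the last reading forward
log["status"] = log["status"].fillna("unknown")   # fixed label
log["rainfall"] = log["rainfall"].fillna(0.0)     # fixed number

print(log)
```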
4. What’s the best way to fill missing values in a dataset?
The appropriate approach depends on your data and analysis goals. Common strategies include:
- Using meaningful defaults (0, average, median)
- Forward or backward filling for time series
- Interpolation for numerical data with trends
- Using domain knowledge to inform replacements
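To make the first three strategies concrete, this sketch applies each of them to the same small Series. The readings are made up for illustration, and which result is appropriate still depends on the data.

```python
import numpy as np
import pandas as pd

# Illustrative readings with two gaps
readings = pd.Series([10.0, np.nan, 30.0, np.nan, 50.0])

# Meaningful default: fill with the median of the observed values
median_filled = readings.fillna(readings.median())

# Forward fill: carry the last observed value forward
forward_filled = readings.ffill()

# Linear interpolation between the neighboring observed values
interpolated = readings.interpolate()

print(pd.DataFrame({
    "median": median_filled,
    "ffill": forward_filled,
    "interpolate": interpolated,
}))
```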
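5. How can I fill NaNs with the column mean?
You can pass the Series of column means returned by df.mean() to .fillna(). If the DataFrame also contains non-numeric columns, compute the means with df.mean(numeric_only=True):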
df.fillna(df.mean())
For a single column, assign the result back to that column:
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())
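A runnable sketch of the mean-fill approach on a DataFrame that mixes text and numeric columns might look like this (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one text column and two numeric columns
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "height": [1.0, np.nan, 3.0],
    "weight": [10.0, 20.0, np.nan],
})

# numeric_only=True skips the text column when computing the means
means = df.mean(numeric_only=True)

# The Series of means is matched to columns by name; "name" is left untouched
filled = df.fillna(means)

print(filled)
```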