.assign()
The .assign() method in pandas is used to create new columns or modify existing columns in a DataFrame while preserving the original DataFrame. It returns a new DataFrame object with all original columns plus the newly assigned ones, making it an ideal choice for functional programming approaches and method chaining.
The .assign() method is particularly useful in data preprocessing, feature engineering, and exploratory data analysis workflows. It allows for dynamic column creation using both static values and callable functions, making it versatile for various data manipulation scenarios. Common use cases include calculating derived metrics, creating categorical variables, applying transformations to existing columns, and building complex data pipelines where immutability is preferred.
Syntax
DataFrame.assign(**kwargs)
Parameters:
- **kwargs: Keyword arguments where the keys represent column names and the values can be:
  - Scalar values: Single values assigned to all rows
  - Array-like objects: Lists, arrays, or Series with length matching the DataFrame (a Series is aligned on the DataFrame's index)
  - Callable functions: Functions or lambda expressions that take the DataFrame as input and return values for the new column
Return value:
A new DataFrame containing all original columns plus the newly assigned columns. If a column name already exists, it will be replaced in the returned DataFrame.
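The three kinds of values listed above can be sketched side by side. This is a minimal illustration rather than part of the original examples: the DataFrame and the column names (`currency`, `quantity`, `rank`, `doubled`) are made up for the purpose.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the accepted value types.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

result = df.assign(
    currency="USD",                              # scalar: repeated on every row
    quantity=[1, 2, 3],                          # list: one value per row, in order
    rank=pd.Series([3, 2, 1], index=[2, 1, 0]),  # Series: matched to rows by index label
    doubled=lambda d: d["price"] * 2,            # callable: receives the DataFrame
)

print(result)
```

Because the Series is matched by index label rather than by position, the row labeled 0 receives the value 1 even though 1 is the last element of the Series.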
Example 1: Adding Basic Columns
This example demonstrates the fundamental usage of .assign() to add new columns with static values:
import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'price': [25.99, 45.50, 12.75, 67.25],
    'quantity': [10, 5, 20, 8]
})

# Add new columns using .assign()
df_updated = df.assign(
    currency='USD',
    in_stock=True,
    category='Electronics'
)

print(df_updated)
The output of this code is:
   product_id  price  quantity  currency  in_stock     category
0         101  25.99        10       USD      True  Electronics
1         102  45.50         5       USD      True  Electronics
2         103  12.75        20       USD      True  Electronics
3         104  67.25         8       USD      True  Electronics
This example shows how .assign() creates a new DataFrame with three additional columns. The currency and category columns receive string values, while in_stock receives a boolean value that is applied to all rows.
Example 2: Calculating Revenue Analysis
This example demonstrates using .assign() with lambda functions to perform business calculations and create derived metrics:
import pandas as pd

# Create sales DataFrame
sales_df = pd.DataFrame({
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'unit_price': [899.99, 29.99, 79.99, 299.99],
    'units_sold': [50, 200, 150, 75],
    'cost_per_unit': [600.00, 15.00, 45.00, 180.00]
})

# Calculate business metrics using .assign()
sales_analysis = sales_df.assign(
    total_revenue=lambda x: x['unit_price'] * x['units_sold'],
    total_cost=lambda x: x['cost_per_unit'] * x['units_sold'],
    profit=lambda x: (x['unit_price'] - x['cost_per_unit']) * x['units_sold'],
    profit_margin=lambda x: ((x['unit_price'] - x['cost_per_unit']) / x['unit_price']) * 100
)

print(sales_analysis.round(2))
The output of this code is:
  product_name  unit_price  units_sold  cost_per_unit  total_revenue  total_cost    profit  profit_margin
0       Laptop      899.99          50          600.0       44999.50     30000.0  14999.50          33.33
1        Mouse       29.99         200           15.0        5998.00      3000.0   2998.00          49.98
2     Keyboard       79.99         150           45.0       11998.50      6750.0   5248.50          43.74
3      Monitor      299.99          75          180.0       22499.25     13500.0   8999.25          40.00
This example demonstrates real-world financial analysis where .assign() calculates multiple business metrics. Each lambda function receives the DataFrame as input and performs calculations using existing columns to create comprehensive sales analytics.
Codebyte Example: Customer Segmentation Pipeline
This example shows advanced usage of .assign() for customer segmentation and data categorization in a chained workflow. It creates multiple categorical variables using pd.cut(), applies conditional logic with np.where(), and performs mathematical calculations, all within a single method call to build a comprehensive customer analysis framework.
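A minimal sketch of such a segmentation pipeline might look like the following. The customer data, column names, bin edges, and spending threshold are all assumptions chosen for illustration; they are not taken from the original interactive example.

```python
import numpy as np
import pandas as pd

# Hypothetical customer data; every name and threshold here is illustrative.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [22, 35, 48, 67],
    "total_spent": [150.0, 1200.0, 640.0, 2300.0],
    "orders": [3, 10, 8, 20],
})

segmented = customers.assign(
    # Categorical variable: bucket ages into labeled, right-closed intervals
    age_group=lambda d: pd.cut(
        d["age"],
        bins=[0, 30, 50, 120],
        labels=["young", "middle", "senior"],
    ),
    # Conditional logic: flag customers at or above an assumed threshold
    spend_tier=lambda d: np.where(d["total_spent"] >= 1000, "high", "standard"),
    # Mathematical calculation: average value per order
    avg_order_value=lambda d: d["total_spent"] / d["orders"],
)

print(segmented)
```

Each new column is defined with a lambda so that it is computed from the DataFrame passed through .assign() rather than from the `customers` variable, which keeps the step reusable inside a longer method chain.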
Frequently Asked Questions
1. Does .assign() modify the original DataFrame?
No, .assign() returns a new DataFrame with the assigned columns while leaving the original DataFrame unchanged.
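This can be checked directly. The snippet below is a small sketch with made-up data that compares the columns of the original and the returned DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"price": [25.99, 45.50]})
df_new = df.assign(currency="USD")

print(list(df.columns))      # ['price']
print(list(df_new.columns))  # ['price', 'currency']
print(df_new is df)          # False
```

To keep the new column, the result has to be bound to a name, for example `df = df.assign(currency="USD")`.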
2. Can I assign multiple columns in a single .assign() call?
Yes, you can assign multiple columns by providing multiple keyword arguments. Columns are assigned in the order the keyword arguments are written, so a later column can reference a column created earlier in the same call, provided the later value is a callable that reads it from the DataFrame it receives.
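The sketch below illustrates a later column building on an earlier one from the same call. It assumes a reasonably recent pandas (keyword order has been preserved since version 0.23); the column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"unit_price": [10.0, 20.0], "units_sold": [3, 5]})

result = df.assign(
    # First keyword: creates the revenue column
    revenue=lambda d: d["unit_price"] * d["units_sold"],
    # Second keyword: the lambda receives a DataFrame that already has revenue
    revenue_share=lambda d: d["revenue"] / d["revenue"].sum(),
)

print(result)
```

Writing the second value as a plain expression such as `df["revenue"] / df["revenue"].sum()` would raise a KeyError here, because the original `df` has no revenue column; only the callable form sees the intermediate result.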
3. How does .assign() handle existing column names?
If you assign to an existing column name, .assign() will overwrite that column in the returned DataFrame. The original DataFrame remains unchanged, but the new DataFrame will have the updated column values.
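As a small sketch of this behavior, with made-up prices and an assumed 50% discount, the overwritten column takes the new values and stays in its existing position:

```python
import pandas as pd

df = pd.DataFrame({"product_id": [101, 102], "price": [20.0, 40.0]})

# Reuse the existing column name to replace its values in the result
discounted = df.assign(price=lambda d: d["price"] * 0.5)

print(list(discounted.columns))    # ['product_id', 'price']
print(discounted["price"].tolist())  # [10.0, 20.0]
print(df["price"].tolist())          # [20.0, 40.0]
```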