Python:Pandas .loc
The .loc property in Pandas is used to access and manipulate rows and columns using row and column labels instead of integer-based positions. It offers a clear and intuitive way to retrieve, update, or filter data based on row and column names, enhancing code readability and reducing the chances of errors when the data structure changes.
The DataFrame .loc property is commonly used in data analysis and manipulation tasks, such as selecting specific subsets of data, filtering based on conditions, and updating values. It's an essential tool for data scientists, analysts, and anyone working with tabular data in Python.
Syntax
DataFrame.loc[row_indexer, column_indexer]
Parameters:
- row_indexer: Specifies which rows to select. Can be a single label, a list of labels, a slice with labels, a boolean array, or a callable function.
- column_indexer: Specifies which columns to select. Can be a single label, a list of labels, a slice with labels, a boolean array, or a callable function.
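The column indexer is optional. If only one indexer is provided, it is treated as the row indexer and all columns are returned. To select columns only, pass : as the row indexer, as in df.loc[:, column_indexer].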
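The indexer types listed above can be sketched as follows. The DataFrame and its values are hypothetical and exist only to illustrate each form.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"Math": [85, 92, 78], "Science": [92, 88, 75]},
    index=["S001", "S002", "S003"],
)

# Single row label
single_label = df.loc["S002"]

# List of row labels
label_list = df.loc[["S001", "S003"]]

# Label slice for rows (the end label is included) plus a single column label
label_slice = df.loc["S001":"S002", "Math"]

# Boolean Series used as a row mask
bool_mask = df.loc[df["Math"] > 80]

# Callable that receives the DataFrame and returns a mask
from_callable = df.loc[lambda d: d["Science"] < 90]

# All rows, selected columns only
columns_only = df.loc[:, ["Science"]]
```

The callable form is mostly useful in method chains, where the intermediate DataFrame has no variable name to refer to.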
Return value:
The .loc property returns:
- A scalar value when both the row and the column are specified as single labels
- A pandas Series when exactly one of the row or column indexers is a single label
- A pandas DataFrame when neither indexer is a single label, for example when lists, slices, or boolean arrays select multiple rows and/or columns
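A short sketch of the three return types, assuming a small hypothetical DataFrame with unique row labels:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"Math": [85, 92], "Science": [92, 88]},
    index=["S001", "S002"],
)

cell = df.loc["S001", "Math"]        # single row label and single column label
row = df.loc["S001"]                 # single row label, all columns
column = df.loc[:, "Math"]           # all rows, single column label
table = df.loc[["S001"], ["Math"]]   # lists on both axes, even with one item each

print(isinstance(row, pd.Series))        # row selection is a Series
print(isinstance(column, pd.Series))     # column selection is a Series
print(isinstance(table, pd.DataFrame))   # list selection stays a DataFrame
```

Wrapping a single label in a list is a common way to keep the result two-dimensional.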
Example 1: Basic Label-Based Selection
This example demonstrates how to use .loc to select data from a DataFrame using row and column labels:
# Import pandas library
import pandas as pd

# Create a sample DataFrame with student records
data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia', 'David'],
    'Math': [85, 92, 78, 95, 88],
    'Science': [92, 88, 75, 91, 84],
    'English': [80, 95, 82, 89, 90]
}

# Create the DataFrame with custom row indices
df = pd.DataFrame(data)
df.index = ['S001', 'S002', 'S003', 'S004', 'S005']  # Set custom student IDs as index

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Select a specific student's record using .loc
student_record = df.loc['S003']
print("\nRecord for student with ID 'S003':")
print(student_record)

# Select a specific value (Michael's Science score) using .loc
michael_science = df.loc['S003', 'Science']
print("\nMichael's Science score:", michael_science)

# Select multiple students' Math and Science scores
selected_scores = df.loc[['S001', 'S004'], ['Math', 'Science']]
print("\nMath and Science scores for students S001 and S004:")
print(selected_scores)
The output of this code will be:
Original DataFrame:
         Name  Math  Science  English
S001     John    85       92       80
S002     Emma    92       88       95
S003  Michael    78       75       82
S004   Sophia    95       91       89
S005    David    88       84       90

Record for student with ID 'S003':
Name       Michael
Math            78
Science         75
English         82
Name: S003, dtype: object

Michael's Science score: 75

Math and Science scores for students S001 and S004:
      Math  Science
S001    85       92
S004    95       91
This example shows how to use .loc to select data at different levels of granularity: an entire row, a specific cell, and a subset of rows and columns, all using label-based indexing.
Example 2: Filtering Data with Conditions
This example demonstrates how to use .loc with boolean conditions to filter data, a common operation in data analysis:
# Import pandas library
import pandas as pd

# Create a sample DataFrame with employee records
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Evan', 'Fiona'],
    'Department': ['Sales', 'IT', 'Marketing', 'IT', 'Finance', 'Sales'],
    'Salary': [72000, 85000, 65000, 90000, 95000, 62000],
    'Experience': [5, 8, 3, 10, 12, 2]
}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original Employee DataFrame:")
print(df)

# Filter employees with salary greater than 80000
high_salary = df.loc[df['Salary'] > 80000]
print("\nEmployees with salary greater than 80000:")
print(high_salary)

# Filter IT department employees with more than 5 years of experience
experienced_it = df.loc[(df['Department'] == 'IT') & (df['Experience'] > 5)]
print("\nIT employees with more than 5 years of experience:")
print(experienced_it)

# Multiple conditions: Sales employees with salary less than 70000 or experience less than 4
sales_filter = df.loc[(df['Department'] == 'Sales') & ((df['Salary'] < 70000) | (df['Experience'] < 4))]
print("\nSales employees with salary less than 70000 or experience less than 4:")
print(sales_filter)

# Update salaries: give a 10% raise to employees with more than 10 years of experience.
# Round and cast back to int so the integer Salary column keeps its dtype.
senior = df['Experience'] > 10
df.loc[senior, 'Salary'] = (df.loc[senior, 'Salary'] * 1.1).round().astype(int)
print("\nDataFrame after giving 10% raise to highly experienced employees:")
print(df)
The output produced by this code is:
Original Employee DataFrame:
      Name Department  Salary  Experience
0    Alice      Sales   72000           5
1      Bob         IT   85000           8
2  Charlie  Marketing   65000           3
3    Diana         IT   90000          10
4     Evan    Finance   95000          12
5    Fiona      Sales   62000           2

Employees with salary greater than 80000:
    Name Department  Salary  Experience
1    Bob         IT   85000           8
3  Diana         IT   90000          10
4   Evan    Finance   95000          12

IT employees with more than 5 years of experience:
    Name Department  Salary  Experience
1    Bob         IT   85000           8
3  Diana         IT   90000          10

Sales employees with salary less than 70000 or experience less than 4:
    Name Department  Salary  Experience
5  Fiona      Sales   62000           2

DataFrame after giving 10% raise to highly experienced employees:
      Name Department  Salary  Experience
0    Alice      Sales   72000           5
1      Bob         IT   85000           8
2  Charlie  Marketing   65000           3
3    Diana         IT   90000          10
4     Evan    Finance  104500          12
5    Fiona      Sales   62000           2
This example illustrates how .loc can be used with boolean indexing to filter data based on various conditions, as well as how to update values based on conditions. These operations are fundamental for data cleaning, exploratory data analysis, and feature engineering.
Codebyte Example: Working with Date Ranges and Missing Values
This example demonstrates how to use .loc with date indices and handle missing values, which is common in time series analysis and real-world datasets.
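The interactive example is not reproduced in this text. As a stand-in, here is a minimal sketch of the operations it describes. The sensor readings, column names, and thresholds are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sensor readings; NaN marks a missing value
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "Temperature": [21.5, np.nan, 19.8, 22.1, np.nan, 20.4],
        "Humidity": [40.0, 42.0, np.nan, 45.0, 47.0, 44.0],
    },
    index=dates,
)

# Label slice on a DatetimeIndex: both end dates are included
first_days = df.loc["2024-01-02":"2024-01-04"]

# Rows where Temperature is missing
missing_temp = df.loc[df["Temperature"].isna()]

# Fill missing Temperature values with the mean of the observed values
mean_temp = df["Temperature"].mean()
df.loc[df["Temperature"].isna(), "Temperature"] = mean_temp

# Combine conditions: dates from 2024-01-03 onward with Humidity above 43
# (a comparison against NaN is False, so rows with missing Humidity drop out)
recent_humid = df.loc[
    (df.index >= pd.Timestamp("2024-01-03")) & (df["Humidity"] > 43)
]

print(first_days)
print(missing_temp)
print(recent_humid)
```

Filling with the column mean is just one possible strategy; forward filling or dropping the rows may suit other datasets better.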
This example demonstrates advanced usage of .loc for time series data, handling missing values, and performing complex filtering operations based on multiple conditions. These techniques are especially valuable for financial analysis, sensor data processing, and other time-dependent data applications.
Frequently Asked Questions
1. How are .iloc[] and .loc[] different?
- .loc[] is label-based indexing that uses row and column names.
- .iloc[] is integer-based indexing that uses positions (0, 1, 2, etc.).
Example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]}, index=['X', 'Y', 'Z'])
df.loc['Y']   # Returns the row labeled 'Y' as a Series
df.iloc[1]    # Returns the row at position 1 (the second row), which is also 'Y' here
.loc[] includes both endpoints in slices while .iloc[] excludes the end position.
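The difference in slice endpoints can be seen in a small sketch, assuming a three-row DataFrame with string labels:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]}, index=["X", "Y", "Z"])

# Label slice: the end label 'Y' is included, so rows X and Y are returned
by_label = df.loc["X":"Y"]

# Position slice: the end position 1 is excluded, so only row X is returned
by_position = df.iloc[0:1]

print(by_label)
print(by_position)
```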
2. How do you display the first 3 rows in pandas?
a. Using .loc[] (with default index):
df.loc[0:2] # Includes rows 0, 1, and 2
b. Using .iloc[] (preferred for position-based selection):
df.iloc[0:3] # Includes rows 0, 1, and 2
c. Using .head() (most common approach):
df.head(3) # Shows first 3 rows
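The .loc[] approach relies on the default 0, 1, 2, ... index. A brief sketch with a hypothetical non-default integer index shows why the position-based options are safer for "first N rows":

```python
import pandas as pd

# Hypothetical data whose index labels are not 0, 1, 2, ...
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# .loc treats 0:2 as labels; no labels fall between 0 and 2 here
by_label = df.loc[0:2]

# .iloc and .head() work by position, regardless of the labels
by_position = df.iloc[0:3]
by_head = df.head(3)

print(len(by_label))    # number of rows matched by label
print(by_position)
print(by_head)
```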
3. Is .iloc[] faster than .loc[]?
.iloc[] is often somewhat faster than .loc[] because:
- .iloc[] uses direct integer indexing
- .loc[] requires label matching and lookup
The performance difference matters mainly with:
- Very large DataFrames (millions of rows)
- Repeated indexing operations in loops
- Performance-critical applications
For most analysis tasks, choose based on readability and correctness rather than speed.
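One way to check the difference on a given machine is a rough timing sketch with the standard-library timeit module. The DataFrame size, labels, and repetition count below are arbitrary assumptions, and the measured times will vary by pandas version and hardware.

```python
import timeit

import pandas as pd

# Hypothetical DataFrame with string row labels
df = pd.DataFrame({"A": range(1000)}, index=[f"r{i}" for i in range(1000)])

# Time repeated single-cell access by label and by position
loc_seconds = timeit.timeit(lambda: df.loc["r500", "A"], number=2000)
iloc_seconds = timeit.timeit(lambda: df.iloc[500, 0], number=2000)

print(f".loc:  {loc_seconds:.4f} s for 2000 lookups")
print(f".iloc: {iloc_seconds:.4f} s for 2000 lookups")
```

Both lookups return the same cell, so any gap in the printed times reflects only the indexing path.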