Codecademy Logo

Data Manipulation with Pandas

Print Cheatsheet

Pandas DataFrame creation

The fundamental Pandas object is called a DataFrame. It is a 2-dimensional size-mutable, potentially heterogeneous, tabular data structure.

A DataFrame can be created multiple ways. It can be created by passing in a dictionary or a list of lists to the pd.DataFrame() method, or by reading data from a CSV file.

# Ways of creating a Pandas DataFrame
# Passing in a dictionary:
data = {'name':['Anthony', 'Maria'], 'age':[30, 28]}
df = pd.DataFrame(data)
# Passing in a list of lists:
data = [['Tom', 20], ['Jack', 30], ['Meera', 25]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# Reading data from a csv file:
df = pd.read_csv('students.csv')


Pandas is an open source library that is used to analyze data in Python. It takes in data, like a CSV or SQL database, and creates an object with rows and columns called a data frame. Pandas is typically imported with the alias pd.

import pandas as pd

Selecting Pandas DataFrame rows using logical operators

In pandas, specific rows can be selected if they satisfy certain conditions using Python’s logical operators. The result is a DataFrame that is a subset of the original DataFrame.

Multiple logical conditions can be combined with OR (using |) and AND (using &), and each condition must be enclosed in parentheses.

# Selecting rows where age is over 20
df[df.age > 20]
# Selecting rows where name is not John
df[ != "John"]
# Selecting rows where age is less than 10
# OR greater than 70
df[(df.age < 10) | (df.age > 70)]

Pandas apply() function

The Pandas apply() function can be used to apply a function on every value in a column or row of a DataFrame, and transform that column or row to the resulting values.

By default, it will apply a function to all values of a column. To perform it on a row instead, you can specify the argument axis=1 in the apply() function call.

# This function doubles the input value
def double(x):
return 2*x
# Apply this function to double every value in a specified column
df.column1 = df.column1.apply(double)
# Lambda functions can also be supplied to `apply()`
df.column2 = df.column2.apply(lambda x : 3*x)
# Applying to a row requires it to be called on the entire DataFrame
df['newColumn'] = df.apply(lambda row:
row['column1'] * 1.5 + row['column2'],

Pandas DataFrames adding columns

Pandas DataFrames allow for the addition of columns after the DataFrame has already been created, by using the format df['newColumn'] and setting it equal to the new column’s value.

# Specifying each value in the new column:
df['newColumn'] = [1, 2, 3, 4]
# Setting each row in the new column to the same value:
df['newColumn'] = 1
# Creating a new column by doing a
# calculation on an existing column:
df['newColumn'] = df['oldColumn'] * 5