The pandas DataFrame method .info() displays a table of information about each column.

    parks.info()

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | index | 72 non-null | int64 |
| 1 | Park | 72 non-null | object |
| 2 | Location | 72 non-null | object |
| 3 | AnnualPassPrice | 72 non-null | int64 |

- # indicates the column index number
- Column refers to the column name
- Non-Null Count is the number of non-missing values in the column
- Dtype is the column’s data type
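
As a rough sketch of how the Non-Null Count column responds to missing data, the example below builds a three-row stand-in for parks from the missing-values sample table in this cheatsheet (the real dataset has 72 rows). The variable names buffer and summary are illustrative.

```python
import io

import numpy as np
import pandas as pd

# Three-row stand-in for parks, copied from the missing-values sample table
parks = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Zion', 'Yellowstone'],
    'Location': ['Gatlinburg, TN', np.nan, 'Jackson, WY'],
    'AnnualPassPrice': [40.0, 70.0, np.nan],
})

# .info() prints its summary and returns None; buf= captures the text instead
buffer = io.StringIO()
parks.info(buf=buffer)
summary = buffer.getvalue()

# Location and AnnualPassPrice each report 2 non-null values out of 3 rows
print(summary)
```

Calling parks.info() with no arguments prints the same summary directly.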

The pandas .drop() method is used to remove irrelevant columns from a DataFrame. Two of its keyword arguments are used here:

- labels takes a list of column names to drop
- axis=1 tells pandas we want to drop columns (not rows)

| | index | Park | Location |
|---|---|---|---|
| 0 | 1 | Great Smoky Mountains | Gatlinburg, TN |
| 1 | 2 | Zion | Springdale, UT |
| 2 | 3 | Yellowstone | Jackson, WY |

The code snippet drops the index column to produce:

| | Park | Location |
|---|---|---|
| 0 | Great Smoky Mountains | Gatlinburg, TN |
| 1 | Zion | Springdale, UT |
| 2 | Yellowstone | Jackson, WY |

    # Drop the index column
    drop_columns = ['index']
    nationalparks.drop(labels=drop_columns, axis=1)

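
Note that .drop() returns a new DataFrame and leaves the original unchanged, so the result usually needs to be assigned to a variable. A minimal sketch using the three sample rows above (the variable name trimmed is illustrative):

```python
import pandas as pd

nationalparks = pd.DataFrame({
    'index': [1, 2, 3],
    'Park': ['Great Smoky Mountains', 'Zion', 'Yellowstone'],
    'Location': ['Gatlinburg, TN', 'Springdale, UT', 'Jackson, WY'],
})

# Assign the result; the original DataFrame keeps its index column
trimmed = nationalparks.drop(labels=['index'], axis=1)

print(list(trimmed.columns))        # ['Park', 'Location']
print(list(nationalparks.columns))  # ['index', 'Park', 'Location']
```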

| | index | Park | Year2019 |
|---|---|---|---|
| 0 | 1 | Great Smoky Mountains | 12547743 |
| 1 | 2 | Zion | 4488268 |
| 2 | 3 | Yellowstone | 4020288 |

The pandas .rename() method renames columns in a DataFrame. Two keyword arguments are particularly important for .rename():

- mapper takes a dictionary mapping the old column names (as keys) to the new column names (as values)
- axis=1 tells pandas to apply the mapping to the column names (not the row labels)

    # Rename the Park column to National Park
    column_mapper = {'Park': 'National Park'}
    parks.rename(mapper=column_mapper, axis=1)

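
As a small sketch of the mapping in action, the example below renames Park on the three sample rows and checks that columns missing from the dictionary are left alone. It also shows the columns= keyword, which pandas accepts as an equivalent to mapper= with axis=1. The variable names renamed and same_result are illustrative.

```python
import pandas as pd

parks = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Zion', 'Yellowstone'],
    'Year2019': [12547743, 4488268, 4020288],
})

column_mapper = {'Park': 'National Park'}

# Like .drop(), .rename() returns a new DataFrame
renamed = parks.rename(mapper=column_mapper, axis=1)

# Equivalent spelling: columns= implies the columns axis
same_result = parks.rename(columns=column_mapper)

print(list(renamed.columns))      # ['National Park', 'Year2019']
print(list(same_result.columns))  # ['National Park', 'Year2019']
```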
Python has built-in arithmetic operators for performing calculations, including:

- addition (+)
- subtraction (-)
- multiplication (*)
- division (/)

Like mathematics, Python uses parentheses to control the order of operations in a calculation.

    100 + 10
    # Output: 110
    100 - 10
    # Output: 90
    100 * 10
    # Output: 1000
    100 / 10
    # Output: 10.0
    (100 + 10) / (10)
    # Output: 11.0

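
To illustrate why the parentheses matter, here is a short sketch comparing the same numbers with and without them. The variable names are illustrative.

```python
# Without parentheses, division happens before addition: 100 + (10 / 10)
without_parens = 100 + 10 / 10

# With parentheses, the addition happens first: 110 / 10
with_parens = (100 + 10) / 10

print(without_parens)  # 101.0
print(with_parens)     # 11.0

# The / operator always produces a float, even when the result is whole
print(type(100 / 10))  # <class 'float'>
```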
The round() function in Python rounds a number to a given number of decimal places using the following syntax:

    round(numeric_variable, number_of_decimals)

For example:

    pi = 3.14159
    # Round pi to 4 decimals
    round(pi, 4)
    # Output: 3.1416

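
A few more calls, as a sketch of behavior that is easy to miss: leaving out number_of_decimals returns an integer, and values exactly halfway between two integers round to the nearest even one.

```python
pi = 3.14159

print(round(pi, 4))  # 3.1416
print(round(pi, 2))  # 3.14

# With no second argument, round() returns an int
print(round(pi))     # 3

# Ties go to the nearest even integer
print(round(2.5))    # 2
print(round(3.5))    # 4
```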
In pandas, arithmetic operators like +, -, /, and * can be applied to all the rows of a column at once.

Here’s a sample DataFrame parks.

| | Park | Area_SqMi |
|---|---|---|
| 0 | Great Smoky Mountains | 816.3 |
| 1 | Zion | 229.1 |
| 2 | Yellowstone | 3468.4 |

The code snippet produces the following DataFrame:

| | Park | Area_SqMi | Area_SqKm |
|---|---|---|---|
| 0 | Great Smoky Mountains | 816.3 | 2114.217 |
| 1 | Zion | 229.1 | 593.369 |
| 2 | Yellowstone | 3468.4 | 8983.156 |

    # Convert square miles to square kilometers using column multiplication
    parks['Area_SqKm'] = parks['Area_SqMi'] * 2.59

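
Column arithmetic can be chained with other Series methods. The sketch below repeats the conversion on the sample rows and then rounds the new column to one decimal place with the Series method .round(). The constant name SQKM_PER_SQMI is illustrative; its value is the 2.59 factor used above.

```python
import pandas as pd

parks = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Zion', 'Yellowstone'],
    'Area_SqMi': [816.3, 229.1, 3468.4],
})

SQKM_PER_SQMI = 2.59  # conversion factor used in this cheatsheet

# The multiplication applies to every row; .round(1) then rounds every row
parks['Area_SqKm'] = (parks['Area_SqMi'] * SQKM_PER_SQMI).round(1)

print(parks)
```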
The pandas method .str.split(pat='x', expand=True) will split the information in a text column into multiple columns using 'x' as a delimiter. Common delimiters include commas (,), colons (:), and dashes (-).

| | Location |
|---|---|
| 0 | Gatlinburg, TN |
| 1 | Springdale, UT |
| 2 | Jackson, WY |

The keyword argument expand=True creates a DataFrame containing the split information that can be accessed through pandas indexing.

| | 0 | 1 |
|---|---|---|
| 0 | Gatlinburg | TN |
| 1 | Springdale | UT |
| 2 | Jackson | WY |

    # Split the Location column on the comma delimiter
    parks['Location'].str.split(pat=',', expand=True)

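
One detail worth checking: splitting on ',' alone keeps the space that follows each comma, so the second column holds ' TN' rather than 'TN'. The sketch below shows this and one possible fix, splitting on ', ' instead, then stores the pieces in new columns. The column names City and State and the variable names are illustrative choices.

```python
import pandas as pd

parks = pd.DataFrame({
    'Location': ['Gatlinburg, TN', 'Springdale, UT', 'Jackson, WY'],
})

# Splitting on ',' leaves a leading space in the second piece
comma_only = parks['Location'].str.split(pat=',', expand=True)
print(comma_only[1].tolist())  # [' TN', ' UT', ' WY']

# Splitting on ', ' (comma plus space) removes it
parts = parks['Location'].str.split(pat=', ', expand=True)

# The result's columns are labeled 0 and 1
parks['City'] = parts[0]
parks['State'] = parts[1]
print(parks['State'].tolist())  # ['TN', 'UT', 'WY']
```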
The Series method .str.cat() combines the text from two columns, row by row, into a single column of strings:

    df['Combined'] = df['Column1'].str.cat(df['Column2'], sep=',')

- .cat() places the text in Column2 after the text in Column1
- sep=',' places a comma ',' after the text from Column1 and before the text from Column2

| | City | State |
|---|---|---|
| 0 | Gatlinburg | TN |
| 1 | Springdale | UT |
| 2 | Jackson | WY |

The code snippet produces the following Location column:

| | Location |
|---|---|
| 0 | Gatlinburg, TN |
| 1 | Springdale, UT |
| 2 | Jackson, WY |

    # Combine the `City` and `State` columns into a single column `Location`
    parks['Location'] = parks['City'].str.cat(parks['State'], sep=', ')

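
For comparison, the same result can be built with the + operator, which also works row by row on text columns. This is a sketch on the sample rows; the variable names combined and with_plus are illustrative.

```python
import pandas as pd

parks = pd.DataFrame({
    'City': ['Gatlinburg', 'Springdale', 'Jackson'],
    'State': ['TN', 'UT', 'WY'],
})

# .str.cat() with a separator
combined = parks['City'].str.cat(parks['State'], sep=', ')

# Equivalent result using + on the two columns and the separator string
with_plus = parks['City'] + ', ' + parks['State']

print(combined.tolist())   # ['Gatlinburg, TN', 'Springdale, UT', 'Jackson, WY']
print(with_plus.tolist())  # ['Gatlinburg, TN', 'Springdale, UT', 'Jackson, WY']
```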
Pandas can alter text case using:

- .str.lower() - converts all text to lowercase
- .str.upper() - converts all text to uppercase
- .str.title() - converts all text to title case

| | Park |
|---|---|
| 0 | Great Smoky Mountains |
| 1 | Zion |
| 2 | Yellowstone |

Convert Park to lowercase and uppercase:

| | Park | .str.lower() | .str.upper() |
|---|---|---|---|
| 0 | Great Smoky Mountains | great smoky mountains | GREAT SMOKY MOUNTAINS |
| 1 | Zion | zion | ZION |
| 2 | Yellowstone | yellowstone | YELLOWSTONE |

    # Convert to lowercase
    parks['Park'].str.lower()
    # Convert to uppercase
    parks['Park'].str.upper()

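
The third method, .str.title(), capitalizes the first letter of each word and lowercases the rest. A short sketch, assuming the park names arrive with inconsistent casing (the messy input values are made up for illustration):

```python
import pandas as pd

# Hypothetical input with inconsistent casing
parks = pd.DataFrame({
    'Park': ['great smoky mountains', 'ZION', 'yellowstone'],
})

# Convert to title case
parks['Park'] = parks['Park'].str.title()

print(parks['Park'].tolist())  # ['Great Smoky Mountains', 'Zion', 'Yellowstone']
```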

| | Before | After |
|---|---|---|
| 0 | Great.Smoky.Mountains | Great Smoky Mountains |
| 3 | Grand.Canyon | Grand Canyon |
| 4 | Rocky.Mountain | Rocky Mountain |

The pandas method .str.replace() performs a find-and-replace on each row of a Series. Every section of text that matches the string passed to pat will be replaced by the string passed to repl. With regex=False, pat is treated as literal text rather than as a regular expression.

    df['Column'] = df['Column'].str.replace(pat='old_pattern', repl='new_pattern', regex=False)

For example:

    # Replace periods '.' with spaces
    parks['Park'] = parks['Park'].str.replace(pat='.', repl=' ', regex=False)

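
A sketch of the replacement on the sample names: every match in a row is replaced, so both periods in Great.Smoky.Mountains become spaces. Passing regex explicitly is a reasonable habit, since the default for this argument has differed between pandas versions and a period has a special meaning in regular expressions.

```python
import pandas as pd

parks = pd.DataFrame({
    'Park': ['Great.Smoky.Mountains', 'Grand.Canyon', 'Rocky.Mountain'],
})

# regex=False: the period is matched as a literal character
parks['Park'] = parks['Park'].str.replace(pat='.', repl=' ', regex=False)

print(parks['Park'].tolist())
# ['Great Smoky Mountains', 'Grand Canyon', 'Rocky Mountain']
```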
Missing or null values in a pandas DataFrame are often represented with a NaN value.

| | Park | Location | AnnualPassPrice |
|---|---|---|---|
| 0 | Great Smoky Mountains | Gatlinburg, TN | 40.0 |
| 1 | Zion | NaN | 70.0 |
| 2 | Yellowstone | Jackson, WY | NaN |
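
One common way to locate NaN entries like these is the DataFrame method .isna(), which returns True wherever a value is missing. The sketch below rebuilds the sample table and counts missing values per column; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

parks = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Zion', 'Yellowstone'],
    'Location': ['Gatlinburg, TN', np.nan, 'Jackson, WY'],
    'AnnualPassPrice': [40.0, 70.0, np.nan],
})

# Count the missing values in each column
missing_per_column = parks.isna().sum()
print(missing_per_column)

# Keep only the rows that have at least one missing value
rows_with_missing = parks[parks.isna().any(axis=1)]
print(rows_with_missing['Park'].tolist())  # ['Zion', 'Yellowstone']
```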

In the sample table:

- Zion is missing a Location value
- Yellowstone is missing an AnnualPassPrice value

The pandas method .astype() converts a column from one type to another. The new type is specified within the parentheses:

- float64 for decimals
- int64 for integers
- object for text/objects
- category for categorical data

| | Park | Area |
|---|---|---|
| 0 | Great Smoky Mountains | '816.3' |
| 1 | Zion | '229.1' |
| 2 | Yellowstone | '3468.4' |

    # Convert `Area` from object to float
    parks['Area'] = parks['Area'].astype('float64')
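
A sketch of why the conversion matters: while Area holds text, numeric operations such as a sum are not meaningful, but after .astype('float64') the column behaves like numbers. The variable name total_area is illustrative.

```python
import pandas as pd

parks = pd.DataFrame({
    'Park': ['Great Smoky Mountains', 'Zion', 'Yellowstone'],
    'Area': ['816.3', '229.1', '3468.4'],  # numbers stored as text
})

# Before conversion the column has a text dtype
print(parks['Area'].dtype)

# Convert `Area` from text to float
parks['Area'] = parks['Area'].astype('float64')
print(parks['Area'].dtype)  # float64

# Numeric operations now work as expected
total_area = parks['Area'].sum()
print(round(total_area, 1))  # 4513.8
```

If any value in the column cannot be parsed as a number, .astype('float64') raises an error, so it helps to clean the text first.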