The pandas DataFrame method .info() displays a table of information for each column.
parks.info()
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | index | 72 non-null | int64 |
| 1 | Park | 72 non-null | object |
| 2 | Location | 72 non-null | object |
| 3 | AnnualPassPrice | 72 non-null | int64 |
- `#` indicates the column index number
- Column refers to the column name
- Non-Null Count is the number of non-missing values in the column
- Dtype is the column’s data type
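As a minimal sketch of the call above, the snippet below builds a three-row stand-in for `parks` (an assumption for illustration; the real table has 72 rows) and captures the `.info()` report as text. The `buf` argument is part of pandas; the variable names `buffer` and `summary` are only illustrative.

```python
import io

import pandas as pd

# Hypothetical three-row sample, not the full 72-row dataset.
parks = pd.DataFrame({
    "Park": ["Great Smoky Mountains", "Zion", "Yellowstone"],
    "Location": ["Gatlinburg, TN", None, "Jackson, WY"],
    "AnnualPassPrice": [40.0, 70.0, float("nan")],
})

# .info() prints its report and returns None, so pass a buffer
# when the text itself is needed.
buffer = io.StringIO()
result = parks.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

Columns with missing entries show a Non-Null Count lower than the number of rows, which makes `.info()` a quick first check for gaps.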
The pandas .drop() method is used to remove irrelevant columns from a DataFrame. This method has two keywords:
- labels takes a list of column names to drop
- axis=1 tells pandas we want to drop columns (not rows)

| | index | Park | Location |
|---|---|---|---|
| 0 | 1 | Great Smoky Mountains | Gatlinburg, TN |
| 1 | 2 | Zion | Springdale, UT |
| 2 | 3 | Yellowstone | Jackson, WY |
The code snippet drops the index column to produce
| | Park | Location |
|---|---|---|
| 0 | Great Smoky Mountains | Gatlinburg, TN |
| 1 | Zion | Springdale, UT |
| 2 | Yellowstone | Jackson, WY |
# Drop the index column
drop_columns = ['index']
nationalparks.drop(labels=drop_columns, axis=1)
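One detail worth checking when using `.drop()` is that it returns a new DataFrame by default rather than changing the original. The sketch below rebuilds the three-row table from above and stores the result under a hypothetical name, `trimmed`.

```python
import pandas as pd

nationalparks = pd.DataFrame({
    "index": [1, 2, 3],
    "Park": ["Great Smoky Mountains", "Zion", "Yellowstone"],
    "Location": ["Gatlinburg, TN", "Springdale, UT", "Jackson, WY"],
})

drop_columns = ["index"]

# The result must be assigned to a name to be kept.
trimmed = nationalparks.drop(labels=drop_columns, axis=1)

print(list(nationalparks.columns))  # the original still contains 'index'
print(list(trimmed.columns))        # the copy does not
```

Assigning the result back to the same name (`nationalparks = nationalparks.drop(...)`) is a common way to keep only the trimmed version.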
| | index | Park | Year2019 |
|---|---|---|---|
| 0 | 1 | Great Smoky Mountains | 12547743 |
| 1 | 2 | Zion | 4488268 |
| 2 | 3 | Yellowstone | 4020288 |
The pandas .rename() method renames columns in a DataFrame. There are two particularly important keywords for .rename():
- mapper takes a dictionary mapping the old column names (as keys) to the new column names (as values)
- axis=1 tells pandas to rename the columns (not the row labels)

# Rename the Park column to National Park
column_mapper = {'Park': 'National Park'}
parks.rename(mapper=column_mapper, axis=1)
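The following sketch applies the rename to a small table and also shows the `columns=` keyword, which pandas accepts as an equivalent to `mapper=` with `axis=1`. The names `renamed` and `renamed_alt` are illustrative.

```python
import pandas as pd

parks = pd.DataFrame({
    "Park": ["Great Smoky Mountains", "Zion", "Yellowstone"],
    "Year2019": [12547743, 4488268, 4020288],
})

column_mapper = {"Park": "National Park"}

# Form used in the text: mapper plus axis=1.
renamed = parks.rename(mapper=column_mapper, axis=1)

# Equivalent shorthand: name the axis through the keyword itself.
renamed_alt = parks.rename(columns=column_mapper)

print(list(renamed.columns))
```

Like `.drop()`, `.rename()` returns a new DataFrame, so `parks` keeps its original column names unless the result is assigned back.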
Python has built-in arithmetic operators for performing calculations, including:
- addition (+)
- subtraction (-)
- multiplication (*)
- division (/)

As in mathematics, Python uses parentheses to control the order of operations in a calculation.
100 + 10
# Output: 110
100 - 10
# Output: 90
100 * 10
# Output: 1000
100 / 10
# Output: 10.0
(100 + 10) / (10)
# Output: 11.0
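To make the effect of parentheses concrete, here is a small sketch comparing the same numbers with and without them. The variable names are illustrative only.

```python
# Division binds more tightly than addition, so 10 / 10 runs first.
without_parentheses = 100 + 10 / 10

# Parentheses force the addition to run first.
with_parentheses = (100 + 10) / 10

print(without_parentheses)  # 101.0
print(with_parentheses)     # 11.0

# The / operator always returns a float, even for whole-number results.
print(type(100 / 10))
```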
The round() function in Python rounds a number to a certain number of decimals using the following syntax:
round(numeric_variable, number_of_decimals)
pi = 3.14159
# Round pi to 4 decimals
round(pi, 4)
# Output: 3.1416
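A few more calls may help show how the second argument behaves. This sketch reuses `pi` from above and a visitor count from the parks table; the other variable names are illustrative.

```python
pi = 3.14159

two_decimals = round(pi, 2)   # keep two decimal places
no_decimals = round(pi)       # omitting the second argument returns an int

# A negative number of decimals rounds to the left of the decimal point.
visitors = 12547743
nearest_thousand = round(visitors, -3)

print(two_decimals)      # 3.14
print(no_decimals)       # 3
print(nearest_thousand)  # 12548000
```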
In pandas, arithmetic operators like +, -, /, and * can be applied to all the rows of a column at once.
Here’s a sample DataFrame parks.
| | Park | Area_SqMi |
|---|---|---|
| 0 | Great Smoky Mountains | 816.3 |
| 1 | Zion | 229.1 |
| 2 | Yellowstone | 3468.4 |
The code snippet produces the following DataFrame:
| | Park | Area_SqMi | Area_SqKm |
|---|---|---|---|
| 0 | Great Smoky Mountains | 816.3 | 2114.217 |
| 1 | Zion | 229.1 | 593.369 |
| 2 | Yellowstone | 3468.4 | 8983.156 |
# Convert square miles to square kilometers using column multiplication
parks['Area_SqKm'] = parks['Area_SqMi'] * 2.59
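The sketch below runs the conversion end to end on the sample table and then rounds the new column with the pandas `Series.round()` method. The constant name `SQKM_PER_SQMI` and the column `Area_SqKm_Rounded` are assumptions added for illustration.

```python
import pandas as pd

parks = pd.DataFrame({
    "Park": ["Great Smoky Mountains", "Zion", "Yellowstone"],
    "Area_SqMi": [816.3, 229.1, 3468.4],
})

# Conversion factor taken from the snippet above.
SQKM_PER_SQMI = 2.59

# The multiplication is applied to every row of the column at once.
parks["Area_SqKm"] = parks["Area_SqMi"] * SQKM_PER_SQMI

# Series.round() rounds every value in the column in one call.
parks["Area_SqKm_Rounded"] = parks["Area_SqKm"].round(1)

print(parks)
```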
The pandas method .str.split(pat='x', expand=True) will split the information in a text column into multiple columns using 'x' as a delimiter. Common delimiters include commas (,), colons (:), and dashes (-).
| | Location |
|---|---|
| 0 | Gatlinburg, TN |
| 1 | Springdale, UT |
| 2 | Jackson, WY |
The keyword argument expand=True creates a DataFrame containing the split information that can be accessed through pandas indexing.
| | 0 | 1 |
|---|---|---|
| 0 | Gatlinburg | TN |
| 1 | Springdale | UT |
| 2 | Jackson | WY |
# Split the Location column on the comma delimiter
parks['Location'].str.split(pat=',', expand=True)
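One way to use the split result is to assign its pieces to new columns. In the sketch below, the column names `City` and `State` are chosen for illustration. It also shows that splitting on a bare comma leaves a leading space on the second piece, which splitting on a comma followed by a space avoids.

```python
import pandas as pd

parks = pd.DataFrame({
    "Location": ["Gatlinburg, TN", "Springdale, UT", "Jackson, WY"],
})

# Splitting on ',' alone keeps the space that followed the comma.
raw_pieces = parks["Location"].str.split(pat=",", expand=True)
print(raw_pieces[1].tolist())

# Splitting on ', ' removes the comma and the space together.
pieces = parks["Location"].str.split(pat=", ", expand=True)

# The resulting columns are labeled 0 and 1.
parks["City"] = pieces[0]
parks["State"] = pieces[1]
print(parks)
```

Calling `.str.strip()` on the pieces is another option when the spacing around the delimiter is inconsistent.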
The Series method .str.cat() combines text from two columns into a single string:
df['Combined'] = df['Column1'].str.cat(df['Column2'],sep=',')
- .cat() places the text in Column2 after the text in Column1
- sep=',' places a comma ',' after the text from Column1 and before the text from Column2

| | City | State |
|---|---|---|
| 0 | Gatlinburg | TN |
| 1 | Springdale | UT |
| 2 | Jackson | WY |
The code snippet produces the following Location column:
| | Location |
|---|---|
| 0 | Gatlinburg, TN |
| 1 | Springdale, UT |
| 2 | Jackson, WY |
# Combine the `City` and `State` columns into a single column `Location`
parks['Location'] = parks['City'].str.cat(parks['State'], sep=', ')
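As a runnable sketch of the combination above, the snippet below builds the `City` and `State` columns and joins them. It also shows the `+` operator on string columns, which gives the same result for this data; `with_plus` is an illustrative name.

```python
import pandas as pd

parks = pd.DataFrame({
    "City": ["Gatlinburg", "Springdale", "Jackson"],
    "State": ["TN", "UT", "WY"],
})

# Join each row's City and State with a comma and a space between them.
parks["Location"] = parks["City"].str.cat(parks["State"], sep=", ")

# String columns can also be concatenated row by row with +.
with_plus = parks["City"] + ", " + parks["State"]

print(parks["Location"].tolist())
```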
Pandas can alter text case using
- .str.lower() - converts all text to lowercase
- .str.upper() - converts all text to uppercase
- .str.title() - converts all text to title case

| | Park |
|---|---|
| 0 | Great Smoky Mountains |
| 1 | Zion |
| 2 | Yellowstone |
Convert Park to lowercase and uppercase
| | Park | .str.lower() | .str.upper() |
|---|---|---|---|
| 0 | Great Smoky Mountains | great smoky mountains | GREAT SMOKY MOUNTAINS |
| 1 | Zion | zion | ZION |
| 2 | Yellowstone | yellowstone | YELLOWSTONE |
# Convert to lowercase
parks['Park'].str.lower()
# Convert to uppercase
parks['Park'].str.upper()
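The case methods are often used to normalize text before comparing it. The sketch below shows `.str.title()` restoring capitalized names and a case-insensitive match built on `.str.lower()`; the variable names are illustrative.

```python
import pandas as pd

parks = pd.DataFrame({
    "Park": ["Great Smoky Mountains", "Zion", "Yellowstone"],
})

lowered = parks["Park"].str.lower()

# .str.title() capitalizes the first letter of every word.
restored = lowered.str.title()

# Lowercasing both sides makes the comparison ignore case.
is_zion = parks["Park"].str.lower() == "ZION".lower()

print(restored.tolist())
print(is_zion.tolist())
```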
| | Before | After |
|---|---|---|
| 0 | Great.Smoky.Mountains | Great Smoky Mountains |
| 3 | Grand.Canyon | Grand Canyon |
| 4 | Rocky.Mountain | Rocky Mountain |
The pandas method .str.replace() performs a find-and-replace on each row of a series. Every section of text that matches the string passed to pat will be replaced by the string passed to repl.
df['Column'] = df['Column'].str.replace(pat='old_pattern',repl='new_pattern',regex=False)
# Replace periods '.' with spaces
parks['Park'] = parks['Park'].str.replace(pat='.', repl=' ', regex=False)
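The `regex=False` argument matters here because a period has a special meaning in regular expressions, where it matches any character. The sketch below compares the literal replacement with an escaped pattern under `regex=True`. The Series `names` and its index values mirror the Before column above, and the variable names are illustrative.

```python
import pandas as pd

names = pd.Series(
    ["Great.Smoky.Mountains", "Grand.Canyon", "Rocky.Mountain"],
    index=[0, 3, 4],
)

# regex=False treats '.' as a plain character.
literal = names.str.replace(pat=".", repl=" ", regex=False)

# With regex=True the period is escaped so it matches only a real '.'.
escaped = names.str.replace(pat=r"\.", repl=" ", regex=True)

print(literal.tolist())
```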
Missing or null values in a pandas DataFrame are often represented with a NaN value.
| | Park | Location | AnnualPassPrice |
|---|---|---|---|
| 0 | Great Smoky Mountains | Gatlinburg, TN | 40.0 |
| 1 | Zion | NaN | 70.0 |
| 2 | Yellowstone | Jackson, WY | NaN |
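Missing values like those in the table above can be located with the pandas `.isna()` method. The following is a minimal sketch that rebuilds the table by hand; `missing_per_column` and `rows_with_missing` are illustrative names.

```python
import pandas as pd

parks = pd.DataFrame({
    "Park": ["Great Smoky Mountains", "Zion", "Yellowstone"],
    "Location": ["Gatlinburg, TN", None, "Jackson, WY"],
    "AnnualPassPrice": [40.0, 70.0, float("nan")],
})

# .isna() marks each missing cell as True; summing counts them per column.
missing_per_column = parks.isna().sum()

# Keep only the rows that have at least one missing value.
rows_with_missing = parks[parks.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```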
- Zion is missing its Location value
- Yellowstone is missing its AnnualPassPrice value

The pandas method .astype() converts a column from one data type to another. The new type is specified within the parentheses:
- float64 for decimals
- int64 for integers
- object for text/objects
- category for categorical data

| | Park | Area |
|---|---|---|
| 0 | Great Smoky Mountains | '816.3' |
| 1 | Zion | '229.1' |
| 2 | Yellowstone | '3468.4' |
# Convert `Area` from object to float
parks['Area'] = parks['Area'].astype('float64')
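A quick way to confirm the conversion worked is to inspect the column's dtype afterward and try a numeric operation. The sketch below assumes `Area` starts out as text, as in the table above; `total_area` is an illustrative name.

```python
import pandas as pd

parks = pd.DataFrame({
    "Park": ["Great Smoky Mountains", "Zion", "Yellowstone"],
    "Area": ["816.3", "229.1", "3468.4"],  # numbers stored as text
})

# Convert the text values to floating-point numbers.
parks["Area"] = parks["Area"].astype("float64")

print(parks["Area"].dtype)

# Numeric operations such as sum() now work as arithmetic.
total_area = parks["Area"].sum()
print(total_area)
```

If any value cannot be read as a number, `.astype('float64')` raises an error, so it helps to clean the text first.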