Now we know how to create and load data. Let’s select parts of those datasets that are interesting or important to our analyses.
Suppose you have a DataFrame called customers, which contains the ages of your customers:
name | age |
---|---|
Rebecca Erikson | 35 |
Thomas Roberson | 28 |
Diane Ochoa | 42 |
… | … |
Perhaps you want to take the average or plot a histogram of the ages. To do either of these tasks, you first need to select the column.
There are two possible syntaxes for selecting all values from a column:
- Select the column as if you were selecting a value from a dictionary using a key. In our example, we would type customers['age'] to select the ages.
- If the name of a column follows all of the rules for a variable name (doesn’t start with a number, doesn’t contain spaces or special characters, etc.), then you can select it using the following notation: df.MySecondColumn. In our example, we would type customers.age.
When we select a single column, the result is called a Series.
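As a minimal sketch of the two notations, the snippet below rebuilds only the three rows shown in the table (the full customers DataFrame has more rows, which are not reproduced here). The variable names ages_by_key, ages_by_attr, same, and average_age are chosen for illustration.

```python
import pandas as pd

# Only the three rows visible in the table above; the remaining rows are omitted.
customers = pd.DataFrame({
    "name": ["Rebecca Erikson", "Thomas Roberson", "Diane Ochoa"],
    "age": [35, 28, 42],
})

# Bracket notation: works for any column name.
ages_by_key = customers["age"]

# Attribute notation: works only when the column name is a valid variable name
# and does not clash with an existing DataFrame attribute or method.
ages_by_attr = customers.age

# Both notations return the same Series.
same = ages_by_key.equals(ages_by_attr)

# Once the column is selected, you can compute on it, for example the average.
average_age = ages_by_key.mean()

print(same)
print(average_age)
```

Bracket notation is the safer default, since it keeps working when a column name contains spaces or matches a DataFrame method name such as count.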
Instructions
The DataFrame df represents data collected by four health clinics run by the same organization. Each row represents a month from January through June and shows the number of appointments made at four different clinics.
You want to analyze what’s been happening at the North location. Create a variable called clinic_north that contains ONLY the data from the column clinic_north.
What exactly have you selected?
After you create the variable, enter the command:
print(type(clinic_north))
to see what data type you’ve created.
How is this different from what you get if you type the following?
print(type(df))
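If you want to try these steps outside the lesson environment, here is a rough sketch. The appointment counts and the clinic_south column are made up for illustration, because the lesson's actual data is not shown here; only df and clinic_north come from the instructions.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's df: the numbers and the
# clinic_south column are invented for this sketch.
df = pd.DataFrame({
    "month": ["January", "February", "March"],
    "clinic_north": [100, 45, 96],
    "clinic_south": [23, 145, 65],
})

# Select a single column by its name.
clinic_north = df["clinic_north"]

# Compare the type of the selected column with the type of the whole table.
print(type(clinic_north))
print(type(df))
```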