Now we know how to create and load data. Let’s select parts of those datasets that are interesting or important to our analyses.
Suppose you have a DataFrame called customers, which contains the ages of your customers.
Perhaps you want to take the average or plot a histogram of the ages. In order to do either of these tasks, you’d need to select the column.
There are two possible syntaxes for selecting all values from a column:
- Select the column as if you were selecting a value from a dictionary using a key. In our example, we would type customers['age'] to select the ages.
- If the name of a column follows all of the rules for a variable name (doesn’t start with a number, doesn’t contain spaces or special characters, etc.), then you can select it using dot notation, such as df.MySecondColumn. In our example, we would type customers.age.
When we select a single column, the result is called a Series.
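As a minimal sketch of the two syntaxes, the snippet below builds a small stand-in for customers and selects the age column both ways. The names and ages are invented for illustration and are not the lesson's actual data.

```python
import pandas as pd

# Hypothetical data: these names and ages are made up for illustration.
customers = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe"],
    "age": [34, 27, 45],
})

# Syntax 1: dictionary-style selection with the column name as the key.
ages_by_key = customers["age"]

# Syntax 2: attribute-style selection, which works because "age"
# is a valid Python variable name.
ages_by_attr = customers.age

# Both syntaxes return the same Series.
print(type(ages_by_key))
print(ages_by_key.equals(ages_by_attr))

# With the column selected, a summary such as the average is one call away.
print(ages_by_key.mean())
```

The bracket syntax works for any column name, including names with spaces, so it is the safer default when column names are not under your control.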
df represents data collected by four health clinics run by the same organization. Each row represents a month from January through June and shows the number of appointments made at four different clinics.
You want to analyze what’s been happening at the North location. Create a variable called clinic_north that contains ONLY the data from the column
What exactly have you selected?
After you create the variable, enter the command:
to see what data type you’ve created.
How is this different from what you get if you type the following?
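As a rough sketch of the comparison this exercise is getting at, the snippet below checks the type of a single selected column against the type of the whole DataFrame. The column names and appointment counts here are assumptions made up for illustration; the exercise's real df may use different names and values.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's df: the column names and
# appointment counts below are assumptions, not the real data.
df = pd.DataFrame({
    "month": ["January", "February", "March"],
    "east_appointments": [100, 51, 81],
    "north_appointments": [100, 45, 96],
})

# Selecting one column yields a Series.
clinic_north = df["north_appointments"]
print(type(clinic_north))

# The full table is a DataFrame.
print(type(df))
```

A Series holds a single column of values along with its index, while a DataFrame holds several columns that share one index.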