When you have a larger DataFrame, you might want to select just a few columns.

For instance, let’s return to a DataFrame of orders from ShoeFly.com:

id     first_name  last_name  email              shoe_type     shoe_material  shoe_color
54791  Rebecca     Lindsay    [email protected]  clogs         faux-leather   black
53450  Emily       Joyce      [email protected]  ballet flats  faux-leather   navy
91987  Joyce       Waller     [email protected]  sandals       fabric         black
14437  Justin      Erickson   [email protected]  clogs         faux-leather   red

We might just be interested in the customer’s last_name and email. We want a DataFrame like this:

last_name  email
Lindsay    [email protected]
Joyce      [email protected]
Waller     [email protected]
Erickson   [email protected]

To select two or more columns from a DataFrame, we use a list of the column names. To create the DataFrame shown above, we would use:

new_df = orders[['last_name', 'email']]

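Note: Make sure that you have a double set of brackets ([[]]). The inner pair creates the list of column names, and the outer pair selects those columns from the DataFrame. With only a single set of brackets, orders['last_name', 'email'] raises a KeyError, because pandas looks for one column whose name is the pair ('last_name', 'email').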
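As a minimal sketch of the selection above, the following rebuilds the four sample orders and selects the two columns. The email addresses are hypothetical placeholders (the real values are not shown here), and the try/except block is only there to show what happens when the second set of brackets is missing.

```python
import pandas as pd

# Sample rows from the orders table. The email values are hypothetical
# placeholders, not the addresses in the original data.
orders = pd.DataFrame({
    'id': [54791, 53450, 91987, 14437],
    'first_name': ['Rebecca', 'Emily', 'Joyce', 'Justin'],
    'last_name': ['Lindsay', 'Joyce', 'Waller', 'Erickson'],
    'email': ['rebecca@example.com', 'emily@example.com',
              'joyce@example.com', 'justin@example.com'],
    'shoe_type': ['clogs', 'ballet flats', 'sandals', 'clogs'],
    'shoe_material': ['faux-leather', 'faux-leather', 'fabric', 'faux-leather'],
    'shoe_color': ['black', 'navy', 'black', 'red'],
})

# Inner brackets: a list of column names. Outer brackets: the selection.
new_df = orders[['last_name', 'email']]
print(new_df)

# With single brackets, pandas treats ('last_name', 'email') as one key
# and cannot find a column with that name.
try:
    orders['last_name', 'email']
except KeyError as err:
    print('KeyError:', err)
```

The selected columns come back in the order given in the list, so orders[['email', 'last_name']] would put email first.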



Now, you want to compare visits to the Northern and Southern clinics.

Create a variable called clinic_north_south that contains ONLY the data from the columns clinic_north and clinic_south.


When we select multiple columns, do we get a Series or a DataFrame?

After you’ve created the variable, enter the command:

print(type(clinic_north_south))

to see what data type you’ve created.

How is this different from what happened in the previous exercise?
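One way to explore this question is to compare the two selection styles side by side. The sketch below uses a small orders DataFrame with hypothetical placeholder emails rather than the clinic data, and the variable names one_column, two_columns, and one_column_as_df are illustrative only.

```python
import pandas as pd

# A small stand-in DataFrame; the email values are hypothetical placeholders.
orders = pd.DataFrame({
    'last_name': ['Lindsay', 'Joyce'],
    'email': ['rebecca@example.com', 'emily@example.com'],
    'shoe_color': ['black', 'navy'],
})

# A single column name in single brackets.
one_column = orders['email']

# A list of column names in double brackets.
two_columns = orders[['last_name', 'email']]

# A list that happens to contain only one column name.
one_column_as_df = orders[['email']]

print(type(one_column))
print(type(two_columns))
print(type(one_column_as_df))
```

Running this shows that the result type depends on whether a list is passed, not on how many columns end up in the result.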

