Learn

We often have data with missing elements, as a result of a problem with the data collection process or errors in the way the data was stored. The missing elements normally show up as `NaN` (or Not a Number) values:

| day | bill | tip | num_guests |
|---|---|---|---|
| "Mon" | 10.1 | 1 | 1 |
| "Mon" | 20.75 | 5.5 | 2 |
| "Tue" | 19.95 | 5.5 | NaN |
| "Wed" | 44.10 | 15 | 3 |
| "Wed" | NaN | 1 | 1 |

The `num_guests` value in the 3rd row is missing, and so is the `bill` value in the 5th row. Some calculations, such as pandas aggregates like `.mean()`, skip `NaN` values by default, but other calculations and visualizations will break when they encounter a `NaN`.
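As a minimal sketch of the point above, the example table can be rebuilt by hand and inspected for missing values. The construction of `bill_df` with `np.nan` is an assumption made for illustration; in practice the data would usually be loaded from a file.

```python
import numpy as np
import pandas as pd

# Rebuild the example table; np.nan marks the missing entries.
bill_df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Wed", "Wed"],
    "bill": [10.1, 20.75, 19.95, 44.10, np.nan],
    "tip": [1, 5.5, 5.5, 15, 1],
    "num_guests": [1, 2, np.nan, 3, 1],
})

# Count the missing values in each column.
missing_counts = bill_df.isna().sum()
print(missing_counts)

# .mean() skips NaN by default, so this averages the four known bills.
bill_mean = bill_df["bill"].mean()
print(bill_mean)

# With skipna=False the NaN propagates and the result is NaN.
bill_mean_strict = bill_df["bill"].mean(skipna=False)
print(bill_mean_strict)
```

Checking `isna().sum()` first is a cheap way to see which columns need attention before choosing how to handle them.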

Most of the time, we use one of two methods to deal with missing values.

#### Method 1: drop all of the rows with a missing value

We can use `.dropna()` to do this:

``bill_df = bill_df.dropna()``

This command will result in the DataFrame without the incomplete rows:

| day | bill | tip | num_guests |
|---|---|---|---|
| "Mon" | 10.1 | 1 | 1 |
| "Mon" | 20.75 | 5.5 | 2 |
| "Wed" | 44.10 | 15 | 3 |

If we wanted to remove only the rows with a `NaN` value in the `num_guests` column, we could specify a `subset`:

``bill_df = bill_df.dropna(subset=['num_guests'])``
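The difference between the two `.dropna()` calls can be seen by comparing row counts. This is a small sketch that rebuilds the example table by hand (an assumption for illustration); the variable names `dropped_any` and `dropped_guests` are hypothetical.

```python
import numpy as np
import pandas as pd

bill_df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Wed", "Wed"],
    "bill": [10.1, 20.75, 19.95, 44.10, np.nan],
    "tip": [1, 5.5, 5.5, 15, 1],
    "num_guests": [1, 2, np.nan, 3, 1],
})

# Drop every row that has a NaN in any column.
dropped_any = bill_df.dropna()
print(len(dropped_any))  # 3 rows remain

# Drop only the rows where num_guests is NaN;
# the row with a missing bill is kept.
dropped_guests = bill_df.dropna(subset=["num_guests"])
print(len(dropped_guests))  # 4 rows remain

# .dropna() returns a new DataFrame; the original is unchanged
# unless the result is assigned back to it.
print(len(bill_df))  # still 5 rows
```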

#### Method 2: fill the missing values with the mean of the column, or with some other aggregate value

We can use `.fillna()` to do this:

``bill_df = bill_df.fillna(value={"bill":bill_df.bill.mean(), "num_guests":bill_df.num_guests.mean()})``

This command will result in the DataFrame with each original `NaN` replaced by the mean of its column:

| day | bill | tip | num_guests |
|---|---|---|---|
| "Mon" | 10.1 | 1 | 1 |
| "Mon" | 20.75 | 5.5 | 2 |
| "Tue" | 19.95 | 5.5 | 1.75 |
| "Wed" | 44.10 | 15 | 3 |
| "Wed" | 23.725 | 1 | 1 |
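The fill step can be checked with a short sketch, again rebuilding the example table by hand (an assumption for illustration). The second call, which passes the result of `bill_df.mean(numeric_only=True)` to `.fillna()`, is an alternative that is not in the lesson: it fills every numeric column with its own mean without listing the columns one by one.

```python
import numpy as np
import pandas as pd

bill_df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Wed", "Wed"],
    "bill": [10.1, 20.75, 19.95, 44.10, np.nan],
    "tip": [1, 5.5, 5.5, 15, 1],
    "num_guests": [1, 2, np.nan, 3, 1],
})

# Fill each column's NaN with that column's mean.
# The means are computed from the non-missing values only.
filled = bill_df.fillna(value={
    "bill": bill_df["bill"].mean(),
    "num_guests": bill_df["num_guests"].mean(),
})
print(filled)

# Alternative sketch: a Series of column means, indexed by column name,
# fills every numeric column in one call.
filled_all = bill_df.fillna(bill_df.mean(numeric_only=True))
print(filled_all)
```

Filling with the mean keeps every row, but it also invents values (such as 1.75 guests) that never occurred, so dropping rows may be the better choice when the filled values would be misleading.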

### Instructions

1. Get the mean of the `score` column. Store it in `score_mean` and print it out.

2. We will assume that everyone who doesn't have a score for an exam missed the test, so we want to replace every `NaN` in the `score` column with a score of 0.

   Fill all of the `NaN`s in `students['score']` with `0`.

3. Get the mean of the `score` column again. Store it in `score_mean_2` and print it out.
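The effect of these three steps can be sketched on made-up data. The contents of `students` below are hypothetical (the exercise's real dataset is not shown here); only the column name `score` and the variable names come from the instructions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exercise's students DataFrame.
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev"],
    "score": [90.0, np.nan, 80.0, np.nan],
})

# Step 1: the mean skips NaN, so only the two known scores count.
score_mean = students["score"].mean()
print(score_mean)

# Step 2: treat a missing score as a missed test worth 0.
students["score"] = students["score"].fillna(0)

# Step 3: the zeros now count toward the mean, which pulls it down.
score_mean_2 = students["score"].mean()
print(score_mean_2)
```

Comparing the two means shows why the fill value matters: replacing `NaN` with 0 changes the statistic, whereas skipping `NaN` does not.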