For the categorical variables, let’s look at another common task: filling in missing values and then one-hot encoding. We will convert an existing codebase to a pipeline, describing the two steps in detail.

As in the previous exercise, SimpleImputer will be used to fill missing values in the pipeline, but this time the strategy parameter needs to be set to most_frequent. OneHotEncoder will be used as the second step in the pipeline. Note that by default this transform returns a sparse array, so we will use sparse=False to return a full array. The value must be the boolean False, not the string 'False'. In newer scikit-learn releases this parameter has been renamed to sparse_output.
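The two steps described above might be sketched as follows. This is only an illustration: the toy data, the column names, the step names "imputer" and "ohe", and the helper make_dense_encoder are assumptions made for the example and are not part of the lesson's codebase. The helper exists only because the dense-output parameter is named sparse in older scikit-learn releases and sparse_output in newer ones.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_dense_encoder():
    """Hypothetical helper: return a OneHotEncoder that outputs a dense array."""
    try:
        # Newer scikit-learn releases use `sparse_output`.
        return OneHotEncoder(sparse_output=False)
    except TypeError:
        # Older releases use `sparse`.
        return OneHotEncoder(sparse=False)


# Hypothetical categorical training data with missing values.
x_train_cat = pd.DataFrame(
    {
        "color": ["red", "blue", np.nan, "red"],
        "size": ["S", np.nan, "M", "M"],
    },
    dtype=object,
)

# Step 1: fill missing values with each column's most frequent value.
# Step 2: one-hot encode the imputed columns into a dense array.
cat_pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("ohe", make_dense_encoder()),
    ]
)

x_train_encoded = cat_pipeline.fit_transform(x_train_cat)
print(type(x_train_encoded), x_train_encoded.shape)
```

Because the encoder is asked for dense output, the result is a regular NumPy array with one column per category level, which can be inspected or subtracted from another array directly.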



Examine the existing code that fills in missing values with the mode value and then creates dummy variables (with OneHotEncoder). Update the pipeline with the correct two steps and fit on the training set (categorical columns).


Transform the test data (categorical columns only) using the fitted pipeline. Confirm the results are the same as x_test_fill_missing_ohe by printing the sum of absolute differences.
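A minimal sketch of this comparison is shown below, under stated assumptions. The toy data and every name other than x_test_fill_missing_ohe (for example x_train_cat, x_test_cat, cat_pipeline, and make_dense_encoder) are hypothetical, and the step-by-step code is a stand-in for the lesson's existing code rather than a copy of it.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_dense_encoder():
    """Hypothetical helper: dense OneHotEncoder across scikit-learn versions."""
    try:
        return OneHotEncoder(sparse_output=False)  # newer releases
    except TypeError:
        return OneHotEncoder(sparse=False)  # older releases


# Hypothetical categorical train/test data with missing values.
x_train_cat = pd.DataFrame(
    {
        "color": ["red", "blue", np.nan, "red", "blue", "red"],
        "size": ["S", np.nan, "M", "M", "L", "M"],
    },
    dtype=object,
)
x_test_cat = pd.DataFrame(
    {
        "color": [np.nan, "blue", "red"],
        "size": ["L", np.nan, "S"],
    },
    dtype=object,
)

# Step-by-step version: fill with the training-set mode, then encode.
train_modes = x_train_cat.mode().iloc[0]
x_train_fill_missing = x_train_cat.fillna(train_modes)
x_test_fill_missing = x_test_cat.fillna(train_modes)
ohe = make_dense_encoder().fit(x_train_fill_missing)
x_test_fill_missing_ohe = ohe.transform(x_test_fill_missing)

# Pipeline version: the same two steps, fit on the training set only.
cat_pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("ohe", make_dense_encoder()),
    ]
)
cat_pipeline.fit(x_train_cat)
x_test_pipeline = cat_pipeline.transform(x_test_cat)

# Sum of absolute differences; zero means both versions agree.
abs_diff = np.abs(x_test_pipeline - x_test_fill_missing_ohe).sum()
print(abs_diff)
```

Both versions learn the mode and the category levels from the training set only, so the test set is transformed with statistics it did not contribute to.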
