Often, we do not want to apply every transformation to all columns. If our columns are of different types, we may want to apply certain parts of the pipeline to only a subset of columns. This is what we saw in the two previous exercises: one set of transformations is applied to the numeric columns and another set to the categorical ones. We can use ColumnTransformer as one way of combining these processes.
ColumnTransformer takes in a list of tuples of the form (name, transformer, columns). The transformer can be anything with a .fit and .transform method, like those we used previously (such as SimpleImputer or StandardScaler), but it can also itself be a pipeline, as we will use in the exercise.
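As a minimal sketch of the (name, transformer, columns) form, the example below sends one column to SimpleImputer and another to StandardScaler. The DataFrame, its column names, and the step names are made up for illustration and are not part of the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [20.0, 30.0, np.nan, 50.0],
    "income": [10.0, 20.0, 30.0, 40.0],
})

# Each tuple is (name, transformer, columns).
ct = ColumnTransformer([
    ("impute_age", SimpleImputer(strategy="mean"), ["age"]),
    ("scale_income", StandardScaler(), ["income"]),
])

# Output columns follow the order of the tuples: imputed age, then scaled income.
out = ct.fit_transform(df)
print(out)
```

Columns that are not listed in any tuple are dropped by default (remainder="drop"); passing remainder="passthrough" keeps them unchanged.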
Instructions
Create a pipeline for the numerical preprocessing and a separate pipeline for the categorical preprocessing (see previous two exercises), called num_vals and cat_vals, respectively.
Create a ColumnTransformer named preprocess that takes the previous two pipelines and passes the numeric and categorical variables to each, respectively.
Fit the transformer on the training set and transform the test data.
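The three steps above could be sketched roughly as follows. This is one possible approach under stated assumptions, not the exercise's reference solution: the data (x_train, x_test), the column lists (num_cols, cat_cols), the step names, and the specific choices of imputation strategy and encoder are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training and test sets; names and values are illustrative only.
x_train = pd.DataFrame({
    "height": [1.0, 2.0, np.nan, 3.0],
    "weight": [10.0, 20.0, 30.0, 40.0],
    "color": pd.Series(["red", "blue", np.nan, "red"], dtype=object),
})
x_test = pd.DataFrame({
    "height": [np.nan, 2.0],
    "weight": [25.0, 50.0],
    "color": pd.Series(["blue", np.nan], dtype=object),
})
num_cols = ["height", "weight"]
cat_cols = ["color"]

# Numeric pipeline: fill missing values with the mean, then standardize.
num_vals = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Categorical pipeline: fill missing values with the mode, then one-hot encode.
cat_vals = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("ohe", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its pipeline.
# sparse_threshold=0 asks for a dense array regardless of the encoder's output.
preprocess = ColumnTransformer(
    transformers=[
        ("num_process", num_vals, num_cols),
        ("cat_process", cat_vals, cat_cols),
    ],
    sparse_threshold=0,
)

# Fit on the training set only, then apply the learned parameters to the test set.
preprocess.fit(x_train)
x_test_transformed = preprocess.transform(x_test)
print(x_test_transformed)
```

Fitting on the training set alone means the imputation values, scaling statistics, and encoder categories are all learned without looking at the test data.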