Often, you may not want to apply the same transformations to every column. When columns have different types, we usually want to apply certain parts of the pipeline to only a subset of them. This is what we saw in the two previous exercises: one set of transformations is applied to the numeric columns and another set to the categorical ones. ColumnTransformer is one way to combine these processes into a single step.
ColumnTransformer takes a list of tuples of the form (name, transformer, columns). The transformer can be any object with .fit and .transform methods, such as the StandardScaler we used previously, but it can also itself be a pipeline, as we will do in this exercise.
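A minimal sketch of the tuple form, assuming a small toy DataFrame with hypothetical columns age, income and city (these are not the exercise's data). The first tuple uses a bare StandardScaler as its transformer; the second uses a short pipeline, showing that a pipeline can stand in wherever a single transformer is accepted.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000.0, 52000.0, 61000.0, 58000.0],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
})

# Each tuple is (name, transformer, columns)
ct = ColumnTransformer([
    # A single transformer applied only to the numeric columns
    ("scale", StandardScaler(), ["age", "income"]),
    # A pipeline used as the transformer for the categorical column
    ("encode", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["city"]),
])

X = ct.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot city columns
```

The outputs of the individual transformers are concatenated column-wise, so the result has one block of columns per tuple, in the order the tuples are listed.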
Create a pipeline for the numerical preprocessing and a separate pipeline for the categorical preprocessing (see the previous two exercises). Then create a ColumnTransformer called preprocess that takes these two pipelines and passes the numeric and categorical variables to each, respectively.
Fit the transformer on the training set and transform the test data.
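One possible shape for this workflow, sketched under stated assumptions: the dataset and its column names are hypothetical, and the steps inside each pipeline (median imputation plus scaling for numeric columns, most-frequent imputation plus one-hot encoding for categorical columns) are stand-ins for whatever the previous two exercises actually used. The key point is that preprocess is fit on the training split only and then reused, unchanged, on the test split.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset standing in for the exercise data
df = pd.DataFrame({
    "age": [25, np.nan, 47, 51, 38, 29, 60, 44],
    "income": [40e3, 52e3, np.nan, 58e3, 45e3, 39e3, 70e3, 61e3],
    "city": ["Paris", "Lyon", "Paris", np.nan, "Nice", "Lyon", "Paris", "Nice"],
})
num_cols = ["age", "income"]
cat_cols = ["city"]

# Numeric pipeline (assumed steps): fill missing values, then standardize
num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical pipeline (assumed steps): fill missing values, then one-hot encode
cat_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its own pipeline
preprocess = ColumnTransformer([
    ("num", num_pipeline, num_cols),
    ("cat", cat_pipeline, cat_cols),
])

train, test = train_test_split(df, test_size=0.25, random_state=0)

# Learn medians, means/stds and categories from the training data only
X_train = preprocess.fit_transform(train)
# Apply those fitted statistics to the test data without refitting
X_test = preprocess.transform(test)
```

Setting handle_unknown="ignore" on the encoder is a deliberate choice here: a city that appears only in the test split is encoded as all zeros instead of raising an error.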