While scikit-learn contains many existing transformers and classes that can be used in pipelines, you may need at some point to create your own. This is simpler than you may think, as a step in the pipeline needs to have only a few methods implemented. If it is an intermediate step, it will need fit and transform methods, which we will demonstrate in the exercise below.

Here are some of the major takeaways on pipelines:

Pipelines help make concise, reproducible code by combining steps of transformers and/or a final estimator.

Intermediate steps of a pipeline must have both the `.fit()` and `.transform()` methods. This includes preprocessing, imputation, feature selection, and dimension reduction.

The final step of a pipeline must have the `.fit()` method; it can be a transformer or an estimator/model.

If the pipeline is meant to only transform your data by combining preprocessing and data cleaning steps, then each step in the pipeline will be a transformer. If your pipeline will also include a model (a final estimation or prediction step), then the last step must be an estimator.
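The two cases can be sketched side by side. This is a minimal illustration on a small synthetic array; the variable names, step names, and the choice of `LinearRegression` as the final estimator are assumptions for the example, not part of the exercise.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset (illustrative only) with one missing value.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Every step is a transformer, so the pipeline itself behaves like a transformer.
preprocess = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
X_clean = preprocess.fit_transform(X)

# The last step is an estimator, so the pipeline exposes predict().
model = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
model.fit(X, y)
predictions = model.predict(X)
```

In the first pipeline the last step only needs to be a transformer; in the second, the imputer and scaler are intermediate steps and must provide both `.fit()` and `.transform()`.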

Once the steps of a pipeline are defined, it can be used like any other transformer/estimator by calling its fit, transform, and/or predict methods. Similarly, it can be used in place of an estimator in a hyperparameter grid search.
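As a rough sketch of the grid search case, a pipeline can be passed to `GridSearchCV` wherever an estimator is expected, with parameters addressed as `<step name>__<parameter>`. The synthetic data, the step names, and the choice of `Ridge` are assumptions made for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)
X[::10, 0] = np.nan  # introduce a few missing values after computing y

pipe = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
    ("ridge", Ridge()),
])

# Keys follow the "<step name>__<parameter>" convention.
param_grid = {
    "imputer__strategy": ["mean", "median"],
    "ridge__alpha": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
```

Because the imputer and scaler are refit inside each cross-validation fold, the preprocessing never sees the held-out fold during fitting.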

### Instructions

**1.**

Examine the code written for the class `MyImputer`. This replicates the `SimpleImputer` using the mean strategy. Notice that both fit and transform methods are defined. Use this new class as the first step in `new_pipeline` and `StandardScaler` as the second step.
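The exercise's own `MyImputer` code is not reproduced here, so the following is only a guess at what a mean-imputing transformer of this kind might look like. The class body, the `means_` attribute, the step names, and the `x_demo` array are all assumptions for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class MyImputer(TransformerMixin, BaseEstimator):
    """Hypothetical sketch: fill missing values with column means learned in fit."""

    def fit(self, X, y=None):
        # Learn one mean per column, ignoring NaNs.
        self.means_ = np.nanmean(np.asarray(X, dtype=float), axis=0)
        return self

    def transform(self, X, y=None):
        # Work on a copy so the caller's data is not modified.
        X = np.asarray(X, dtype=float).copy()
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X


# Custom imputer first, StandardScaler second.
new_pipeline = Pipeline([
    ("imputer", MyImputer()),
    ("scale", StandardScaler()),
])

# Tiny synthetic array (not the exercise data) to show the pipeline running.
x_demo = np.array([[1.0, 4.0], [np.nan, 6.0], [3.0, np.nan]])
x_demo_transformed = new_pipeline.fit_transform(x_demo)
```

Inheriting from `TransformerMixin` supplies `fit_transform()` for free, and `BaseEstimator` supplies `get_params()` and `set_params()`, which is what lets the class work inside a pipeline or grid search.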

**2.**

Fit the new pipeline on the training data, numeric columns only. This will be identical to the pipeline created in exercise 2. Verify this by performing the following steps:

- Transform the test set `x_test[num_cols]` using this pipeline and write it to a new variable `x_transform`.
- Calculate the absolute difference between the arrays `x_transform` and `x_test_fill_missing_scale` and *sum* the resulting array. Set this number to a variable `array_diff` and print it.
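The comparison step can be sketched as below. The arrays `x_train_demo` and `x_test_demo` are synthetic stand-ins for the exercise data, and `x_transform` is produced here by a hand-written NumPy equivalent rather than by `MyImputer`, purely to keep the example self-contained.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the exercise's training and test data.
x_train_demo = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [np.nan, 50.0]])
x_test_demo = np.array([[np.nan, 20.0], [4.0, np.nan]])

# Reference result from scikit-learn's built-in steps.
reference = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
reference.fit(x_train_demo)
x_test_fill_missing_scale = reference.transform(x_test_demo)

# Hand-written equivalent: impute with training means, then standardize
# using the mean and population standard deviation of the imputed training data.
train_means = np.nanmean(x_train_demo, axis=0)
train_filled = np.where(np.isnan(x_train_demo), train_means, x_train_demo)
test_filled = np.where(np.isnan(x_test_demo), train_means, x_test_demo)
x_transform = (test_filled - train_filled.mean(axis=0)) / train_filled.std(axis=0)

# Sum of absolute element-wise differences; a value near zero means the results match.
array_diff = np.abs(x_transform - x_test_fill_missing_scale).sum()
print(array_diff)
```

Floating-point rounding can leave a tiny nonzero remainder, so a value very close to zero should be read as a match.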