Great, we have a very condensed bit of code that does all our data cleaning, preprocessing, and modeling in a reusable fashion! What now? Well, we can tune some of the parameters of the model by applying a grid search over a range of hyperparameter values.
A linear regression model has very few hyperparameters, really just whether we include an intercept. But we will use it as an example to see the process for a pipeline. The pipeline created in the previous exercise is, itself, an estimator: you can call `.predict` on it. So, in fact, the pipeline can be passed as the estimator for `GridSearchCV`. This will then refit the pipeline for each combination of parameter values in the grid and each fold in the cross-validation split.
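To make the "pipeline is an estimator" point and the number of refits concrete, here is a minimal sketch. The pipeline below is a hypothetical stand-in for the one built in the previous exercise (a `StandardScaler` step followed by a `LinearRegression` step named "regr"), and the data is synthetic because the original training set isn't shown here. `ParameterGrid` is used only to count the grid combinations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the previously built pipeline:
# a scaling step followed by a linear regression step named "regr".
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("regr", LinearRegression()),
])

# Synthetic data (assumption), since the exercise's data isn't shown.
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

# The pipeline behaves like any other estimator: fit, then predict.
pipe.fit(X, y)
preds = pipe.predict(X[:5])

# Counting fits: every grid combination is fit once per CV fold.
param_grid = {"regr__fit_intercept": [True, False]}
n_candidates = len(ParameterGrid(param_grid))  # 2 combinations
n_folds = 5
n_fits = n_candidates * n_folds                # 10 pipeline fits

print(preds.shape)           # (5,)
print(n_candidates, n_fits)  # 2 10
```

With the default `refit=True`, `GridSearchCV` also performs one extra fit of the best combination on the full training data after the cross-validation fits.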
That’s a lot, but the code is again very short. One thing to keep in mind: to reference hyperparameters in a pipeline, the names are built from the pipeline step name + `__` (two underscores) + the hyperparameter name. So `regr__fit_intercept` references the named pipeline step “regr” and the hyperparameter “fit_intercept”.
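One way to check which names are available is to ask the pipeline itself. The sketch below again assumes a hypothetical two-step pipeline with a step named "regr"; `get_params()` lists every nested parameter using the step name + `__` convention, and `set_params` accepts the same names.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical pipeline; "regr" is the step name used in the text.
pipe = Pipeline([("scaler", StandardScaler()), ("regr", LinearRegression())])

# Nested parameters are exposed as "<step name>__<parameter name>".
param_names = sorted(k for k in pipe.get_params() if k.startswith("regr__"))
print(param_names)  # exact list depends on the scikit-learn version

# The same double-underscore names work with set_params.
pipe.set_params(regr__fit_intercept=False)
print(pipe.named_steps["regr"].fit_intercept)  # False
```

The keys of a `GridSearchCV` parameter grid must match these names exactly, so printing them is a quick way to catch a misspelled step or parameter name.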
Use the previously built pipeline as input to `GridSearchCV` with `cv=5` and fit on the training data.
Print the best score obtained from the cross-validated grid search.
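A minimal sketch of the two steps above, under stated assumptions: the pipeline is a hypothetical stand-in for the previously built one, `X_train`/`y_train` come from a synthetic dataset split with `train_test_split`, and the grid covers only `regr__fit_intercept`. Your own pipeline and training data would replace these.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the exercise's training set (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Hypothetical version of the previously built pipeline.
pipe = Pipeline([("scaler", StandardScaler()), ("regr", LinearRegression())])

# Grid over the single hyperparameter discussed above.
param_grid = {"regr__fit_intercept": [True, False]}

# 5-fold cross-validated grid search with the pipeline as the estimator.
grid = GridSearchCV(pipe, param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

# Best mean cross-validated score (R^2, the regressor's default score).
print(grid.best_score_)
print(grid.best_params_)
```

Because no `scoring` argument is passed, `GridSearchCV` falls back to the pipeline's `score` method, which for a final `LinearRegression` step is R². The refit best pipeline is available afterwards as `grid.best_estimator_`.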