Great, we have a very condensed bit of code that does all our data cleaning, preprocessing, and modeling in a reusable fashion! What now? Well, we can tune some of the parameters of the model by applying a grid search over a range of hyperparameter values.
A linear regression model has very few hyperparameters, really just whether we include an intercept. But we will use this as an example to see the process for a pipeline. The pipeline created in the previous exercise is, itself, an estimator: you can call `.fit` and `.predict` on it. So in fact, the pipeline can be passed as an estimator for `GridSearchCV`. This will then refit the pipeline for each combination of parameter values in the grid and each fold in the cross-validation split.
That’s a lot, but the code is again very short. One thing to keep in mind: to reference hyperparameters in a pipeline, each name is written as the pipeline step name + ‘__’ (a double underscore) + hyperparameter name. So `regr__fit_intercept` references the named pipeline step “regr” and the hyperparameter “fit_intercept”.
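To make the naming rule concrete, here is a minimal sketch. The pipeline below is a stand-in, not the one from the previous exercise: only the step name "regr" comes from the text above, while the "scale" step and the `search_parameters` name are assumptions made for illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline (assumption): only the step name "regr" comes from the text.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("regr", LinearRegression()),
])

# get_params() exposes each step's hyperparameters as "<step name>__<parameter>".
regr_params = sorted(k for k in pipeline.get_params() if k.startswith("regr__"))
print(regr_params)

# The same key format is used for the grid passed to GridSearchCV.
search_parameters = {"regr__fit_intercept": [True, False]}

# set_params() accepts the same names, which is how a search updates the pipeline.
pipeline.set_params(regr__fit_intercept=False)
print(pipeline.named_steps["regr"].fit_intercept)
```

Printing the keys from `get_params()` is a quick way to check the exact spelling of a name before putting it in a grid.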
Instructions
Use the previously built `pipeline` as input to `GridSearchCV`, using `scoring='neg_mean_squared_error'` and `cv=5`, and fit on the training data.
Print the best score obtained from the cross-validated grid search.
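One way these two steps might look is sketched below. It is not the exercise's reference solution: the synthetic data, the stand-in pipeline, and the names `x_train`, `y_train`, `search_parameters`, and `gs` are all assumptions, so substitute the pipeline and training data from the previous exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (assumption) with a non-zero intercept of 4.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in for the previously built pipeline (assumption).
pipeline = Pipeline([("scale", StandardScaler()), ("regr", LinearRegression())])

# Two grid values x five folds = ten fits, plus one final refit on all training data.
search_parameters = {"regr__fit_intercept": [True, False]}
gs = GridSearchCV(
    pipeline,
    param_grid=search_parameters,
    scoring="neg_mean_squared_error",
    cv=5,
)
gs.fit(x_train, y_train)

print(gs.best_params_)
# The score is the negated MSE, so it is at most zero and closer to zero is better.
print(gs.best_score_)
# Flip the sign to read it as an ordinary mean squared error.
print(-gs.best_score_)
```

Because scikit-learn scorers follow a "higher is better" convention, the mean squared error is reported with its sign flipped; negate `best_score_` to compare it with an ordinary MSE.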