pipeline .fit or .fit_transform

Open jacobwiberg opened this issue 5 years ago • 1 comments

Hi!

We want to normalize our data, as some of our covariates are on very different scales. When making a pipeline for the machine learning part of our assignment, we're discussing whether to use pipeline.fit or pipeline.fit_transform.

Module 12 is not very clear or consistent about this. In the first 'Model pipelines' video, .fit_transform is called after specifying a StandardScaler() in the pipeline. However, for all remaining examples in the module we simply call pipeline.fit. Is the data still being transformed/scaled, since the StandardScaler() is still specified in the pipeline? Or is the scaling step just there without being used?

Aug 26 '20 08:08 jacobwiberg

It depends on whether you have your supervised learning model in the pipeline or not.

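If you do not have it in the pipe, then you need to use fit_transform on the training data (and transform on the test data), since you still need to train the supervised model on the transformed data afterwards.
If you have it in the pipe, then you only need to use fit: the scaler is still fitted and applied to the data before the model is trained.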
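
A minimal sketch of the two cases, assuming scikit-learn and NumPy are installed. The toy data, the choice of LinearRegression as the supervised model, and all variable names are made up for illustration and are not taken from Module 12.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): two covariates on very different scales.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[0.0, 1000.0], scale=[1.0, 250.0], size=(100, 2))
y_train = X_train[:, 0] + 0.01 * X_train[:, 1] + rng.normal(size=100)
X_test = rng.normal(loc=[0.0, 1000.0], scale=[1.0, 250.0], size=(20, 2))

# Case 1: the supervised model is the last step of the pipeline.
# fit() fits the scaler, scales the training data, then fits the model on it.
pipe_with_model = make_pipeline(StandardScaler(), LinearRegression())
pipe_with_model.fit(X_train, y_train)
pred_pipe = pipe_with_model.predict(X_test)  # X_test is scaled inside predict()

# Case 2: the pipeline holds only preprocessing steps.
# fit_transform() returns the scaled training data; the model is trained separately.
pipe_prep = make_pipeline(StandardScaler())
X_train_scaled = pipe_prep.fit_transform(X_train)
model = LinearRegression().fit(X_train_scaled, y_train)
# Use transform(), not fit_transform(), so the test data is scaled
# with the mean and standard deviation learned from the training data.
pred_manual = model.predict(pipe_prep.transform(X_test))

# Compare the predictions from the two routes.
print(np.allclose(pred_pipe, pred_manual))
```

Both routes scale the data in the same way, so the scaling step is not skipped when you only call pipeline.fit. Keeping the model inside the pipe is usually the more convenient option, because predict then applies the scaling to new data for you.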

Aug 27 '20 14:08 abjer