isds2020

pipeline .fit or .fit_transform

Open jacobwiberg opened this issue 5 years ago • 1 comments

Hi!

We want to normalize our data, as some of our covariates are on very different scales. When making a pipeline for the machine learning part of our assignment, we're discussing whether to use pipeline.fit or pipeline.fit_transform.

Module 12 is not very clear or consistent about this. In the first 'Model pipelines' video, .fit_transform is called after specifying a StandardScaler() in the pipeline. However, for all remaining examples in the module we simply call pipeline.fit. Is the data still being transformed/scaled, since the StandardScaler() is still specified in the pipeline? Or is the scaling step just there without being used?

jacobwiberg, Aug 26 '20 08:08

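It depends on whether your supervised learning model is in the pipeline or not.

  • If it is not in the pipe, then you need to call fit_transform on the training data (and only transform on the test data), since you still need to train the supervised model on the scaled data afterwards.
  • If it is in the pipe, then you only need to call fit: the pipeline fits and applies each transformation step, including StandardScaler(), before fitting the model, so the scaling step is used.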
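
A minimal sketch of the two cases, assuming scikit-learn and NumPy are installed. The synthetic data, the choice of LinearRegression, and all variable names are assumptions made for illustration and are not taken from Module 12.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative assumption): two covariates on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = 2.0 * X[:, 0] + 0.003 * X[:, 1] + rng.normal(0, 0.1, 200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Case 1: the model is the last step of the pipeline.
# fit() fits the scaler, scales the training data, then fits the model on it.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)
pred_pipe = pipe.predict(X_test)  # test data is scaled with the training statistics

# Case 2: the pipeline contains only the preprocessing step.
# fit_transform() returns the scaled training data; the model is trained separately.
prep = make_pipeline(StandardScaler())
X_train_scaled = prep.fit_transform(X_train)
model = LinearRegression().fit(X_train_scaled, y_train)
pred_manual = model.predict(prep.transform(X_test))  # transform only, no refit

# Compare the predictions from the two approaches.
print(np.allclose(pred_pipe, pred_manual))
```

In both cases the scaler is fitted on the training data only and then reused on the test data, which is why calling pipeline.fit alone is enough once the model sits inside the pipeline.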

abjer, Aug 27 '20 14:08