Using data collator in `Pipeline`

Open neilkimn opened this issue 2 years ago • 1 comments

Hello, I am in the process of moving a bunch of pre- and post-processing logic to use the Pipeline. In my original code I would use a data collator in my Trainer constructor to take care of padding inputs among other things. The Trainer then takes care of collating data for both training and evaluation.

I could move the collator's logic into the pipeline's processing steps, but I want the code to stay as similar as possible between training with the Trainer and inference or evaluation with the pipeline.

What would be the best way to go about this? In the more general case I could just scrap the pipeline, opt for a torch dataloader, and run evaluation with that, but I would like to keep the pipeline because I am inheriting some aggregation logic from it. I also think the ability to encapsulate pre- and post-processing in the pipeline is useful.

Apr 25 '23 17:04 neilkimn

I'm not too sure what the question is here. Each Pipeline has the pre/post-processing logic it needs implemented in its `preprocess` and `postprocess` methods.

Apr 25 '23 18:04 sgugger
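
For illustration, here is a minimal, library-free sketch of the idea discussed above: one padding collate function that could be handed to a Trainer and also called from a pipeline-style `preprocess` step. Everything in it is an assumption made for the example: `pad_collate`, `CollatingPipelineSketch`, the toy tokenizer, and the toy model are hypothetical names, not part of `transformers`. It only mirrors the `preprocess` / `_forward` / `postprocess` flow, and it batches inside `preprocess` for simplicity, which may differ from how the real `Pipeline` handles batching.

```python
from typing import Callable, Dict, List

Features = Dict[str, List[int]]
Batch = Dict[str, List[List[int]]]


def pad_collate(features: List[Features], pad_id: int = 0) -> Batch:
    """Hypothetical collator: pad each `input_ids` list to the longest in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        ids = f["input_ids"]
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


class CollatingPipelineSketch:
    """Stand-in class (not the transformers Pipeline) that reuses a collator."""

    def __init__(self, collate_fn: Callable[[List[Features]], Batch]):
        # The same callable could also be passed to a trainer as its collator.
        self.collate_fn = collate_fn

    def preprocess(self, texts: List[str]) -> Batch:
        # Toy "tokenizer": each whitespace-separated word becomes its length.
        features = [{"input_ids": [len(w) for w in t.split()]} for t in texts]
        return self.collate_fn(features)

    def _forward(self, batch: Batch) -> List[int]:
        # Toy "model": sum the non-padded ids of each row.
        rows = zip(batch["input_ids"], batch["attention_mask"])
        return [sum(i for i, m in zip(ids, mask) if m) for ids, mask in rows]

    def postprocess(self, outputs: List[int]) -> List[Dict[str, int]]:
        return [{"score": o} for o in outputs]

    def __call__(self, texts: List[str]) -> List[Dict[str, int]]:
        return self.postprocess(self._forward(self.preprocess(texts)))


pipe = CollatingPipelineSketch(collate_fn=pad_collate)
batch = pipe.preprocess(["a bb ccc", "dddd"])
results = pipe(["a bb ccc", "dddd"])
print(batch)
print(results)
```

Keeping the collator as a plain function is the design choice that matters here: the padding rules live in one place, so training and inference cannot drift apart, while tokenization and output aggregation stay in the pipeline-style methods.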