[FT] Adding caching for each dataset run
Issue encountered
When running large evals with many dataset configurations, it is very painful to rerun everything if something fails partway through.
Solution/Feature
It would be great if intermediate results could be cached, for example the computed metrics of each dataset.
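As a rough illustration of what I have in mind, the cache could be keyed by a hash of the full run configuration, with one file of metrics per dataset. This is only a sketch under assumed names: `CACHE_DIR`, `config_key`, and the helper functions are hypothetical, not existing lighteval APIs.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical cache location, not an actual lighteval path.
CACHE_DIR = Path(".lighteval_cache")

def config_key(config: dict) -> str:
    # Hash the full parameter configuration so a cache entry is only
    # reused when the run is launched with the exact same settings.
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def load_cached_metrics(config: dict, dataset_name: str) -> dict | None:
    # Return previously computed metrics for this (config, dataset) pair, if any.
    path = CACHE_DIR / config_key(config) / f"{dataset_name}.json"
    if path.exists():
        return json.loads(path.read_text())
    return None

def save_metrics(config: dict, dataset_name: str, metrics: dict) -> None:
    # Persist the metrics as soon as a dataset finishes, so a crash
    # later in the run does not lose completed work.
    path = CACHE_DIR / config_key(config) / f"{dataset_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2))
```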
I am looking to contribute to the HF repos. I'm just trying to understand the code base and see if I can add this feature.
@punitvara We would want results to be saved after each batch run, and to be reused if an evaluation is launched with the exact same parameter configuration. TBH, it's not a trivial PR to work on - if you're unfamiliar with lighteval, I would suggest working on #324, #325, or maybe #355 to get to know the code base first
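To make the save-after-each-batch and reuse-on-identical-config behavior concrete, the run loop could look something like the sketch below. `run_eval_on_dataset` is a placeholder for the real evaluation call, and the helpers are the hypothetical ones from the snippet above; none of this reflects actual lighteval internals.

```python
def run_all(config: dict, dataset_names: list[str]) -> dict:
    # Reuse cached metrics for any dataset already evaluated under this
    # exact configuration; only rerun the ones that are new or failed.
    results = {}
    for name in dataset_names:
        cached = load_cached_metrics(config, name)
        if cached is not None:
            results[name] = cached
            continue
        metrics = run_eval_on_dataset(config, name)  # placeholder for the real eval call
        save_metrics(config, name, metrics)
        results[name] = metrics
    return results
```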