[FT] Adding caching for each dataset run
Issue encountered
When running large evals with many dataset configurations, it is very painful to rerun everything if something fails partway through.
Solution/Feature
It would be great if intermediate results could be cached, for example the computed metrics of each dataset.
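As a rough illustration of what I have in mind, the cache could be keyed by a hash of the full run configuration, with one file of metrics per dataset. This is only a sketch under assumed names: `CACHE_DIR`, `config_key`, and the helper functions are hypothetical, not existing lighteval APIs.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical cache location, not an actual lighteval path.
CACHE_DIR = Path(".lighteval_cache")

def config_key(config: dict) -> str:
    # Hash the full parameter configuration so a cache entry is only
    # reused when the run is launched with the exact same settings.
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def load_cached_metrics(config: dict, dataset_name: str) -> dict | None:
    # Return previously computed metrics for this (config, dataset) pair, if any.
    path = CACHE_DIR / config_key(config) / f"{dataset_name}.json"
    if path.exists():
        return json.loads(path.read_text())
    return None

def save_metrics(config: dict, dataset_name: str, metrics: dict) -> None:
    # Persist the metrics as soon as a dataset finishes, so a crash
    # later in the run does not lose completed work.
    path = CACHE_DIR / config_key(config) / f"{dataset_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2))
```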
I am looking to contribute to the HF repos. I'm just trying to understand the code base and see if I can add this feature.
@punitvara We would want results to be saved after each batch run, and to be reused if an evaluation is launched with the exact same parameter configuration. TBH, it's not a trivial PR to work on - if you're unfamiliar with lighteval, I would suggest working on #324, #325, or maybe #355 to get to know the code base first
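To make the save-after-each-batch and reuse-on-identical-config behavior concrete, the run loop could look something like the sketch below. `run_eval_on_dataset` is a placeholder for the real evaluation call, and the helpers are the hypothetical ones from the snippet above; none of this reflects actual lighteval internals.

```python
def run_all(config: dict, dataset_names: list[str]) -> dict:
    # Reuse cached metrics for any dataset already evaluated under this
    # exact configuration; only rerun the ones that are new or failed.
    results = {}
    for name in dataset_names:
        cached = load_cached_metrics(config, name)
        if cached is not None:
            results[name] = cached
            continue
        metrics = run_eval_on_dataset(config, name)  # placeholder for the real eval call
        save_metrics(config, name, metrics)
        results[name] = metrics
    return results
```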