
Cache results from `Evaluator`

Open lewtun opened this issue 3 years ago • 3 comments

When running an evaluation with the `Evaluator` class, it would be great to cache the results (e.g. store them as an Arrow dataset) so that one doesn't have to wait for everything to be recomputed each time.

This would also give `evaluate` a developer experience similar to `datasets` and let users carry their intuitions over from one library to the other :)

lewtun avatar Jun 06 '22 14:06 lewtun

Can you share code pointers for caching in datasets?

ola13 avatar Jun 07 '22 10:06 ola13

> Can you share code pointers for caching in datasets?

Sure! You can check out the `_map_single()` method for `Dataset` objects: https://github.com/huggingface/datasets/blob/747f0c8612d4929dbdd1c72cca201815911ce660/src/datasets/arrow_dataset.py#L2561

Basically, it checks whether the `Dataset` object has a `cache_files` object and, if so, returns a `Dataset` loaded from file instead of re-computing the map function.
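To make the "check for a cache file, otherwise compute and store" idea concrete, here is a minimal sketch of the pattern. It is not the actual `datasets` or `evaluate` implementation: `fingerprint`, `cached_compute`, and `accuracy` are hypothetical names made up for illustration, and it writes plain JSON files with the standard library rather than Arrow, purely to keep the example self-contained.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(*parts):
    """Derive a short, stable hex digest from JSON-serializable inputs (hypothetical helper)."""
    payload = json.dumps(parts, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_compute(compute_fn, inputs, cache_dir, **params):
    """Return (result, cache_hit), loading from disk when a matching cache file exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The cache key covers the function name, the inputs, and the parameters.
    key = fingerprint(compute_fn.__name__, inputs, params)
    cache_file = cache_dir / f"cache-{key}.json"

    if cache_file.exists():
        # Cache hit: load the stored result instead of recomputing it.
        return json.loads(cache_file.read_text()), True

    # Cache miss: compute, then persist the result for the next call.
    result = compute_fn(inputs, **params)
    cache_file.write_text(json.dumps(result))
    return result, False


def accuracy(pairs):
    """Toy metric over (prediction, reference) pairs, standing in for a real evaluation."""
    correct = sum(1 for pred, ref in pairs if pred == ref)
    return {"accuracy": correct / len(pairs)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = [[1, 1], [0, 1], [1, 1], [0, 0]]
        print(cached_compute(accuracy, data, tmp))  # first call computes and stores
        print(cached_compute(accuracy, data, tmp))  # second call loads from the cache file
```

One design caveat with this sketch: keying on the function name alone means that editing the function body does not invalidate old cache files, so a real implementation would need a more robust fingerprint.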

lewtun avatar Jun 07 '22 15:06 lewtun

Cool, I'd be happy to take a look in the next week or two.

ola13 avatar Jun 08 '22 09:06 ola13