Added `diskcache`-based result caching to the base model.
Some models are very expensive to run inference on (e.g., Llama-3.3-70B). Rerunning inference, for example to add a new metric, would be very time-consuming and expensive, especially since at least four 80GB GPUs are needed for inference.
We might want to add a flag to enable/disable caching. It would also be useful for the other methods, such as loglikelihood generation.
Thanks! I don't know when I'll have the capacity to add it to the other methods, though.
This might not be necessary anymore with PR #488.
Want us to close this one?
I personally think it would still be nice to have caching here too, but for me it's no longer strictly necessary, I guess.
It would still be useful for making local inference of large models more robust.