Maram Hasanain
Add tests for the new VLLM model.
This continues a previous branch on span-level propaganda detection.
As part of the output of a benchmarking experiment, we should write the full configuration used to a file. This helps with versioning experiments (for reproducibility). Example configs to...
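A minimal sketch of what writing the configuration could look like, assuming the configuration is available as a plain dictionary. The function name `write_run_config`, the file name `run_config.json`, and the keys in the usage example are hypothetical and not part of the framework.

```python
import json
from pathlib import Path


def write_run_config(config, results_dir):
    """Dump the full run configuration next to the experiment results.

    `config` is assumed to be a dict; values that are not JSON-serializable
    (e.g. classes) are stored as their string representation.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_dir / "run_config.json"  # hypothetical file name
    with path.open("w", encoding="utf-8") as f:
        # sort_keys keeps the file stable across runs, which makes diffs useful
        json.dump(config, f, indent=2, sort_keys=True, default=str)
    return path
```

Calling `write_run_config({"model": "VLLM", "max_tries": 3}, "results/run1")` would create `results/run1/run_config.json`. Sorting the keys is a design choice that makes two runs easy to compare with a plain diff.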
Some datasets follow a recurring format in which a single dataset has multiple splits that differ by language, subtask, train-dev-test, etc. but have the...
Allow the framework to accept a list of assets (or a list of wildcard asset queries) to run instead of a single one.
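One possible way to resolve such a list, sketched under the assumption that the available assets are known as a list of names. The helper `expand_asset_queries` and the asset names in the usage example are hypothetical; the matching relies on the standard-library `fnmatch` module.

```python
from fnmatch import fnmatchcase


def expand_asset_queries(queries, available_assets):
    """Return every asset that matches at least one query.

    Each query is either an exact asset name or a shell-style wildcard
    pattern. Assets are returned in first-match order, without duplicates.
    """
    selected = []
    seen = set()
    for query in queries:
        for name in available_assets:
            if fnmatchcase(name, query) and name not in seen:
                seen.add(name)
                selected.append(name)
    return selected
```

For example, with assets `["ar/sentiment_GPT4", "ar/sentiment_BLOOMZ", "en/ner_GPT4"]`, the queries `["*GPT4*"]` select the first and third entries. A single asset name is just a query without wildcards, so the current single-asset behavior remains a special case.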
Keep track of inference time per input sample, maybe just for the successful cases (we can add it to the cache and update it for every new input sample). After...
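A rough sketch of per-sample timing, assuming the cache for one input sample is a dictionary and that inference is a callable that raises on failure. The names `run_with_timing`, `infer`, and the cache keys are hypothetical and only illustrate the idea.

```python
import time


def run_with_timing(sample, infer, cache):
    """Run inference on one sample and record the elapsed time on success.

    `cache` is assumed to be the per-sample cache dict; the timing is only
    stored when `infer` returns without raising.
    """
    start = time.perf_counter()
    try:
        response = infer(sample)
    except Exception as exc:
        # Failed cases are recorded without a timing entry
        cache["failure"] = repr(exc)
        return cache
    cache["response"] = response
    cache["inference_seconds"] = time.perf_counter() - start
    return cache
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock, so the measurement is not affected by system clock adjustments.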