"Rank classification" in evaluation for multiple choices
Hi,
Thanks for the repo! I was wondering if you could point me to the lines of code that implement the "rank classification" idea used for evaluating the multiple-choice style tasks?
The paper describes it as follows on page 6:
For tasks that involve choosing the correct completion from several options (e.g. multiple choice question answering), we follow Brown et al. (2020) and use rank classification to evaluate our model: we compute the log-likelihood of each of the target options under the fine-tuned model and select the option with the highest log-likelihood as the prediction. For simplicity, we do not apply length normalization to the log-likelihoods of the target options.
Thank you!
Ah, I think I found it, in the forward function of the customized EncoderDecoderModel class: https://github.com/bigscience-workshop/t-zero/blob/25c0761427f3894a8ec5a062a075b96037fb1492/t0/model.py#L56
However, I was wondering if you could provide a short tutorial on how we can apply the same idea to easily evaluate other LMs (say, a fine-tuned BART), so that the comparisons are fair.
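For context, here is my rough understanding of the scoring step as a minimal sketch (not taken from the repo; the function name and toy numbers are mine). In practice the per-token log-likelihoods would come from the model, e.g. by gathering `log_softmax` of the decoder logits at the gold target token ids for each candidate option, but the ranking itself is just a sum and an argmax, with no length normalization per the paper:

```python
import math

def rank_classify(option_token_logprobs):
    """Rank classification sketch: score each candidate completion by the
    sum of its per-token log-likelihoods under the model, and predict the
    option with the highest total score. Following the T0 paper, no length
    normalization is applied to the totals."""
    scores = [sum(logps) for logps in option_token_logprobs]
    # argmax over the summed log-likelihoods
    pred = max(range(len(scores)), key=lambda i: scores[i])
    return pred, scores

# toy example: per-token log-probs for three candidate answers
# (in a real setup these come from the model's decoder logits)
options = [
    [-0.2, -0.5],        # option 0: total log-likelihood -0.7
    [-0.1, -0.3, -0.4],  # option 1: total log-likelihood -0.8
    [-1.0],              # option 2: total log-likelihood -1.0
]
pred, scores = rank_classify(options)
```

Note that option 2 has the highest *per-token* average, but without length normalization option 0 wins on total log-likelihood, which is exactly the simplification the paper mentions. Is this the right picture, and would the same scoring loop transfer directly to a fine-tuned BART?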