Weizhe Yuan
On the SummEval dataset, for FLU, COH and INFO, we also used BARTScore(s->h).
Here are some rules we have followed when deciding which BARTScore variant to use.
* based on the definition of the evaluation perspective (for example, factuality must rely on the...
We have restarted the demo.
We used Python 3.7.
You can customize your inputs to the scorer, for example:
```
bart_scorer.score(['my src sentence'], ['my hyp sentence'])
```
I did reproduce this with the model trained on parabank2, although the numbers are slightly different from yours. When changing "I have 2 siblings" to "I have two siblings", the...
The results shown in our paper are the average over all prompts (see section 4.2.2 Settings -- Selection of Prompts for more details). Specifically, we get the score for one...
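As a rough illustration of averaging over prompts, here is a minimal sketch. It assumes you already have one list of per-sample scores for each prompt; the function name `average_over_prompts`, the prompt strings, and the numbers are all hypothetical and are not taken from the paper or the repository.

```python
from statistics import mean


def average_over_prompts(scores_by_prompt):
    """Average per-sample scores across prompts.

    scores_by_prompt maps a prompt string to a list of scores,
    one score per sample, in the same sample order for every prompt.
    """
    per_prompt = list(scores_by_prompt.values())
    if not per_prompt:
        raise ValueError("at least one prompt is required")
    n_samples = len(per_prompt[0])
    if any(len(scores) != n_samples for scores in per_prompt):
        raise ValueError("every prompt must score the same number of samples")
    # zip(*per_prompt) groups the scores that belong to the same sample.
    return [mean(sample_scores) for sample_scores in zip(*per_prompt)]


# Hypothetical scores for two samples under three hypothetical prompts.
example_scores = {
    "prompt A": [-2.0, -3.0],
    "prompt B": [-1.0, -4.0],
    "prompt C": [-3.0, -2.0],
}
averaged = average_over_prompts(example_scores)
print(averaged)
```

Each entry of `averaged` is the mean of that sample's scores across the prompts, so the result has one value per sample rather than one per prompt.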
I'm not exactly sure, since it's been a long time. You can check the numbers with `analysis.ipynb` to see which one matches the results. As I recall, we may have...
`bart_score.pth` is unrelated to importing the library. It is just our trained model. If you don't specify the checkpoint to be `bart_score.pth`, the default setting will download the `facebook/bart-large-cnn`...
Yes, as with ROUGE, the higher the BARTScore, the better the summary.
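To make the "higher is better" reading concrete, here is a small sketch that ranks candidate summaries by score. BARTScore values are log-likelihoods, so they are typically negative, and "higher" means closer to zero. The helper `rank_by_score`, the candidate labels, and the scores below are made up for illustration.

```python
def rank_by_score(candidates, scores):
    """Return (candidate, score) pairs sorted from best to worst.

    A higher score is treated as better, so -1.7 outranks -3.2.
    """
    if len(candidates) != len(scores):
        raise ValueError("need exactly one score per candidate")
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates and hypothetical scores.
summaries = ["summary A", "summary B", "summary C"]
scores = [-3.2, -1.7, -2.5]

ranked = rank_by_score(summaries, scores)
best_summary = ranked[0][0]
print(best_summary)
```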