Weizhe Yuan comments

Results: 10 comments by Weizhe Yuan

Spearman Correlations for Table-4

On the SummEval dataset, for FLU, COH and INFO, we also used BARTScore(s->h).

Spearman Correlations for Table-4

Here are some rules we followed when deciding which BARTScore variant to use:

* Based on the definition of the evaluation perspective (for example, factuality must rely on the...

Your demo site is down. Any plan to bring it back? Thanks

We have restarted the demo.

there is no code of bart_score_cnn_src_hypo?

You can customize your inputs to the scorer, for example:

```
bart_scorer.score(['my src sentence'], ['my hypo sentence'])
```
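As a rough sketch of what customizing the inputs means for the source-to-hypothesis direction: the first list passed to `score` holds the source texts and the second holds the hypotheses, paired by position. The `RecordingScorer` class and the `score_src_to_hypo` helper below are hypothetical stand-ins so the example runs without a model; only the `score(first_list, second_list)` call shape is taken from the comment above.

```python
from typing import List, Sequence, Tuple


class RecordingScorer:
    """Hypothetical stand-in for a scorer object; it is NOT the real model."""

    def __init__(self) -> None:
        self.calls: List[Tuple[List[str], List[str]]] = []

    def score(self, srcs: List[str], tgts: List[str]) -> List[float]:
        # Remember the argument order, then return one dummy score per pair.
        self.calls.append((srcs, tgts))
        return [0.0 for _ in srcs]


def score_src_to_hypo(scorer, sources: Sequence[str],
                      hypotheses: Sequence[str]) -> List[float]:
    """Score each hypothesis against its source (source first, hypothesis second)."""
    if len(sources) != len(hypotheses):
        raise ValueError("sources and hypotheses must be paired one-to-one")
    return scorer.score(list(sources), list(hypotheses))


scorer = RecordingScorer()
scores = score_src_to_hypo(scorer, ["my src sentence"], ["my hypo sentence"])
```

With a real scorer in place of the stand-in, swapping the two lists would score the opposite direction, so it is worth keeping the argument order explicit in a helper like this.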

Logic behind score() in bart_score.py

I did reproduce this with the model trained on parabank2, although the numbers are slightly different from yours. When changing "I have 2 siblings" to "I have two siblings", the...

Which prompt was used in the paper's result?

The results shown in our paper are the average over all prompts (see Section 4.2.2, Settings -- Selection of Prompts, for more details). Specifically, we get the score for one...
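A minimal sketch of averaging over prompts follows. The comment is truncated, so the exact level at which the averaging happens is not stated here; this example assumes one score per prompt for a given sample, averaged with equal weight. `average_over_prompts`, `toy_score`, and the `score_fn(source, hypothesis, prompt)` signature are all hypothetical names introduced for illustration, not part of the BARTScore code.

```python
from statistics import mean
from typing import Callable, Sequence


def average_over_prompts(
    score_fn: Callable[[str, str, str], float],
    source: str,
    hypothesis: str,
    prompts: Sequence[str],
) -> float:
    """Return the unweighted mean of the per-prompt scores for one sample."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return mean(score_fn(source, hypothesis, prompt) for prompt in prompts)


def toy_score(source: str, hypothesis: str, prompt: str) -> float:
    # Hypothetical stand-in for a prompted scorer: it only depends on the
    # prompt length so the arithmetic is easy to follow.
    return -float(len(prompt))


avg = average_over_prompts(toy_score, "src text", "hypo text", ["a", "bbb"])
```

Here the two per-prompt scores are -1.0 and -3.0, so `avg` is -2.0.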

Which prompt was used in the paper's result?

I'm not exactly sure, since it's been a long time. You can check the numbers with `analysis.ipynb` to see which one matches the results. As I recall, we may have...

`bart_score.pth` is unrelated to importing the library; it is just our trained model. If you don't specify the checkpoint to be `bart_score.pth`, the default setting will download the `facebook/bart-large-cnn`...

Using BARTScore to Compare 2 summaries without Human Evaluation

Yes, similar to the ROUGE score, the higher the BARTScore, the better the summary.
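To make the "higher is better" rule concrete, here is a small sketch that picks between two candidate summaries of the same source. `OverlapScorer` is a hypothetical toy stand-in (it just penalizes summary words that do not appear in the source) so the example runs without a model, and `pick_better` is an illustrative helper, not part of the library. As far as I understand, real BARTScore values are log-likelihoods and therefore usually negative, so "higher" means closer to zero.

```python
from typing import List


class OverlapScorer:
    """Hypothetical toy scorer; it is NOT BARTScore."""

    def score(self, srcs: List[str], tgts: List[str]) -> List[float]:
        results = []
        for src, tgt in zip(srcs, tgts):
            src_words = set(src.lower().split())
            # Count summary words that never occur in the source.
            missing = sum(1 for w in tgt.lower().split() if w not in src_words)
            results.append(-float(missing))  # higher (closer to 0) is better
        return results


def pick_better(scorer, source: str, summary_a: str, summary_b: str) -> str:
    """Return 'a', 'b', or 'tie' depending on which summary scores higher."""
    score_a, score_b = scorer.score([source, source], [summary_a, summary_b])
    if score_a == score_b:
        return "tie"
    return "a" if score_a > score_b else "b"


winner = pick_better(
    OverlapScorer(),
    "the cat sat on the mat",
    "the cat sat",
    "a dog ran",
)
```

In this toy run the first summary has no out-of-source words and the second has three, so `winner` is `"a"`.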
