Daniel Deutsch comments

Results: 16 comments by Daniel Deutsch

Implement a ROUGE metric that faithfully reproduces the official metric written in perl.

@niansong1996 Check out the changes I made in #5207 and #5208. I was able to match the BART paper's scores using [these hyperparameters](https://github.com/pytorch/fairseq/blob/425c36eafff535fe7337f8bdd5ace22ebacc78cb/examples/bart/summarize.py#L10-L11) for min/max sequence length and the length...

Implement a ROUGE metric that faithfully reproduces the official metric written in perl.

Also, to reproduce the ROUGE-L result, you need to split the summaries into sentences before running ROUGE.
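To illustrate why sentence splitting can change the number, here is a minimal sketch of summary-level ROUGE-L recall, where each reference sentence is matched against every candidate sentence and the LCS matches are unioned. This is a simplified illustration, not the official Perl implementation: it computes recall only, skips stemming and clipping, and the helper names (`split_sentences`, `lcs_positions`, `rouge_l_recall`) and the regex-based splitter are assumptions made for the example.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_positions(ref: List[str], cand: List[str]) -> Set[int]:
    """Indices of `ref` tokens that take part in one LCS of `ref` and `cand`."""
    m, n = len(ref), len(cand)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if ref[i] == cand[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover which reference tokens matched.
    positions: Set[int] = set()
    i, j = m, n
    while i > 0 and j > 0:
        if ref[i - 1] == cand[j - 1]:
            positions.add(i - 1)
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return positions


def rouge_l_recall(reference: str, candidate: str, split: bool) -> float:
    """Union-LCS recall; with split=False each summary is one long 'sentence'."""
    ref_sents = split_sentences(reference) if split else [reference]
    cand_sents = split_sentences(candidate) if split else [candidate]
    ref_tokens = [tokenize(s) for s in ref_sents]
    cand_tokens = [tokenize(s) for s in cand_sents]
    hits = 0
    for ref in ref_tokens:
        matched: Set[int] = set()
        for cand in cand_tokens:
            matched |= lcs_positions(ref, cand)
        hits += len(matched)
    return hits / sum(len(ref) for ref in ref_tokens)


reference = "The cat sat. The dog ran."
candidate = "The dog ran. The cat sat."
unsplit_recall = rouge_l_recall(reference, candidate, split=False)
split_recall = rouge_l_recall(reference, candidate, split=True)
print(unsplit_recall, split_recall)
```

In this toy case the two summaries contain the same sentences in a different order, so the unsplit score is penalized for ordering while the sentence-split score is not.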

Implement a ROUGE metric that faithfully reproduces the official metric written in perl.

The AllenNLP ROUGE is fine for training, but you should really use the original package for calculating the final scores on the test set. I think the AllenNLP version operates...

Implement a ROUGE metric that faithfully reproduces the official metric written in perl.

This one is popular: https://github.com/google-research/google-research/tree/master/rouge. It's included in the pip package https://pypi.org/project/rouge-score/ and is used by Hugging Face: https://github.com/huggingface/datasets/blob/67574a8d74796bc065a8b9b49ec02f7b1200c172/metrics/rouge/rouge.py

Implement a ROUGE metric that faithfully reproduces the official metric written in perl.

Yes, you need to use the length penalty of 2.0 via the [LengthNormalizedSequenceLogProbabilityScorer](https://github.com/allenai/allennlp/blob/39d7e5ae06551fe371d3e16f4d93162e55ec5dcc/allennlp/nn/beam_search.py#L491-L509). I think that should make the difference. I have never used the Google ROUGE package myself, but...
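As a rough illustration of what a length penalty of 2.0 does, the sketch below assumes the scorer divides a hypothesis's summed log-probability by its length raised to the penalty. That is my reading of the linked code, not a copy of it; details such as end-token handling are left out, and `length_normalized_score` is a hypothetical name used only here.

```python
from typing import List


def length_normalized_score(token_log_probs: List[float],
                            length_penalty: float = 2.0) -> float:
    """Summed log-probability divided by length ** length_penalty (assumed form)."""
    total = sum(token_log_probs)
    return total / (len(token_log_probs) ** length_penalty)


# Two made-up hypotheses: the longer one has a lower total log-probability.
short_hyp = [-1.0] * 4   # total -4.0
long_hyp = [-0.75] * 8   # total -6.0

# With no penalty (exponent 0) the raw sums are compared and the short one wins.
raw_short = length_normalized_score(short_hyp, length_penalty=0.0)
raw_long = length_normalized_score(long_hyp, length_penalty=0.0)

# With penalty 2.0 the longer hypothesis gets the better (less negative) score.
norm_short = length_normalized_score(short_hyp, length_penalty=2.0)
norm_long = length_normalized_score(long_hyp, length_penalty=2.0)
print(raw_short, raw_long, norm_short, norm_long)
```

Because log-probabilities are negative, dividing by a larger power of the length pulls long hypotheses toward zero, so beam search becomes more willing to produce longer summaries.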

Add instance IDs to model outputs

I think the selection of the ID would need to be dataset-specific. There isn't really an "official" ordering of the CNN/DailyMail dataset as far as I am aware. Each instance...

Add instance IDs to model outputs

I think using the `datasets` IDs when available would be helpful since many people use their dataset readers now. For example, [here](https://github.com/huggingface/datasets/tree/master/datasets/cnn_dailymail) they have unique IDs for the CNN/DailyMail dataset...
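A minimal sketch of the idea, assuming each example is a dict that may carry the dataset's own `id` field (as the CNN/DailyMail examples in `datasets` do). The `attach_ids` helper and the output record layout are hypothetical and only meant to show the shape of the change.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def attach_ids(examples: Iterable[Dict[str, str]],
               predictions: Iterable[str],
               id_field: str = "id") -> Iterator[Dict[str, Optional[str]]]:
    """Pair each prediction with its example's ID, or None if the dataset has none."""
    for example, prediction in zip(examples, predictions):
        yield {"id": example.get(id_field), "prediction": prediction}


# Toy data standing in for dataset examples and model outputs.
examples = [
    {"id": "abc123", "article": "First article."},
    {"article": "Second article, from a dataset without IDs."},
]
predictions = ["First summary.", "Second summary."]

records = list(attach_ids(examples, predictions))
for record in records:
    print(json.dumps(record))  # one JSON object per line
```

Examples without an ID keep `None` here; choosing a fallback identifier would be a dataset-specific decision.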

Eval4NLP 2023 Ingestion

Eval4NLP 2023 Ingestion

Rouge-L in python interface?