badhrisuresh
This issue is caused by some randomness in the ROUGE score code (in the evaluate repo), and I fixed it by setting the numpy random seed in the script. Please take a...
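For reference, here is a minimal sketch of the seed-based workaround, assuming the script computes ROUGE through the Hugging Face `evaluate` library (the seed value and example strings are illustrative, not the actual ones in the script):

```python
import numpy as np
import evaluate

# Pin numpy's global RNG so the random resampling done inside the ROUGE
# metric's aggregation step is repeatable across runs.
np.random.seed(9973)

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the cat sat on the mat"],  # illustrative strings
    references=["the cat sat on the mat"],
)
print(scores)
```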
Ideally, they should be deterministic, as they are F1 scores over different n-grams. I'm looking at an existing issue in their repo and will update once I test the actual...
I found this issue [here](https://github.com/huggingface/evaluate/issues/186), which describes the same problem. The code enables the BootstrapAggregator by default, which does random sampling to compute confidence intervals and causes...
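If avoiding the aggregator altogether is acceptable, a sketch of that alternative, assuming the `evaluate` ROUGE metric is called directly (the example strings are illustrative):

```python
import evaluate

rouge = evaluate.load("rouge")

# Disabling the BootstrapAggregator skips its random resampling entirely,
# so per-example ROUGE values are returned instead of aggregated
# confidence intervals.
scores = rouge.compute(
    predictions=["a summary produced by the model"],  # illustrative strings
    references=["the reference summary"],
    use_aggregator=False,
)
print(scores)  # lists of per-example ROUGE values
```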
Updated the README in the [PR](https://github.com/mlcommons/inference/pull/1386): modified the repo name and added the reference model's ROUGE scores.
We are still working on publishing the fine-tuned model publicly, but we have already shared the checkpoint internally with the task force, so you can try it.
We have always used the validation set, not the test set, for MLPerf Inference benchmarking. I have removed the redundant code from download_cnndm.py and updated max_examples in main.py.
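As a rough sketch of what download_cnndm.py fetches under that convention (the dataset name and config follow the Hugging Face hub; everything else here is illustrative):

```python
from datasets import load_dataset

# Only the validation split is used for MLPerf Inference benchmarking;
# the test split is intentionally not downloaded.
dataset = load_dataset("cnn_dailymail", "3.0.0", split="validation")
print(len(dataset), "validation examples")
```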