Seungone Kim issues

Results: 5 issues of Seungone Kim

Question about Graph Construction Process

Dear authors, I have a question about the data preprocessing step (Heterogeneous Dialogue Graph Construction Process). So far, I have understood that the result of this step is in the '/data'...

Multi-GPU inference with run_rm.py

Hello Nathan, thank you for this valuable resource! I strongly believe that we need more standardized benchmarks to evaluate reward/evaluator models. I think submit_eval_jobs.py (using AI2's beaker) supports multi-GPU...

enhancement

Consider our CoT dataset (CoT Collection)

Hello, thanks for providing this awesome repository introducing different instruction datasets! Could you consider adding our CoT Collection dataset? It's a massive instruction dataset consisting of 1.84 million rationales across...

TODOs for Implementing LLM-as-a-Judge in Eval-Harness (Work in Progress)

@haileyschoelkopf @lintangsutawika @baberabb The following is a list of TODOs to implement LLM-as-a-Judge in Eval-Harness:

**TLDR**

* Splits the existing `evaluate` function into `classification_evaluate` and `generation_evaluate`.
* Enables the user to decide...

Support of VLM-as-a-Judge

We're planning to add VLM-as-a-Judge functionality to prometheus-eval.

* References: [https://arxiv.org/abs/2401.06591](https://arxiv.org/abs/2401.06591)

The high-level idea is that, using gpt-4v or gpt-4o, the judge VLM would receive an image as input in...