Seungone Kim

Results: 5 issues by Seungone Kim

Dear authors, I have a question about the data preprocessing step (Heterogeneous Dialogue Graph Construction Process). So far, I have understood that the result of this step is in the '/data'...

Hello Nathan, thank you for this valuable resource! I strongly think that we needed more standardized benchmarks to evaluate reward/evaluator models. I think submit_eval_jobs.py (using AI2's Beaker) supports multi-GPU...

enhancement

Hello, thanks for providing this awesome repository introducing different instruction datasets! Could you consider adding our CoT Collection dataset? It's a massive instruction dataset consisting of 1.84 million rationales across...

@haileyschoelkopf @lintangsutawika @baberabb The following is a list of TODOs to implement LLM-as-a-Judge in Eval-Harness: **TLDR** * Splits the existing `evaluate` function into `classification_evaluate` and `generation_evaluate`. * Enables the user to decide...

We're planning to add VLM-as-a-Judge functionality to prometheus-eval. * References: [https://arxiv.org/abs/2401.06591](https://arxiv.org/abs/2401.06591) The high-level idea is that using gpt-4v or gpt-4o, the judge VLM would receive an image as input in...