AllenShow

Results 12 issues of AllenShow

Hi! Thanks for your great work. I tried to reproduce the baseline tasks, but my results were lower than those reported in the paper. So I am not sure whether I used...

Hi! Thanks for the great work. When reproducing the inference for PopQA using Self-RAG, I got the same score for adaptive_retrieval and always_retrieve. In theory, the adaptive_retrieval result should be...

Hello, I'm trying to reproduce the paper's numbers on PopQA by running the following command: Question Answering python run_short_form.py \ --model_name selfrag/selfrag_llama2_7b \ --input_file eval_data/popqa_longtail_w_gs.jsonl \ --mode MODE --max_new_tokens 100...

Hi! Thanks for your great work. I tried to reproduce the baseline for ASQA using Llama-2-7b-hf, like this: python run_baseline_lm.py \ --model_name meta-llama/Llama-2-7b-hf \ --input_file eval_data/asqa_eval_gtr_top100.json \ --max_new_tokens 300 --metric...

Hi! Thanks for the great work. I found an error in run_short_form.py when testing with my own multiple-choice data ``` if len(results) == 1: postprocessed_pred = postprocess_answer_option_conditioned(pred) return...

Hi! Thanks for your great work. I noticed that the task parameter in all the example code (run_baseline_lm.py) is 'qa', but in Self-RAG the task can take different values, such as arc_c, fever,...

Hello, thanks for providing this amazing tool. Could mergoo support QWEN models?

help wanted

Thanks for providing the wonderful tool TRL. I have a question. With packing=True and eval_packing=False, training failed with KeyError: 'eval_loss'. However, when I removed eval_packing=False (at this moment, eval_packing will...

### Prerequisites - [x] I have searched the [issues](https://github.com/open-compass/opencompass/issues/) and [discussions](https://github.com/open-compass/opencompass/discussions) but did not get the help I expected. - [x] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Problem type I am evaluating with an officially supported task/model/dataset. ### Environment {'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda', 'GCC': 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0', 'GPU...