yanlei
Hi there, fantastic work! I wonder how to calculate the MAUVE, diversity, coherence and gen-ppl scores in Table 1; the code does not seem to contain these related methods. Thanks in advance.
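For the diversity column, one definition commonly used in the contrastive-search literature (assumed here, not confirmed by this repository) is diversity = ∏ over n = 2..4 of (1 − rep-n / 100), where rep-n is the percentage of repeated n-grams in a generation. A minimal sketch, with `rep_n` and `diversity` as hypothetical helper names:

```python
def rep_n(tokens, n):
    """Percentage of repeated n-grams: 100 * (1 - unique n-grams / total n-grams)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n: no n-grams, so nothing is repeated
    return 100.0 * (1.0 - len(set(ngrams)) / len(ngrams))


def diversity(tokens):
    """Assumed definition: product of (1 - rep-n/100) for n = 2, 3, 4."""
    score = 1.0
    for n in (2, 3, 4):
        score *= 1.0 - rep_n(tokens, n) / 100.0
    return score


print(diversity("the cat sat on the mat".split()))
print(diversity("a b a b a b".split()))
```

This scores a single generation; whether Table 1 averages per-text scores or pools n-grams over the whole corpus is also an assumption. MAUVE, coherence and gen-ppl depend on specific scoring models and are not sketched here.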
How are the claim-level accuracy, precision, recall and other metrics calculated? The claims extracted from each model's output will surely differ, so it is challenging to calculate these metrics without...
Would you please upload the react_hotpot_google.json file or show us some examples? When I ran the code, an error occurred, and I could not figure out whether the format was...
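In Table 7, the scores on all subtasks of long-dependency QA are below 50, yet the overall score reported at the end reaches 54.09. Both are scored by GPT-4, so why don't they agree?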
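The mismatch can be checked with simple arithmetic: any weighted average with non-negative weights lies between the smallest and largest subtask score, so if every subtask is below 50, an aggregate computed as such an average cannot reach 54.09. The sketch below uses hypothetical scores and weights (for example, per-subtask sample counts), not the values from Table 7:

```python
def weighted_mean(scores, weights):
    """Weighted average; with non-negative weights it never exceeds max(scores)."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical subtask scores, all below 50, with hypothetical weights.
scores = [48.0, 45.5, 39.2]
weights = [120, 300, 80]
overall = weighted_mean(scores, weights)
print(round(overall, 2))
```

So an overall score above every subtask score suggests that the overall number was produced by a different aggregation, prompt or scoring run than the subtask numbers.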