zj
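hi @hongyi-zhao. From your name I can tell you're Chinese, and so am I. I have a few questions: as you mentioned above, besides WeChat and QQ, are you able to install other apps, such as Douyin, Meituan, or Xiaohongshu? Thanks.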
I can delete
The main reason for this issue is that the Linux systems we usually use have no display attached (they run headless). When the browser is launched by a background program, the command...
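You can check for the missing-display condition before launching the browser. Here is a minimal Python sketch. It assumes an X11 or Wayland session is signalled by the `DISPLAY` or `WAYLAND_DISPLAY` environment variable, which is the usual Linux convention but is not stated in the original comment. `has_display` and `choose_browser_mode` are hypothetical helper names, not part of any project mentioned here:

```python
import os
import sys
from typing import Mapping, Optional


def has_display(env: Optional[Mapping[str, str]] = None,
                platform: Optional[str] = None) -> bool:
    """Return True if a graphical display appears to be available.

    Assumption: on Linux, a graphical session exports DISPLAY (X11) or
    WAYLAND_DISPLAY (Wayland); a background service usually has neither.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        # This heuristic only targets Linux; assume a display elsewhere.
        return True
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def choose_browser_mode(env: Optional[Mapping[str, str]] = None,
                        platform: Optional[str] = None) -> str:
    # Hypothetical policy: fall back to headless mode when no display exists.
    return "headed" if has_display(env, platform) else "headless"
```

For example, `choose_browser_mode({}, "linux")` returns `"headless"`, which is the situation a background program typically runs into. The `env` and `platform` parameters are injectable only so the check is easy to test.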
GPT and Claude act as the scoring (judge) models, not as the models being evaluated. The evaluation data and methods (scoring methods) published in this codebase are as follows. To evaluate LLaMA-3, the following steps are...
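`model_inference.py` runs the model under evaluation; `llm_eval.py` post-processes that model's outputs and feeds them to GPT-4. eval: the rules assign a score to [the evaluated model's outputs], the GPT-4 side also produces a score, and the two are finally merged to compute HSR, SSR, and CSL. The code is highly repetitive 😂, which makes it a slog to read.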
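Here is a sketch of the HSR and SSR metrics. It assumes FollowBench-style definitions: HSR is the fraction of instructions whose constraints are all satisfied, and SSR is the fraction of individual constraints satisfied. It also assumes the rule-based and GPT-4 judgments have already been merged into one boolean per constraint. The input format and function names are hypothetical, not taken from `llm_eval.py`:

```python
from typing import Sequence


def hard_satisfaction_rate(results: Sequence[Sequence[bool]]) -> float:
    """HSR: fraction of instructions whose constraints are ALL satisfied.

    `results[i][j]` is the (already merged) verdict for constraint j of
    instruction i. This input layout is an assumption for illustration.
    """
    if not results:
        return 0.0
    return sum(all(r) for r in results) / len(results)


def soft_satisfaction_rate(results: Sequence[Sequence[bool]]) -> float:
    """SSR: fraction of individual constraints satisfied, pooled over all
    constraints of all instructions."""
    total = sum(len(r) for r in results)
    if total == 0:
        return 0.0
    return sum(sum(r) for r in results) / total
```

With `[[True, True], [True, False], [False, False]]`, HSR is 1/3 (only the first instruction passes every constraint) and SSR is 3/6 = 0.5. When every instruction at a level has the same number of constraints, pooling over constraints gives the same SSR as averaging per-instruction rates.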
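Evaluation method: rule-based (rules) + LLM-based (judge model (GPT)).
Data subsets: the six categories it provides (content, example, mix, etc.).
Metrics: HSR, SSR, CSL.
The example subset is a special case: it computes the HSR and CSL metrics, and runs directly on the outputs of the model under evaluation (`def evaluate_example_constraint` and `def csl_evaluation`).
The other five subsets call GPT to produce the evaluation results, using an assembled prompt. `def discriminative_evaluation` and `def rule_evaluation` each produce a result when they finish, and HSR and SSR are read from different indices of the result array: `discriminative_result[0] --> hsr; discriminative_result[1] --> ssr`.
CSL: `def csl_evaluation`. The rule-based part ***evaluates the model under evaluation (LLaMA-3, etc.)***...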
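Here is a sketch of the CSL metric and the `discriminative_result` index convention. It assumes each group holds one seed instruction's levels in order (level 1 first), with a single pass/fail flag per level. It also assumes CSL is the average, over groups, of how many consecutive levels pass starting from level 1. The function names are hypothetical, and this is not the repository's `csl_evaluation`:

```python
from typing import Dict, Sequence, Tuple


def consecutive_satisfied_levels(level_ok: Sequence[bool]) -> int:
    """Count how many levels pass in a row, starting from level 1."""
    count = 0
    for ok in level_ok:
        if not ok:
            break  # the streak ends at the first failed level
        count += 1
    return count


def consistent_satisfaction_levels(groups: Sequence[Sequence[bool]]) -> float:
    """CSL: mean consecutive-passed-level count over instruction groups."""
    if not groups:
        return 0.0
    return sum(consecutive_satisfied_levels(g) for g in groups) / len(groups)


def unpack_discriminative_result(
        discriminative_result: Tuple[float, float]) -> Dict[str, float]:
    # Index convention from the note above: [0] -> HSR, [1] -> SSR.
    return {"hsr": discriminative_result[0], "ssr": discriminative_result[1]}
```

For example, groups with level flags `[T, T, F, T, T]`, `[F, T, T, T, T]`, and `[T, T, T, T, T]` score 2, 0, and 5, so CSL is 7/3. A later pass does not count once an earlier level has failed.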