zj comments

Results: 7 comments of zj

Wechat/QQ support in docker-android.

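Hi @hongyi-zhao. From your name I gather you are Chinese; so am I. I have a question: as you described above, besides WeChat and QQ, can you install other apps, such as Douyin, Meituan, or Xiaohongshu? Thanks.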

Cannot delete files in all shares

I can delete

Publishing Prompts for Multimodal Evals?

Task About GoogleDrive Hang up

The main reason for this issue is that the Linux system we usually use does not support a display. When the browser is launched by a background program, the command...
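A minimal sketch of the check this implies, assuming the missing display shows up as unset `DISPLAY` (X11) and `WAYLAND_DISPLAY` (Wayland) environment variables. The helper name `has_display` is hypothetical and is not taken from the project:

```python
import os
from typing import Mapping, Optional


def has_display(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment advertises an X11 or Wayland display.

    Assumption: a background process on a headless Linux host has
    neither DISPLAY nor WAYLAND_DISPLAY set (or has them set to "").
    """
    if env is None:
        env = os.environ
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if has_display():
        print("A display is available; a browser window can be opened.")
    else:
        print("No display found; a browser launched here cannot open a window.")
```

A background job could call this before trying to open a browser and fail early with a clear message instead of hanging.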

How do we run this code?

GPT and Claude are the scoring (judge) models, not the models being evaluated. The evaluation data and methods (scoring methods) published in this code are as follows. To evaluate LLaMA-3, the following steps are...

some question

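model_inference.py runs the model to be evaluated. llm_eval.py post-processes the evaluated model's outputs and feeds them to GPT-4. eval: the rules produce a score for the evaluated model's outputs, the GPT-4 side also produces a score, and the two are finally merged to compute HSR, SSR, and CSL. The code is far too repetitive 😂, which makes it exhausting to read.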

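Evaluation method: rule-based + LLM-based (the judge model, GPT). Data subsets: the six it provides: content, example, mix, etc. Metrics: HSR, SSR, CSL. The example subset is scored on (HSR, CSL) and, as a special case, is computed directly from the outputs of the model under evaluation: ```def evaluate_example_constraint``` and ```def csl_evaluation```. For the other five, GPT is called to produce the evaluation results, with a prompt assembled for it: ```def discriminative_evaluation``` and ```def rule_evaluation```. Once these two have run there is a result, and HSR and SSR read different entries of the array: ```discriminative_result[0] --> hsr; discriminative_result[1] --> ssr```. CSL: ```def csl_evaluation```. The rule-based part ***scores the model under evaluation (LLaMA-3, etc.)***...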
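To make the three metrics concrete, here is a rough sketch of how I understand them, not the repository's actual code. Assumptions: HSR counts an instruction only when all of its constraints are satisfied, SSR averages the per-constraint satisfaction, and CSL is the average number of consecutive difficulty levels passed, counting from level 1. The function names `hsr`, `ssr`, and `csl` and the input layout are hypothetical.

```python
from typing import List, Sequence


def hsr(results: Sequence[Sequence[int]]) -> float:
    """Hard satisfaction rate (assumed definition).

    results[i][j] is 1 if instruction i satisfies its j-th constraint.
    An instruction counts only when every constraint is satisfied.
    """
    return sum(1 for r in results if all(r)) / len(results)


def ssr(results: Sequence[Sequence[int]]) -> float:
    """Soft satisfaction rate (assumed definition).

    Average, over instructions, of the fraction of constraints satisfied.
    """
    return sum(sum(r) / len(r) for r in results) / len(results)


def csl(level_passed: Sequence[Sequence[bool]]) -> float:
    """Consistent satisfaction levels (assumed definition).

    level_passed[g][k] says whether group g passed difficulty level k + 1.
    Each group contributes the length of its unbroken run of passes
    starting at level 1; the result is the mean over groups.
    """
    total = 0
    for group in level_passed:
        streak = 0
        for ok in group:
            if not ok:
                break
            streak += 1
        total += streak
    return total / len(level_passed)


if __name__ == "__main__":
    # Three instructions with two constraints each (made-up toy data).
    toy: List[List[int]] = [[1, 1], [1, 0], [0, 0]]
    print("HSR:", hsr(toy))
    print("SSR:", ssr(toy))
    # Two groups: the first passes levels 1-2, the second fails level 1.
    print("CSL:", csl([[True, True, False, True], [False, True]]))
```

On the toy data only the first instruction satisfies everything, so HSR is 1/3 while SSR is 0.5; the later pass at level 4 in the first group does not count toward CSL because the run was already broken at level 3.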