[Feature] 我需要对本地部署的Qwen-110B模型进行MMLU基准测试,请问该怎么操作呢?
Describe the feature
我已经将项目下载到本地,并且将数据集也下载到本地了 git clone https://github.com/open-compass/opencompass opencompass cd opencompass pip install -e .
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip unzip OpenCompassData-core-20240207.zip
我在opencompass 文件夹下配置了configs,其中configs里面创建了py文件,文件如下: from opencompass.models import OpenAI from opencompass.partitioners import NaivePartitioner from opencompass.runners import LocalRunner from opencompass.tasks import OpenICLInferTask from opencompass.datasets import MMLUDataset
配置本地Qwen模型API
model = OpenAI(
abbr='qwen-110b-chat',
path='qwen-110b-chat',
key='EMPTY',
query_per_second=1,
max_seq_len=4096,
api_base='http://localhost:8080/v1',
meta_template=None,
retry=10 # 增加重试次数
)
配置MMLU数据集
datasets = [ MMLUDataset( path='./data/OpenCompassData-core-20240207/mmlu', name='stem', split='test' ) ]
配置工作流
work_dir = 'outputs/qwen_mmlu'
infer = { 'partitioner': {'type': NaivePartitioner}, 'runner': { 'type': LocalRunner, 'max_workers': 2, }, 'task': {'type': OpenICLInferTask} }
请问我配置的正确吗?我该如何进行mmlu基准测试?
Will you implement it?
- [ ] I would like to implement this feature and create a PR!
I have same question, how to run a dataset evaluation on a local model instance.