
Key mismatch in gpt4_eval.py

Open superdocker opened this issue 2 years ago • 3 comments

Hi, thank you for the great work on LLM evaluation. I'm very impressed by it, because there is a lack of evaluation metrics for chatbots. I want to use this framework to evaluate my model, but I ran into some issues while following the steps in the README.

The main issue is a KeyError in gpt4_eval.py.

In evaluation_set/flask_evaluation.jsonl, the keys metrics, text, and question_id do not exist, so the script raises an error. I think they have been renamed to skill, instruction, and idx, respectively, but I wonder whether changing just these will make it work without any other problems.
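One way to test this guess without editing gpt4_eval.py is to alias the new keys under the old names when the file is loaded. This is a minimal sketch, assuming the mapping metrics → skill, text → instruction, question_id → idx (inferred from the file, not confirmed by the FLASK authors). The helper names below are hypothetical:

```python
import json

# Assumed mapping (inferred, not confirmed by the FLASK authors):
# old key read by gpt4_eval.py -> key actually present in flask_evaluation.jsonl
KEY_MAP = {"metrics": "skill", "text": "instruction", "question_id": "idx"}


def missing_keys(record: dict, expected=tuple(KEY_MAP)) -> list:
    """Return the expected (old) keys that are absent from one JSONL record."""
    return [k for k in expected if k not in record]


def add_legacy_keys(record: dict) -> dict:
    """Return a copy of `record` in which each old key aliases its new counterpart."""
    out = dict(record)  # do not mutate the caller's record
    for old, new in KEY_MAP.items():
        if old not in out and new in out:
            out[old] = out[new]
    return out


def load_with_legacy_keys(path: str) -> list:
    """Read a JSONL file, skipping blank lines, and alias renamed keys on every record."""
    with open(path, encoding="utf-8") as f:
        return [add_legacy_keys(json.loads(line)) for line in f if line.strip()]
```

Calling missing_keys on a loaded record is a quick check that the mapping covers what the script reads. Other fields it uses, such as answer, would still need to be checked by hand.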

Additionally, do you know roughly how much it costs to evaluate the 1,700 FLASK samples with the GPT-4 API? Evaluating the 80 MT-Bench samples was already quite expensive, so knowing this in advance would be very helpful.
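API cost scales with token usage, so a rough estimate only needs the average prompt and completion lengths and the per-token prices. The sketch below shows that arithmetic. Every number in the example call is a placeholder, not a measured FLASK token count or a current OpenAI price:

```python
def estimate_cost_usd(
    num_samples: int,
    prompt_tokens_per_sample: float,
    completion_tokens_per_sample: float,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> float:
    """Rough API cost: (tokens / 1000) * price, summed for prompt and completion sides."""
    prompt_cost = num_samples * prompt_tokens_per_sample / 1000 * usd_per_1k_prompt
    completion_cost = (
        num_samples * completion_tokens_per_sample / 1000 * usd_per_1k_completion
    )
    return prompt_cost + completion_cost


# Placeholder inputs: replace with token counts measured on a few real
# evaluation prompts and with the provider's current per-1K-token prices.
estimate = estimate_cost_usd(
    num_samples=1700,
    prompt_tokens_per_sample=1000,
    completion_tokens_per_sample=300,
    usd_per_1k_prompt=0.03,
    usd_per_1k_completion=0.06,
)
print(f"Estimated cost: ${estimate:.2f}")
```

Running the judge on a handful of samples first and reading the token counts it reports gives much better inputs than guessing.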

If it's not too much trouble, a prompt response would be greatly appreciated. Thank you.

superdocker • Feb 10 '24 03:02

Same issue here.

LorrinWWW • May 16 '24 21:05

@LorrinWWW In my case, I fixed a few key mismatches in gpt4_eval.py, and everything worked fine. Here is what I modified:

  • metric_list = item["skill"]
  • prompt = prompt_template.format(question=item["instruction"], response=response, skills=skills, num=3, sample_answer=item["answer"], **defaults)
  • sorted_objects = sorted(json_objects, key=lambda obj: obj.get('idx'))

There might be other minor tweaks needed, but you should be able to handle them without much difficulty.
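The three changes listed above are shown in context in the minimal sketch below. It is not the actual gpt4_eval.py code. PROMPT_TEMPLATE, DEFAULTS, and the way skills is joined into a string are hypothetical stand-ins, and the sketch assumes each record's skill field is a list of skill names:

```python
# Hypothetical stand-ins: the real prompt_template, defaults, and skills
# construction live in gpt4_eval.py and are NOT reproduced here.
PROMPT_TEMPLATE = (
    "Question: {question}\nResponse: {response}\nReference: {sample_answer}\n"
    "Skills: {skills}\nScore {num} skills.{suffix}"
)
DEFAULTS = {"suffix": ""}


def build_prompts(items: list, responses: list) -> list:
    """Pair each evaluation record with a model response, reading the renamed keys."""
    prompts = []
    for item, response in zip(items, responses):
        metric_list = item["skill"]  # previously item["metrics"]
        skills = ", ".join(metric_list)  # assumption: skill is a list of strings
        prompt = PROMPT_TEMPLATE.format(
            question=item["instruction"],  # previously item["text"]
            response=response,
            skills=skills,
            num=3,  # same value as in the fix above
            sample_answer=item["answer"],
            **DEFAULTS,
        )
        prompts.append({"idx": item["idx"], "prompt": prompt})
    return prompts


def sort_results(json_objects: list) -> list:
    """Order judged results by 'idx' (previously 'question_id')."""
    return sorted(json_objects, key=lambda obj: obj.get("idx"))
```

Because the sort key uses obj.get('idx'), a result missing idx yields None. Python 3 cannot compare None with an integer, so the sort would then raise a TypeError instead of silently misordering.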

superdocker • May 17 '24 06:05


Thank you so much!

LorrinWWW • May 17 '24 17:05