FLASK
FLASK copied to clipboard
[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Hi, thank you for great work on LLM evaluation. I'm very impressed by your work, because there are lack of evaluation metrics for ChatBot. I want to use this framework...
I am impressed by this FLASK project and your follow-up work, [Prometheus](https://github.com/kaistAI/prometheus/). In the Prometheus paper, I saw experiments conducted on FLASK. Can you release your code for evaluating with...
Hi, In the paper it is mentioned that "instance-specific" scoring criteria was created for the FLASK-HARD subset. Is there any way to create or use the subquestions/scoring criteria . It...
openai_concurrent.py still calls openai.ChatCompletion.create(...) which only existed in openai < v1.0 This is outdated now and causes all evals to go into the error output file. Need to migrate codebase...