FLASK issues

Key mismatch in gpt4_eval.py

3

Hi, thank you for great work on LLM evaluation. I'm very impressed by your work, because there are lack of evaluation metrics for ChatBot. I want to use this framework...

superdocker

updated spaces for numbered steps in the evaluation portion of the readme

1

ckgresla

Evaluation Code with Prometheus

I am impressed by this FLASK project and your follow-up work, [Prometheus](https://github.com/kaistAI/prometheus/). In the Prometheus paper, I saw experiments conducted on FLASK. Can you release your code for evaluating with...

Haoxiang-Wang

Regarding instance-specific scoring criteria

Hi, In the paper it is mentioned that "instance-specific" scoring criteria was created for the FLASK-HARD subset. Is there any way to create or use the subquestions/scoring criteria . It...

prapti19

Migration to newer OpenAI client (1.0+)

openai_concurrent.py still calls openai.ChatCompletion.create(...) which only existed in openai < v1.0 This is outdated now and causes all evals to go into the error output file. Need to migrate codebase...

Asoingbob225

FLASK
FLASK copied to clipboard

Metadata

Key mismatch in gpt4_eval.py

updated spaces for numbered steps in the evaluation portion of the readme

Evaluation Code with Prometheus

Regarding instance-specific scoring criteria

Migration to newer OpenAI client (1.0+)

← Metadata

Owner

Metadata

FLASK FLASK copied to clipboard

Metadata

Key mismatch in gpt4_eval.py

updated spaces for numbered steps in the evaluation portion of the readme

Evaluation Code with Prometheus

Regarding instance-specific scoring criteria

Migration to newer OpenAI client (1.0+)

← Metadata

Owner

Metadata

FLASK
FLASK copied to clipboard