Questions about cal_knowledge_quadrants

Open BUGLI27 opened this issue 1 year ago • 1 comments

answer_correct = False
if 'answer' in item and item['answer'] == 'This question is beyond the scope of my knowledge, and I am not sure what the answer is.':
    y_true.append(1)
else:
    y_true.append(0)

There is no 'answer' key in item, so this condition never holds and every question receives the same label (y_true is always 0). That seems wrong. In my opinion, the IDK threshold should be used to decide whether the model knows the answer.

if 'This question is beyond the scope of my knowledge, and I am not sure what the answer is.' in item['generated_answer']:
    y_pred.append(1)
else:
    y_pred.append(0)

Also, when calculating the knowledge quadrants for the Prompt method, isn't the matching rule too strict? It is nearly impossible for the Prompt method to output 'This question is beyond the scope of my knowledge, and I am not sure what the answer is.' because its prompt is "Answer the following question, and if you don't know the answer, only reply with 'I don't know' <Question>". As a result, y_pred is always 0 as well.

if y_true[-1] == 1: # marked as I don't know
    if y_pred[-1] == 1: # refuse to answer
        sample_disribution['Known Unknowns'] += 1
    else:
        if answer_correct: # give a correct answer
            sample_disribution['Known Knowns'] += 1
        else: # give a wrong answer
            sample_disribution['Unknown Unknowns'] += 1
else: # marked as I know
    if y_pred[-1] == 1: # refuse to answer
        sample_disribution['Unknown Knowns'] += 1
    else:
        if answer_correct: # give a correct answer
            sample_disribution['Known Knowns'] += 1
        else: # give a wrong answer
            sample_disribution['Unknown Unknowns'] += 1

With y_true and y_pred both stuck at 0, the only part of the calculation that actually runs is:

if answer_correct: # give a correct answer
    sample_disribution['Known Knowns'] += 1
else: # give a wrong answer
    sample_disribution['Unknown Unknowns'] += 1

As a result, the knowledge quadrants only ever contain IK-IK (Known Knowns) and IDK-IDK (Unknown Unknowns).
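
To make the collapse concrete, here is a minimal sketch that mirrors the branching quoted above for a single item. It is not the repository's code: the `quadrant` helper, the sample items, and the `question` key are hypothetical, and I assume items carry only a `generated_answer` field (no `answer` key), as described in this issue.

```python
from collections import Counter

IDK_TEMPLATE = ("This question is beyond the scope of my knowledge, "
                "and I am not sure what the answer is.")

def quadrant(item, answer_correct):
    """Hypothetical helper reproducing the quoted branching for one item."""
    # Same test as the quoted code: the key must exist AND equal the template.
    y_true = 1 if item.get('answer') == IDK_TEMPLATE else 0
    # Strict match on the full IDK template, as in the quoted code.
    y_pred = 1 if IDK_TEMPLATE in item['generated_answer'] else 0
    if y_pred == 1:  # refuse to answer
        return 'Known Unknowns' if y_true == 1 else 'Unknown Knowns'
    # Not refused: both y_true branches lead to the same two outcomes.
    return 'Known Knowns' if answer_correct else 'Unknown Unknowns'

# Made-up items without an 'answer' key, answered in the Prompt method's style.
items = [
    ({'question': 'q1', 'generated_answer': 'Paris'}, True),
    ({'question': 'q2', 'generated_answer': "I don't know"}, False),
    ({'question': 'q3', 'generated_answer': 'Berlin'}, False),
]

counts = Counter(quadrant(item, ok) for item, ok in items)
print(dict(counts))
```

Under these assumptions, 'Known Unknowns' and 'Unknown Knowns' are never counted, even for the item whose response is "I don't know".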

Nov 08 '24 03:11 BUGLI27

Yeah, the IDK template is too strict for Idk-prompting. For Idk-prompting, we directly check whether "I don't know" appears in the response instead of matching the whole IDK template.
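
A minimal sketch of the difference between the two checks, for illustration only. The function names `refused_strict` and `refused_relaxed` are hypothetical and do not come from the repository; the relaxed check is a plain, case-sensitive substring test, which is my assumption about how the match is done.

```python
IDK_TEMPLATE = ("This question is beyond the scope of my knowledge, "
                "and I am not sure what the answer is.")
IDK_PHRASE = "I don't know"

def refused_strict(response: str) -> bool:
    """Refusal only if the full IDK template appears in the response."""
    return IDK_TEMPLATE in response

def refused_relaxed(response: str) -> bool:
    """Refusal if the short phrase from the Idk-prompting prompt appears."""
    return IDK_PHRASE in response

response = "I don't know."
print(refused_strict(response), refused_relaxed(response))
```

With the relaxed check, a response produced under the "only reply with 'I don't know'" prompt is counted as a refusal, so y_pred can actually become 1 for Idk-prompting.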

Dec 13 '24 03:12 xiami2019