DriveBench

Corruption JSON files only have "question_type": "robust_qas"

Open · SM20sam opened this issue 5 months ago · 5 comments

In https://huggingface.co/datasets/drive-bench/arena, the JSON files for the corruptions only contain entries with "question_type": "robust_qas".

This is a serious issue, as eval.py expects these question types:

    self.results = {
        "perception": {
            "MCQ": {"gpt": [], "accuracy": []},
            "VQA": {"gpt": [], "language": []}
        },
        "prediction": {
            "VQA": {"gpt": [], "language": []}
        },
        "planning": {
            "VQA": {"gpt": [], "language": []}
        },
        "behavior": {
            "MCQ": {"gpt": [], "accuracy": []}
        }
    }

The JSON files for the corruptions need to be updated.

[2 screenshots attached]

SM20sam · Aug 05 '25 00:08
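To confirm the mismatch described above on a local copy of the data, one can tally the "question_type" values in each JSON file. This is a minimal sketch, not part of DriveBench. It assumes each file is a JSON list of dicts, and it assumes (unverified) that a record's "question_type" should name one of the top-level task keys of the self.results dict quoted in the issue; the real values may differ, so the expected set is a parameter.

```python
import json
import sys
from collections import Counter

# Top-level task keys of the self.results dict in eval.py, as quoted in this issue.
# ASSUMPTION: a record's "question_type" should name one of these tasks; the real
# values in drivebench-test.json may differ, so pass your own set if needed.
EVAL_TASKS = frozenset({"perception", "prediction", "planning", "behavior"})


def question_type_report(records, expected=EVAL_TASKS):
    """Count "question_type" values and list the ones with no matching eval bucket."""
    counts = Counter(r.get("question_type", "<missing>") for r in records)
    unexpected = sorted(t for t in counts if t not in expected)
    return counts, unexpected


def main(paths):
    for path in paths:
        with open(path, encoding="utf-8") as f:
            records = json.load(f)  # ASSUMPTION: the file is a JSON list of dicts
        counts, unexpected = question_type_report(records)
        print(f"{path}: {dict(counts)}; not handled by eval.py: {unexpected}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with one or more JSON file paths as arguments; for a corruption file containing only "robust_qas" entries, "robust_qas" would be listed as not handled.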

There are two ways to use the corrupted images.

  • Corrupted images with the original text input. This corresponds to all other numbers reported in the paper except Table 5.
  • Corrupted images with corruption-specific text input (as in the screenshots above). This corresponds to Table 5 in the paper.

The current evaluation script supports only the first. We will release a script for the second soon.

Daniel-xsy · Aug 05 '25 16:08

For the first way (corrupted images with the original text input, used for all numbers in the paper except Table 5): in drivebench-test.json, "image_path" points to the uncorrupted images. Do I need to modify drivebench-test.json manually so that it points to the corrupted images?

SM20sam · Aug 18 '25 03:08

Yes. Since the same questions are also used for the clean-image evaluation in the paper, we use the original image paths by default. The path change is also in our released script.

Daniel-xsy · Aug 18 '25 04:08
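The released script mentioned above handles this path change and is not reproduced here. As a rough illustration only, the sketch below rewrites "image_path" in a copy of the test records so that it points at one corruption's images. The directory layout (corrupted images mirroring the clean layout under <corrupted_root>/<corruption>/), the helper names, and any corruption name you pass are hypothetical; check them against the actual dataset before use. Since "image_path" may be a single path or a per-camera collection, the sketch handles strings, lists, and dicts.

```python
import copy
import json
import sys
from pathlib import PurePosixPath


def _redirect(value, clean_root, corrupted_dir):
    """Rewrite one path, or every path inside a list/dict, under corrupted_dir."""
    if isinstance(value, str):
        # Raises ValueError if the path does not live under clean_root.
        rel = PurePosixPath(value).relative_to(clean_root)
        return str(PurePosixPath(corrupted_dir) / rel)
    if isinstance(value, list):
        return [_redirect(v, clean_root, corrupted_dir) for v in value]
    if isinstance(value, dict):
        return {k: _redirect(v, clean_root, corrupted_dir) for k, v in value.items()}
    raise TypeError(f"unsupported image_path value: {value!r}")


def redirect_image_paths(records, clean_root, corrupted_root, corruption):
    """Return a copy of records whose "image_path" points at one corruption's images.

    HYPOTHETICAL layout: corrupted images mirror the clean directory tree
    under <corrupted_root>/<corruption>/. The input records are not modified.
    """
    corrupted_dir = PurePosixPath(corrupted_root) / corruption
    out = copy.deepcopy(records)
    for rec in out:
        rec["image_path"] = _redirect(rec["image_path"], clean_root, corrupted_dir)
    return out


if __name__ == "__main__":
    # Arguments: src.json dst.json clean_root corrupted_root corruption
    src, dst, clean_root, corrupted_root, corruption = sys.argv[1:6]
    with open(src, encoding="utf-8") as f:
        records = json.load(f)  # ASSUMPTION: a JSON list of dicts
    redirected = redirect_image_paths(records, clean_root, corrupted_root, corruption)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(redirected, f, indent=2)
```

Writing to a separate output file keeps drivebench-test.json intact for the clean-image evaluation, which the reply above says reuses the same questions.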

By "The path change is also in our released script," which script are you referring to? Could you give step-by-step instructions for running the first way (corrupted images with the original text input, used for all numbers in the paper except Table 5)?

SM20sam · Aug 19 '25 02:08

Sorry for the confusion. You can refer to the code here. Please let me know if you have further questions!

Daniel-xsy · Aug 19 '25 02:08