Confused QA about grey heron (Ardea cinerea)

Open palimondo opened this issue 2 years ago • 2 comments

Hi,

I have been browsing through the dataset and I have been baffled by the following QA:

Question

Select the organism in the same species as the gray heron.

Context

This organism is a gray heron. Its scientific name is Ardea cinerea.

Choices

  • Ardea cinerea
  • Hyla cinerea
  • Lonicera japonica

Answer

Ardea cinerea


What reasoning pattern can be learned from an example that contains the verbatim answer in the context description?!

Having read the lesson and the explanation for the QA, I think the point was to explain the rules of scientific classification at the level of genus and species.

I think the question, choices and answer should be changed as follows to make sense:

Question

Select the organism in the same genus as the gray heron.

Context

This organism is a gray heron. Its scientific name is Ardea cinerea.

Choices

  • Ardea herodias
  • Hyla cinerea
  • Lonicera japonica

Answer

Ardea herodias
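
To make the genus/species distinction concrete, here is a minimal sketch. It assumes every name is a plain two-word binomial (genus, then specific epithet); the helper names are made up for illustration and are not part of ScienceQA.

```python
def parse_binomial(name: str) -> tuple[str, str]:
    """Split a two-word scientific name into (genus, specific epithet)."""
    genus, epithet = name.strip().split()
    return genus.lower(), epithet.lower()


def same_genus(a: str, b: str) -> bool:
    # Only the first word (the genus) has to match.
    return parse_binomial(a)[0] == parse_binomial(b)[0]


def same_species(a: str, b: str) -> bool:
    # Both words have to match; the epithet alone is not enough.
    return parse_binomial(a) == parse_binomial(b)


target = "Ardea cinerea"
choices = ["Ardea herodias", "Hyla cinerea", "Lonicera japonica"]

matches = [c for c in choices if same_genus(target, c)]
print(matches)  # ['Ardea herodias']
```

Note that "Hyla cinerea" shares the epithet "cinerea" with the gray heron but belongs to a different genus, which is what makes it a meaningful distractor in the genus version of the question.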

palimondo, Jan 21 '24 07:01

I think the question originates from the exercises on this IXL page: https://www.ixl.com/science/grade-8/use-scientific-names-to-classify-organisms

In the original exercise, the question and the answer show two different images of the same species, but ScienceQA does not include the images associated with the answer choices for this question. Having both images might be useful for training a multimodal LLM, since it provides two images with the same label, but that isn't the focus of your dataset. Given that even the original page uses the full Latin name in both the question and the answer, it short-circuits the need for reasoning and teaches no valuable pattern.

Many of the opening QAs on the linked page have this issue, but later QAs focus the question on genus, as I've suggested above: [image]

Therefore, I think ScienceQA should filter out the questions of the first type (same species) and include only the questions about genus from the linked page.
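
A rough sketch of such a filter, in case it helps. The record layout (`question`, `hint`, `choices`, and `answer` as an index into `choices`) is my assumption about how the data is stored, and the two records are just the examples from this issue, not actual dataset rows.

```python
def answer_leaks_into_context(record: dict) -> bool:
    """True if the correct choice appears verbatim in the context text."""
    answer_text = record["choices"][record["answer"]]
    return answer_text.casefold() in record["hint"].casefold()


# Assumed record layout; 'answer' is taken to be an index into 'choices'.
hint = "This organism is a gray heron. Its scientific name is Ardea cinerea."
records = [
    {
        "question": "Select the organism in the same species as the gray heron.",
        "hint": hint,
        "choices": ["Ardea cinerea", "Hyla cinerea", "Lonicera japonica"],
        "answer": 0,
    },
    {
        "question": "Select the organism in the same genus as the gray heron.",
        "hint": hint,
        "choices": ["Ardea herodias", "Hyla cinerea", "Lonicera japonica"],
        "answer": 0,
    },
]

kept = [r for r in records if not answer_leaks_into_context(r)]
print([r["question"] for r in kept])
```

Only the genus question survives, because its answer ("Ardea herodias") does not occur in the context. A plain substring check like this is deliberately crude and would need review before being applied to the whole dataset.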

palimondo, Feb 14 '24 00:02

Hmmm... Thinking about this some more, I think the confusing part for me is the order in which the Question, Context and Answer are presented in your Explore interface. The original page starts with the Context. Then it presents the question, followed by the answers. In this particular case the lesson requires a reasoning step: logically connecting the two names presented in the context as equivalent. The question asks about the common name and the answer is given as a scientific name. This makes sense: [image]

This is indeed a valuable reasoning pattern. But it must be used in training in the correct sequence.

The presentation order in the ScienceQA interface, where the Question is asked first and the context basically gives away the answer, is confusing: [image]

You are using the same confusing order in Appendices B.3, B.4 and B.5 of your paper. What order did you use when training and evaluating the model's reasoning?
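
To illustrate what I mean by order, here is a small sketch that assembles the same record in either sequence. The `build_prompt` helper, the field names and the "Context:/Question:/Choices:" labels are all hypothetical; I don't know what format your training code actually uses.

```python
def build_prompt(record: dict, order=("context", "question", "choices")) -> str:
    """Join the parts of a QA record in the requested order."""
    options = "\n".join(
        f"({chr(ord('A') + i)}) {choice}"
        for i, choice in enumerate(record["choices"])
    )
    parts = {
        "context": f"Context: {record['context']}",
        "question": f"Question: {record['question']}",
        "choices": f"Choices:\n{options}",
    }
    return "\n".join(parts[key] for key in order)


record = {
    "context": "This organism is a gray heron. Its scientific name is Ardea cinerea.",
    "question": "Select the organism in the same species as the gray heron.",
    "choices": ["Ardea cinerea", "Hyla cinerea", "Lonicera japonica"],
}

# Order used on the original IXL page: context, then question, then choices.
context_first = build_prompt(record)

# Order shown in the Explore interface: question before context.
question_first = build_prompt(record, order=("question", "context", "choices"))

print(context_first)
```

With the context first, the reader learns that "gray heron" and "Ardea cinerea" name the same organism before the question is posed, which is the sequence the lesson seems to rely on.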

palimondo, Feb 14 '24 01:02