Confused QA about grey heron (Ardea cinerea)
Hi,
I have been browsing through the dataset and I have been baffled by the following QA:
Question
Select the organism in the same species as the gray heron.
Context
This organism is a gray heron. Its scientific name is Ardea cinerea.
Choices
- Ardea cinerea
- Hyla cinerea
- Lonicera japonica
Answer
Ardea cinerea
What reasoning pattern can be learned from an example that contains the verbatim answer in the context description?!
Having read the lesson and the explanation for the QA, I think the point was to explain the rules of scientific classification at the level of genus and species.
I think the question, choices and answer should be changed as follows to make sense:
Question
Select the organism in the same genus as the gray heron.
Context
This organism is a gray heron. Its scientific name is Ardea cinerea.
Choices
- Ardea herodias
- Hyla cinerea
- Lonicera japonica
Answer
Ardea herodias
I think the question originates from the exercises on this IXL page: https://www.ixl.com/science/grade-8/use-scientific-names-to-classify-organisms On the original page, the question and the answer each show a different image of the same species, but ScienceQA does not include the images associated with the answer choices for this question. Having both images might be useful for training a multimodal LLM, since it would provide two images with the same label, but that isn't the focus of your dataset. Because even the original page uses the full Latin name in both the question and the answer, it short-circuits the need for reasoning and teaches no valuable pattern.
Many of the opening QAs on the linked page have this issue, but later QAs focus the question on the genus, as I've suggested above.
Therefore I think ScienceQA should filter out the questions of the first type (same species) and include only the questions about the genus from the linked page.
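To make the proposed filter concrete, here is a minimal sketch of how such "leaky" questions could be detected. It assumes each record is a plain dict with `question`, `hint` (the context), `choices` and an integer `answer` index. Those field names and the helper `answer_leaks_from_context` are my assumptions for illustration, not necessarily how the dataset is processed on your side.

```python
def answer_leaks_from_context(record):
    """Return True if the correct choice appears verbatim in the context.

    Assumed record layout (hypothetical): 'hint' holds the context text,
    'choices' is a list of strings, 'answer' is the index of the correct choice.
    """
    context = record.get("hint", "").casefold()
    correct = record["choices"][record["answer"]].casefold()
    # An empty choice would trivially match any context, so require non-empty text.
    return bool(correct) and correct in context


context_text = "This organism is a gray heron. Its scientific name is Ardea cinerea."

records = [
    {   # "same species" variant: the answer is given away by the context
        "question": "Select the organism in the same species as the gray heron.",
        "hint": context_text,
        "choices": ["Ardea cinerea", "Hyla cinerea", "Lonicera japonica"],
        "answer": 0,
    },
    {   # "same genus" variant: the genus has to be compared across names
        "question": "Select the organism in the same genus as the gray heron.",
        "hint": context_text,
        "choices": ["Ardea herodias", "Hyla cinerea", "Lonicera japonica"],
        "answer": 0,
    },
]

# Keep only the records whose answer is not a verbatim substring of the context.
kept = [r for r in records if not answer_leaks_from_context(r)]
```

A plain substring check like this is deliberately crude. It only catches the verbatim case discussed here and says nothing about questions where the answer is merely paraphrased in the context.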
Hmmm... Thinking about this some more, I think what confused me is the order in which the Question, Context and Answer are presented in your Explore interface. The original page starts with the context, then presents the question, followed by the answer choices. In this particular case, the lesson requires a reasoning step: logically connecting the two names given in the context as equivalent. The question asks about the common name and the answer choices are given as scientific names. This makes sense.
This is indeed a valuable reasoning pattern. But it must be used in training in the correct sequence.
The presentation order in the ScienceQA interface, where the question is asked first and the context then basically gives away the answer, is confusing.
You use the same confusing order in Appendices B.3, B.4 and B.5 of your paper. What order did you use when training and evaluating the model's reasoning?
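To illustrate what I mean by order, here is a small sketch that builds the same example in both sequences. The function `build_prompt`, the field names and the "Context:/Question:/Choices:" labels are hypothetical. I do not know what template you actually used, which is why I am asking.

```python
import string


def build_prompt(record, order=("context", "question", "choices")):
    """Join the parts of one QA record in the requested order (illustrative only)."""
    letters = string.ascii_uppercase
    parts = {
        "context": "Context: " + record["hint"],
        "question": "Question: " + record["question"],
        "choices": "Choices:\n" + "\n".join(
            f"({letters[i]}) {choice}" for i, choice in enumerate(record["choices"])
        ),
    }
    return "\n".join(parts[name] for name in order)


record = {
    "question": "Select the organism in the same species as the gray heron.",
    "hint": "This organism is a gray heron. Its scientific name is Ardea cinerea.",
    "choices": ["Ardea cinerea", "Hyla cinerea", "Lonicera japonica"],
}

# Order on the original IXL page: context first, then question, then choices.
context_first = build_prompt(record)

# Order shown in the Explore interface: question first, then context.
question_first = build_prompt(record, order=("question", "context", "choices"))
```

Printing `context_first` and `question_first` side by side shows the difference: in the first, the two names are introduced before the question is posed; in the second, the context reads like a hint that hands over the answer.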