
OpenML: Disagreement between OpenML keywords and Croissant keywords

Open TheRazorace opened this issue 1 year ago • 2 comments

For many datasets, the `keywords` field of the Croissant metadata JSON is wrong. For example, see https://www.openml.org/search?type=data&status=active&id=925&sort=runs. This is evident when you compare the field to the OpenML keywords and to the dataset's content. For some reason, the keywords "Life Science" and "Chemistry" appear in many Croissant metadata files even though the datasets are unrelated to those fields.

TheRazorace, Feb 19 '25 15:02

Just to confirm, this is an issue with how OpenML manages keywords, right?

benjelloun, Jun 02 '25 16:06

They seem to be correct on the OpenML website, but not when the JSON is extracted. I am not sure where the issue is introduced.

TheRazorace, Jun 03 '25 06:06
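To narrow down where the mismatch appears, it may help to diff the two keyword sets for a given dataset. The snippet below is a minimal, offline sketch that works on already-downloaded JSON. It assumes that the Croissant file carries schema.org `keywords` (either a list or a comma-separated string) and that the OpenML data description has the shape `{"data_set_description": {"tag": ...}}`, with OpenML tags standing in for "OpenML keywords". The function names are hypothetical helpers, not part of any OpenML or Croissant library.

```python
from typing import Any


def croissant_keywords(croissant: dict[str, Any]) -> set[str]:
    """Keywords declared in a Croissant (schema.org) JSON-LD document."""
    kws = croissant.get("keywords", [])
    if isinstance(kws, str):  # schema.org also allows one comma-separated string
        kws = kws.split(",")
    return {k.strip() for k in kws if k.strip()}


def openml_tags(description: dict[str, Any]) -> set[str]:
    """Tags from an OpenML data description (assumed shape, see lead-in)."""
    tags = description.get("data_set_description", {}).get("tag", [])
    if isinstance(tags, str):  # a single tag may be returned as a plain string
        tags = [tags]
    return {t.strip() for t in tags if t.strip()}


def keyword_disagreement(croissant: dict[str, Any],
                         description: dict[str, Any]) -> dict[str, list[str]]:
    """Report keywords that appear on only one side of the comparison."""
    c = croissant_keywords(croissant)
    o = openml_tags(description)
    return {
        "only_in_croissant": sorted(c - o),
        "only_in_openml": sorted(o - c),
    }


if __name__ == "__main__":
    # Hypothetical inputs illustrating the reported symptom.
    croissant = {"keywords": ["Life Science", "Chemistry", "Economics"]}
    description = {"data_set_description": {"tag": ["Economics"]}}
    print(keyword_disagreement(croissant, description))
```

With the hypothetical inputs above, "Chemistry" and "Life Science" are reported under `only_in_croissant`, which is the pattern described in the issue. Running the same check on the JSON exported for a real dataset, such as id 925, would show whether the extra keywords are added during Croissant generation.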