
Create openml_hard_id_list.txt to include 36 hardest datasets in Table 4

Open JerryLife opened this issue 10 months ago • 0 comments

To address issue #103, the full list of the hardest dataset IDs is included in a text file.

The list was produced by fuzzy matching against the dataset names in Table 4. The datasets that did not match exactly are:

  • For colic, there are two candidate datasets (openml__colic__25 and openml__colic__27). According to their metadata, openml__colic__25 has 26 features, while openml__colic__27 has only 22. Table 4 lists 27 features, which is closer to openml__colic__25 (the count may include the label column), so openml__colic__25 is kept in the list.

  • For GesturePhase, the closest match is openml__GesturePhaseSegmentationProcessed__14969.

  • For 100-plants-texture, the closest match is openml__one-hundred-plants-texture__9956.
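
The issue does not say which fuzzy-matching method was used. As a rough illustration only, the sketch below ranks candidate dataset directories against a Table 4 name with Python's standard `difflib`. The helper names `dataset_name` and `rank_candidates` are hypothetical, and the `openml__<name>__<id>` layout is inferred from the IDs quoted above rather than taken from the repository code.

```python
from difflib import SequenceMatcher


def dataset_name(dir_name: str) -> str:
    """Extract the middle part of an 'openml__<name>__<id>' identifier.

    The three-part layout is an assumption based on the IDs in this issue.
    """
    parts = dir_name.split("__")
    return parts[1] if len(parts) == 3 else dir_name


def rank_candidates(query: str, dir_names: list[str], top_k: int = 3):
    """Return the top_k (similarity, dir_name) pairs, most similar first."""
    q = query.lower()
    scored = [
        (SequenceMatcher(None, q, dataset_name(d).lower()).ratio(), d)
        for d in dir_names
    ]
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Candidate IDs taken from the list above.
candidates = [
    "openml__colic__25",
    "openml__colic__27",
    "openml__GesturePhaseSegmentationProcessed__14969",
    "openml__one-hundred-plants-texture__9956",
]

for table_name in ["colic", "GesturePhase", "100-plants-texture"]:
    print(table_name, rank_candidates(table_name, candidates, top_k=2))
```

Name similarity alone cannot separate the two colic datasets, since both names match exactly. That tie is why the feature count from the metadata was used to choose between them.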

JerryLife, Mar 13 '25 05:03