tabzilla
Create openml_hard_id_list.txt to include 36 hardest datasets in Table 4
To address issue #103, the full list of the hardest dataset IDs is included in a text file.
The IDs were obtained by fuzzy matching. The files that could not be matched exactly are:

- For colic, there are two datasets (`openml__colic__25` and `openml__colic__27`). According to their metadata, `openml__colic__25` has 26 features, while `openml__colic__27` has only 22. Table 4 lists 27 features, which is closer to `openml__colic__25` (possibly because it counts the label column), so `openml__colic__25` is kept in the list.
- For GesturePhase, the closest match is `openml__GesturePhaseSegmentationProcessed__14969`.
- For 100-plants-texture, the closest match is `openml__one-hundred-plants-texture__9956`.
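The PR does not include the matching code. The following is only a rough Python sketch of how a name from Table 4 could be fuzzy-matched to a dataset directory, with feature count used as a tie-breaker as in the colic case. It assumes `difflib.SequenceMatcher.ratio()` as the similarity measure (not necessarily what was actually used), and `best_match` and `CANDIDATES` are hypothetical names. The directory names and the colic feature counts come from the text above. The feature counts for the other two datasets are not given, so they are left as `None`.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical candidate list: (dataset directory name, number of features).
# Only the colic feature counts are stated in the PR; the rest are unknown (None).
CANDIDATES: List[Tuple[str, Optional[int]]] = [
    ("openml__colic__25", 26),
    ("openml__colic__27", 22),
    ("openml__GesturePhaseSegmentationProcessed__14969", None),
    ("openml__one-hundred-plants-texture__9956", None),
]


def dataset_name(dir_name: str) -> str:
    """Extract <name> from a directory named 'openml__<name>__<id>'."""
    return dir_name.split("__", 1)[1].rsplit("__", 1)[0]


def best_match(
    query: str,
    candidates: List[Tuple[str, Optional[int]]],
    table_features: Optional[int] = None,
) -> str:
    """Return the candidate whose name is most similar to `query`.

    Ties in name similarity are broken by how close the candidate's
    feature count is to `table_features`; unknown counts lose ties.
    """

    def sort_key(item: Tuple[str, Optional[int]]) -> Tuple[float, float]:
        dir_name, n_features = item
        # Case-insensitive similarity in [0, 1] between the two names.
        score = SequenceMatcher(
            None, query.lower(), dataset_name(dir_name).lower()
        ).ratio()
        if table_features is None or n_features is None:
            gap = float("inf")
        else:
            gap = abs(n_features - table_features)
        # min() prefers the highest score, then the smallest feature gap.
        return (-score, gap)

    return min(candidates, key=sort_key)[0]


if __name__ == "__main__":
    # Table 4 reports 27 features for colic; 26 is closer than 22.
    print(best_match("colic", CANDIDATES, table_features=27))
    print(best_match("GesturePhase", CANDIDATES))
    print(best_match("100-plants-texture", CANDIDATES))
```

Both colic entries match the name exactly. In that case the feature-count tie-break selects `openml__colic__25`, which agrees with the choice explained above.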