Create openml_hard_id_list.txt to include 36 hardest datasets in Table 4 #104
To address issue #103, the full list of the hardest dataset IDs is included in a text file.
The list was built by fuzzy matching on dataset names. The datasets that did not match exactly are:
- For colic, there are two datasets (`openml__colic__25` and `openml__colic__27`). After checking the metadata, `openml__colic__25` has 26 features, while `openml__colic__27` has only 22 features. The number of features in Table 4 is 27, which aligns more closely with `openml__colic__25` (possibly because the count includes the label column), so `openml__colic__25` is kept in the list.
- For GesturePhase, the closest match is `openml__GesturePhaseSegmentationProcessed__14969`.
- For 100-plants-texture, the closest match is `openml__one-hundred-plants-texture__9956`.
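For reference, the matching described above could be reproduced roughly as follows. This is only a sketch, not the script used for this PR: it assumes IDs follow the `openml__<name>__<task id>` pattern seen above, uses Python's standard `difflib` as the fuzzy matcher, and the helper names (`closest_ids`, `pick_by_feature_count`) and the `cutoff` value are hypothetical.

```python
import difflib


def dataset_name(dataset_id: str) -> str:
    # IDs are assumed to look like "openml__<name>__<task id>"; keep the name part.
    return dataset_id.split("__")[1]


def closest_ids(table_name, dataset_ids, n=3, cutoff=0.4):
    """Return candidate IDs whose name is close to a Table 4 name, best match first."""
    by_name = {}
    for dataset_id in dataset_ids:
        by_name.setdefault(dataset_name(dataset_id).lower(), []).append(dataset_id)
    hits = difflib.get_close_matches(
        table_name.lower(), list(by_name), n=n, cutoff=cutoff
    )
    return [dataset_id for hit in hits for dataset_id in by_name[hit]]


def pick_by_feature_count(candidates, n_features, target):
    """Break ties by choosing the candidate whose feature count is nearest the target."""
    return min(candidates, key=lambda d: abs(n_features[d] - target))


ids = [
    "openml__colic__25",
    "openml__colic__27",
    "openml__GesturePhaseSegmentationProcessed__14969",
    "openml__one-hundred-plants-texture__9956",
]

# "colic" matches two IDs, so the feature counts from the metadata decide.
colic = closest_ids("colic", ids)
features = {"openml__colic__25": 26, "openml__colic__27": 22}
chosen = pick_by_feature_count(colic, features, target=27)
```

Any match that is not exact still needs a manual check against the metadata, as was done for colic here.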