openml.org

Dataset Comparison tool

Open: joaquinvanschoren opened this issue 2 years ago • 1 comment

Proposed by @ogrisel - A 'comparison' view to see how two datasets differ, including for instance:

  • list the columns whose names differ between 2 versions of the same dataset, or between 2 datasets chosen by the user,
  • list changes in data type representation for columns with the same name,
  • list, per column, the number of rows with changed values and show the first 5 differing row values.
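
To make the three checks above concrete, here is a minimal sketch using pandas. It is only an illustration, not a proposed implementation: the function name `compare_datasets`, the report layout, and the example data are all hypothetical, and it assumes both datasets fit in memory as DataFrames with rows aligned by position.

```python
import pandas as pd


def _same(x, y) -> bool:
    """Scalar equality where two missing values count as equal."""
    x_na, y_na = pd.isna(x), pd.isna(y)
    if x_na or y_na:
        return bool(x_na and y_na)
    return bool(x == y)


def compare_datasets(old: pd.DataFrame, new: pd.DataFrame, n_examples: int = 5) -> dict:
    """Summarize how two tabular datasets differ (hypothetical helper)."""
    old_cols, new_cols = set(old.columns), set(new.columns)
    shared = [c for c in old.columns if c in new_cols]

    report = {
        "n_rows": (len(old), len(new)),
        "only_in_old": sorted(old_cols - new_cols),   # columns dropped or renamed
        "only_in_new": sorted(new_cols - old_cols),   # columns added or renamed
        "dtype_changes": {},
        "value_changes": {},
    }

    # Assumption: rows are aligned by position, so only the overlapping
    # prefix of the two tables is compared value by value.
    n_rows = min(len(old), len(new))
    for col in shared:
        a, b = old[col].iloc[:n_rows], new[col].iloc[:n_rows]

        if a.dtype != b.dtype:
            report["dtype_changes"][col] = (str(a.dtype), str(b.dtype))

        # Plain Python loop: slow on large data, but easy to follow.
        differs = [i for i, (x, y) in enumerate(zip(a, b)) if not _same(x, y)]
        if differs:
            report["value_changes"][col] = {
                "n_changed": len(differs),
                "examples": [(i, a.iloc[i], b.iloc[i]) for i in differs[:n_examples]],
            }
    return report


# Made-up example data: "city" is renamed to "town", "age" changes dtype
# and one of its values.
old = pd.DataFrame({"id": [1, 2, 3], "age": [30, 41, 52],
                    "score": [1.0, float("nan"), 3.0],
                    "city": ["Paris", None, "Lyon"]})
new = pd.DataFrame({"id": [1, 2, 3], "age": [30.0, 40.0, 52.0],
                    "score": [1.0, float("nan"), 3.0],
                    "town": ["Paris", None, "Lyon"]})
report = compare_datasets(old, new)
print(report)
```

Aligning rows by position is the simplest choice here; a real tool would likely need a key column or a row hash to cope with reordered or inserted rows.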

Possible approach: the new dataset table view lets users select rows and perform actions on the selected datasets. 'Compare' could be one such action.

joaquinvanschoren, Dec 03 '23 21:12

Thanks for opening this feature request. A related feature request would be to ask the dataset uploaders to better trace the lineage of their uploads.

For instance, uploaders could link to a public git repo containing a script that reproduces the version of the data uploaded to openml.org from the original raw data (if that data is publicly available on another website).

Similarly, when uploading a new version, it would be helpful to document the relevant changes in such a script.
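
As a rough illustration of what such a reproduction script might record, here is a small sketch. Everything in it is hypothetical: the `lineage_record` helper, the field names, and the example URLs are made up for this comment and do not correspond to any existing openml.org feature.

```python
import hashlib
import json


def lineage_record(raw_bytes: bytes, processed_bytes: bytes,
                   source_url: str, script_url: str, changes: list) -> dict:
    """Build a small provenance record for one uploaded dataset version."""
    return {
        "source_url": source_url,    # where the raw data came from
        "script_url": script_url,    # script that turns raw data into the upload
        "raw_sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "processed_sha256": hashlib.sha256(processed_bytes).hexdigest(),
        "changes": changes,          # human-readable notes for this version
    }


# Made-up example: the "processing" step just drops the last row.
raw = b"id,age\n1,30\n2,41\n3,\n"
processed = b"id,age\n1,30\n2,41\n"
record = lineage_record(
    raw, processed,
    source_url="https://example.org/raw/data.csv",          # placeholder URL
    script_url="https://example.org/repo/make_dataset.py",  # placeholder URL
    changes=["Dropped rows with a missing age."],
)
print(json.dumps(record, indent=2))
```

Storing checksums of both the raw and the processed files would let anyone rerun the script and check that they get the same bytes that were uploaded.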

ogrisel, Dec 08 '23 10:12