Dataset Comparison tool
Proposed by @ogrisel - A 'comparison' view showing how two datasets differ, for instance:
- list columns whose names differ between two versions of the same dataset, or between two datasets chosen by the user,
- list changes in data type representation for columns with the same name,
- list the per-column number of rows with changed values and show the first 5 differing row values.
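To make the three checks above concrete, here is a minimal sketch using pandas. It is only an illustration, not an existing OpenML API: `compare_datasets` is a hypothetical helper, and it assumes both datasets are already loaded as DataFrames with unique index labels so that rows can be matched by index.

```python
import pandas as pd


def compare_datasets(old: pd.DataFrame, new: pd.DataFrame, max_examples: int = 5) -> dict:
    """Summarise how two tabular datasets differ (hypothetical helper)."""
    old_cols, new_cols = set(old.columns), set(new.columns)
    shared = [c for c in old.columns if c in new_cols]

    report = {
        # Column names present in only one of the two datasets.
        "only_in_old": sorted(old_cols - new_cols),
        "only_in_new": sorted(new_cols - old_cols),
        # Shared columns whose dtype changed, as (old dtype, new dtype).
        "dtype_changes": {
            c: (str(old[c].dtype), str(new[c].dtype))
            for c in shared
            if old[c].dtype != new[c].dtype
        },
        "value_changes": {},
    }

    # Assumption: rows are matched by (unique) index label, and only
    # labels present in both datasets are compared.
    common_rows = old.index.intersection(new.index)
    for c in shared:
        if c in report["dtype_changes"]:
            # Columns with a changed dtype are already reported above;
            # this sketch does not compare their values.
            continue
        a, b = old.loc[common_rows, c], new.loc[common_rows, c]
        # Two missing values at the same position count as unchanged.
        differs = ~((a == b) | (a.isna() & b.isna()))
        if differs.any():
            changed = differs[differs].index[:max_examples]
            report["value_changes"][c] = {
                "n_changed": int(differs.sum()),
                # (row label, old value, new value) for the first few changes.
                "examples": list(zip(changed, a[changed], b[changed])),
            }
    return report
```

Matching rows by index is the simplest choice for a sketch. A real implementation would probably need a user-chosen key column, or a row-alignment step, when rows are added, removed or reordered between versions.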
Possible approach: the new dataset table view allows users to select rows and perform an action on the selected datasets. 'Compare' could be one such action.
Thanks for opening this feature request. A related feature request would be to ask the dataset uploaders to better trace the lineage of their uploads.
For instance, they could link to a public git repo with a script that reproduces the version of the data uploaded to openml.org from the original raw data (if it is publicly available on another website).
Similarly, when uploading a new version, it would be helpful to document the relevant changes in such a script.