
First principles datasets

Open gAldeia opened this issue 1 year ago • 0 comments

Data comes from two symbolic regression repos:

  • Miles Cranmer's PySR: https://github.com/MilesCranmer/PySR
  • Etienne Russeil et al.'s MvSR: https://github.com/erusseil/MvSR-analysis

Each dataset has a known first-principles equation that was derived from data. The respective papers use these datasets to show that symbolic regression can potentially recover the original equation when only observational data is available.

While some of them have just a few samples and others are synthetically generated, they are challenging for symbolic regression methods and can be used to evaluate these algorithms.

The idea of pushing them into PMLB is to help other users quickly set up experiments with the data.
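To make the intended workflow concrete, here is a minimal sketch of what such an experiment could look like. It assumes the usual PMLB layout of a gzipped, tab-separated file with the label in a `target` column. The dataset name `kepler_synthetic`, the helper functions, and the sample values are all hypothetical and made up for illustration; they are not one of the datasets proposed in this issue.

```python
import csv
import gzip
import os
import tempfile


def write_pmlb_style(path, rows, feature_names):
    """Write rows to a gzipped TSV with the label in a final `target` column."""
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(list(feature_names) + ["target"])
        writer.writerows(rows)


def read_pmlb_style(path):
    """Read a gzipped TSV with a `target` column and return (X, y) as floats."""
    with gzip.open(path, "rt", newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        features = [name for name in reader.fieldnames if name != "target"]
        X, y = [], []
        for record in reader:
            X.append([float(record[name]) for name in features])
            y.append(float(record["target"]))
    return X, y


def r2_score(y_true, y_pred):
    """Coefficient of determination, written out to avoid extra dependencies."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical, synthetic data generated from Kepler's third law in
# normalized units: T = a ** 1.5 (period in years, semi-major axis in AU).
semi_major_axes = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
rows = [[a, a ** 1.5] for a in semi_major_axes]

path = os.path.join(tempfile.mkdtemp(), "kepler_synthetic.tsv.gz")
write_pmlb_style(path, rows, ["a"])

# Score two candidate expressions, as a symbolic regression run might propose.
X, y = read_pmlb_style(path)
score_good = r2_score(y, [row[0] ** 1.5 for row in X])  # the generating law
score_bad = r2_score(y, [row[0] ** 2 for row in X])     # a wrong exponent
print(f"R2 for a**1.5: {score_good:.4f}")
print(f"R2 for a**2:   {score_bad:.4f}")
```

Once the datasets are merged, loading them should presumably reduce to a single `pmlb.fetch_data(name)` call, so the write and read helpers above would only be needed when working with local files.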

I still need to write proper metadata for them. My understanding is that opening a PR will trigger a GitHub Action that pushes some new files to my fork, which I should complete before the new datasets go to review. Please let me know if there is anything I got wrong and need to update!

gAldeia, Sep 03 '24 21:09