Collect open dataset(s) for model development and benchmarking

Open · RasmusOrsoe opened this issue 4 months ago · 1 comment

Description
Open datasets would enable benchmarking and foster reproducibility. Options include existing open-source datasets, e.g. PROMETHEUS and Kaggle datasets. One might also include datasets that are similar in form to those used in other physics tasks, such as jet tagging.

Acceptance Criteria

  • [ ] Identify candidate datasets
  • [ ] Convert dataset to supported file format(s)
  • [ ] Provide open dataset with clear documentation or reference to existing documentation

RasmusOrsoe · Sep 29 '25 08:09

NPML seems to be creating a dataset challenge in which each dataset runs for 6 months before being replaced with a new one. These datasets might be interesting for benchmarking and could also provide an opportunity to show off the capabilities of GraphNeT. https://indico.ipmu.jp/event/462/page/1500-data-challenges-and-olympics

Aske-Rosted · Sep 30 '25 07:09