graphnet icon indicating copy to clipboard operation
graphnet copied to clipboard

Should we retire training/utils.py?

Open asogaard opened this issue 2 years ago • 4 comments

The module training/utils.py contains a number of utility functions that are not written in the same object-oriented way that the rest of the repo is, the functionality of which have largely been superseded by other parts of the repo, and which generally score poorly in terms of maintainability and test coverage. Specifically:

  • make_dataloader: Could be removed in favour of DataLoader{,.from_dataset_config}
  • make_train_test_dataloader: Could be removed in favour of DataLoader{,.from_dataset_config}
  • get_predictions: Could be removed in favour of Model.predict{,_as_dataframe}
  • save_results: There is no other code to directly replace this, I figure it's the one that's most worth keeping. It could be replaced with just a few lines of code that are specific to the training in question. However, I can see the argument for having a standard way of saving results. In this case, though, I would recommend we consider simplifying some of the internal logic (e.g., doing away with the separate db, archive, and tag arguments).

Is the above functionality actively being used in favour of the newer alternatives? If so, why?

asogaard avatar Mar 24 '23 12:03 asogaard

Tagging @RasmusOrsoe, @MortenHolmRep, @Peterandresen12, @Andreas-MJ, @sdebes, @MoustHolmes, and @Aske-Rosted for their perspectives.

asogaard avatar Mar 24 '23 12:03 asogaard

I use these 4 functions all the time. Would be no problem for me to throw away get_predictions. I have not transitioned to the dataloaders from configs because I have not had the chance to familiarize myself with this new functionality.

RasmusOrsoe avatar Mar 28 '23 11:03 RasmusOrsoe

Same as Rasmus, I have been using them a lot. Probably just because I have not really started using the config files.

Peterandresen12 avatar Mar 28 '23 12:03 Peterandresen12

It seems that the consensus is that these utility functions are being used frequently because folks haven't started using the functionality that is intended to supersede these. This, then, begs the questions: Should we keep these functions because they are being used, or should we exactly retire them to encourage the transition to the proposed new functionality? 😉

The underlying question is: How do we best solve the problems these utility functions addressing in the best way? That is,

  • Constructing DataLoaders
  • Constructing separate training and test DataLoaders
  • Get predictions from Models
  • Save Models and prediction results

asogaard avatar Apr 03 '23 15:04 asogaard