Should we retire training/utils.py?
The module training/utils.py contains a number of utility functions that are not written in the same object-oriented way that the rest of the repo is, the functionality of which have largely been superseded by other parts of the repo, and which generally score poorly in terms of maintainability and test coverage. Specifically:
-
make_dataloader: Could be removed in favour ofDataLoader{,.from_dataset_config} -
make_train_test_dataloader: Could be removed in favour ofDataLoader{,.from_dataset_config} -
get_predictions: Could be removed in favour ofModel.predict{,_as_dataframe} -
save_results: There is no other code to directly replace this, I figure it's the one that's most worth keeping. It could be replaced with just a few lines of code that are specific to the training in question. However, I can see the argument for having a standard way of saving results. In this case, though, I would recommend we consider simplifying some of the internal logic (e.g., doing away with the separatedb,archive, andtagarguments).
Is the above functionality actively being used in favour of the newer alternatives? If so, why?
Tagging @RasmusOrsoe, @MortenHolmRep, @Peterandresen12, @Andreas-MJ, @sdebes, @MoustHolmes, and @Aske-Rosted for their perspectives.
I use these 4 functions all the time. Would be no problem for me to throw away get_predictions. I have not transitioned to the dataloaders from configs because I have not had the chance to familiarize myself with this new functionality.
Same as Rasmus, I have been using them a lot. Probably just because I have not really started using the config files.
It seems that the consensus is that these utility functions are being used frequently because folks haven't started using the functionality that is intended to supersede these. This, then, begs the questions: Should we keep these functions because they are being used, or should we exactly retire them to encourage the transition to the proposed new functionality? 😉
The underlying question is: How do we best solve the problems these utility functions addressing in the best way? That is,
- Constructing
DataLoaders - Constructing separate training and test
DataLoaders - Get predictions from
Models - Save
Models and prediction results