mljar-supervised Preprocessing Feature Improvements

MLJAR is a great framework, and I'd like to suggest making its preprocessing steps customizable.

Currently, preprocessing is encapsulated inside the ModelFramework class, so users cannot modify it. I'd like to add dimensionality reduction, oversampling, and undersampling to the preprocessing steps. My suggestion is to expose the preprocessing class publicly and pass an instance of it to the ModelFramework. The preprocessing class would then act as a "manager" for all preprocessing steps, and developers could add custom or additional steps to the pipeline. It would also be great if the parameters of those steps (e.g., the number of components for the dimensionality reduction method) could be included in the hyperparameter optimization.
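
To make the idea concrete, here is a minimal sketch of what such a "manager" might look like. Every name in it (`PreprocessingManager`, `RandomUnderSampler`, `TopKFeatures`, `search_space`) is hypothetical and not part of mljar-supervised, and the variance-based column filter is only a simple stand-in for a real dimensionality reduction method such as PCA.

```python
import random
from collections import Counter


class RandomUnderSampler:
    """Hypothetical custom step: drops rows until all classes are equally frequent."""

    def __init__(self, seed=0):
        self.seed = seed

    def fit_resample(self, X, y):
        rng = random.Random(self.seed)
        target = min(Counter(y).values())  # size of the rarest class
        keep = []
        for label in sorted(set(y)):
            idx = [i for i, v in enumerate(y) if v == label]
            keep.extend(rng.sample(idx, target))
        keep.sort()  # preserve the original row order
        return [X[i] for i in keep], [y[i] for i in keep]


class TopKFeatures:
    """Hypothetical stand-in for dimensionality reduction: keeps the k highest-variance columns."""

    # Candidate values an optimizer could try (assumed for illustration).
    search_space = {"k": [1, 2]}

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y=None):
        n = len(X)
        variances = []
        for col in zip(*X):
            mean = sum(col) / n
            variances.append(sum((v - mean) ** 2 for v in col) / n)
        order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
        self.columns_ = sorted(order[: self.k])
        return self

    def transform(self, X):
        return [[row[j] for j in self.columns_] for row in X]


class PreprocessingManager:
    """Hypothetical manager that owns an ordered list of user-supplied steps."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def add_step(self, step):
        self.steps.append(step)
        return self

    def fit_transform(self, X, y):
        for step in self.steps:
            if hasattr(step, "fit_resample"):
                # Resamplers change the number of rows, so they apply to training data only.
                X, y = step.fit_resample(X, y)
            else:
                X = step.fit(X, y).transform(X)
        return X, y

    def transform(self, X):
        # At prediction time, resampling steps are skipped.
        for step in self.steps:
            if not hasattr(step, "fit_resample"):
                X = step.transform(X)
        return X

    def search_space(self):
        # Collect tunable parameters as {(step name, parameter name): candidate values}.
        space = {}
        for step in self.steps:
            for name, values in getattr(step, "search_space", {}).items():
                space[(type(step).__name__, name)] = values
        return space


# Toy data: four rows of class 0, two rows of class 1, three features.
X = [[1.0, 10.0, 0.0], [2.0, 20.0, 0.0], [3.0, 30.0, 0.1],
     [4.0, 40.0, 0.0], [5.0, 50.0, 0.0], [6.0, 60.0, 0.1]]
y = [0, 0, 0, 0, 1, 1]

manager = PreprocessingManager().add_step(RandomUnderSampler(seed=0)).add_step(TopKFeatures(k=2))
X_train, y_train = manager.fit_transform(X, y)
X_new = manager.transform([[7.0, 70.0, 0.2]])
```

In this sketch the framework would only need to call `fit_transform` during training and `transform` during prediction, while `search_space()` gives the optimizer a place to discover tunable preprocessing parameters next to the model's own. How this would connect to the ModelFramework is left open here.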

If a solution for this already exists, I'd be glad to hear about it.

May 07 '25 20:05 FHellmann

Hi @FHellmann

Thanks for the suggestion! Exposing the preprocessing pipeline for custom steps (PCA, SMOTE, undersampling) and tuning their hyperparameters alongside the model would be powerful.

Right now our team is small and we can’t refactor this immediately, but we’d love to collaborate. If you’re up for sketching an API or sharing a prototype PR, we can review and plan it into our roadmap. Thanks for helping improve MLJAR! ❤ 🚀

May 08 '25 07:05 pplonski

Alright, let me see what I can do to share a prototype PR with you.

May 08 '25 10:05 FHellmann