Category: B2; Team name: Loris; Dataset: Fit–Predict Wrapper
Checklist
- [x] My pull request has a clear and explanatory title.
- [x] My pull request passes the Linting test.
- [x] I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
- [x] My PR follows PEP8 guidelines. (refer to comment below)
- [x] My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
- [ ] I linked to issues and PRs that are relevant to this PR.
Description
This pull request introduces compatibility with fit–predict models, following the widely used scikit-learn API. The goal of this contribution is to make TopoBench more flexible and accessible, so that researchers can benchmark not only GNNs and higher-order models but also classical machine-learning models that follow the fit() / predict() paradigm.
Motivation
This extension is motivated by several research and usability needs:
• Faster evaluation of positional and structural embeddings
Many experiments on graph and topological datasets focus on assessing the utility of positional or structural embeddings.
Fit–predict models (e.g., logistic regression, random forests, SVMs, XGBoost-style models) provide a fast and reliable way to evaluate whether embeddings contain meaningful task-relevant information, without the computational overhead of training a full GNN.
• Alignment with recent benchmark-evaluation research
Recent studies have questioned the real effectiveness of graph benchmarks and emphasized the need to evaluate:
- how much relevant information lies in node features vs. graph structure (e.g., the RINGS framework),
- whether GNNs offer genuine improvements beyond simpler models.
Integrating fit–predict methods into TopoBench directly supports these research directions by providing baseline models that can:
- isolate the contribution of embeddings,
- quantify how “challenging” or “informative” a dataset truly is,
- test hypotheses about when graph structure matters.
• Better comparison with non-graph models
Many papers compare GNN-based methods with classical machine-learning models.
Native support for fit–predict models allows TopoBench users to run fair and standardized comparisons within a unified framework.
What This PR Includes
- A clean and minimal wrapper/interface for any estimator supporting fit() and predict() (and optionally predict_proba()).
- Integration with existing experiment pipelines so fit–predict models can be used seamlessly alongside GNNs and higher-order networks.
- Unit tests verifying that fit–predict models behave consistently within the TopoBench evaluation loop.
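To illustrate the idea behind the wrapper, here is a minimal sketch of how an adapter around a fit() / predict() estimator could look. It is not the code added by this PR: `FitPredictWrapper`, `FitPredictEstimator`, `MajorityClassifier`, and the `accuracy()` helper are hypothetical names chosen for this example, and the toy classifier only stands in for a real scikit-learn estimator so the snippet runs with the standard library alone.

```python
from collections import Counter
from typing import Any, Optional, Protocol, Sequence


class FitPredictEstimator(Protocol):
    """Anything exposing scikit-learn style fit() and predict() (assumed interface)."""

    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> Any: ...

    def predict(self, X: Sequence[Any]) -> Sequence[Any]: ...


class FitPredictWrapper:
    """Hypothetical adapter; illustrates the pattern, not the PR's actual class."""

    def __init__(self, estimator: FitPredictEstimator) -> None:
        self.estimator = estimator
        self.is_fitted = False

    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> "FitPredictWrapper":
        self.estimator.fit(X, y)
        self.is_fitted = True
        return self

    def predict(self, X: Sequence[Any]) -> list:
        if not self.is_fitted:
            raise RuntimeError("call fit() before predict()")
        return list(self.estimator.predict(X))

    def predict_proba(self, X: Sequence[Any]) -> Optional[list]:
        # predict_proba() is optional, so look it up at call time
        # and return None when the estimator does not provide it.
        proba = getattr(self.estimator, "predict_proba", None)
        if proba is None:
            return None
        return [list(row) for row in proba(X)]

    def accuracy(self, X: Sequence[Any], y: Sequence[Any]) -> float:
        # Fraction of predictions that match the reference labels.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


class MajorityClassifier:
    """Toy stand-in estimator: always predicts the most frequent training label."""

    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> "MajorityClassifier":
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Any]) -> list:
        return [self.label_ for _ in X]


# Rows play the role of precomputed node or graph embeddings.
X_train = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8]]
y_train = [0, 0, 1]
X_test = [[0.5, 0.5], [0.2, 0.2]]
y_test = [0, 1]

model = FitPredictWrapper(MajorityClassifier()).fit(X_train, y_train)
predictions = model.predict(X_test)
test_accuracy = model.accuracy(X_test, y_test)
probabilities = model.predict_proba(X_test)  # None: the toy model has no predict_proba()
```

Because the adapter relies only on duck typing, a scikit-learn estimator such as `LogisticRegression` could be passed in place of `MajorityClassifier` without changing the wrapper, which is the property that makes it straightforward to plug classical baselines into an evaluation loop.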