TopoBench Category: B1; Team name: DLLB; Dataset: FakeDataset

Checklist

[x] My pull request has a clear and explanatory title.
[x] My pull request passes the Linting test.
[x] I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
[x] My PR follows PEP8 guidelines. (refer to comment below)
[x] My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
[x] I linked to issues and PRs that are relevant to this PR.

Description

This PR implements an on-disk data loading pipeline for inductive datasets, along with memory profiling utilities.

Key Features

On-Disk Dataset Support:
Implemented an on-disk version of PyG’s FakeDataset to enable realistic testing without holding the full dataset in memory.
On-Disk Preprocessor:
Added a preprocessor built on top of PyG’s OnDiskDataset, which applies transformations one graph at a time and saves the processed outputs.
This ensures the entire dataset is never fully loaded into memory.
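For readers unfamiliar with the pattern, the one-graph-at-a-time idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR and not PyG's OnDiskDataset API: the names `preprocess_to_disk` and `load_graph`, the SQLite table layout, and the use of plain dicts as graphs are all hypothetical.

```python
import pickle
import sqlite3
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for a graph object (e.g. a PyG `Data` instance).
Graph = Dict[str, object]


def preprocess_to_disk(
    graphs: Iterable[Graph],
    transform: Callable[[Graph], Graph],
    db_path: str,
) -> int:
    """Transform graphs one at a time and write each result to disk."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS graphs "
            "(idx INTEGER PRIMARY KEY, payload BLOB)"
        )
        count = 0
        for idx, graph in enumerate(graphs):
            # Only the current graph is held as a Python object here.
            processed = transform(graph)
            conn.execute(
                "INSERT OR REPLACE INTO graphs (idx, payload) VALUES (?, ?)",
                (idx, pickle.dumps(processed)),
            )
            count += 1
        conn.commit()
        return count
    finally:
        conn.close()


def load_graph(db_path: str, idx: int) -> Graph:
    """Read a single processed graph back from disk."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT payload FROM graphs WHERE idx = ?", (idx,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise IndexError(f"no graph stored at index {idx}")
    return pickle.loads(row[0])
```

Passing a generator as `graphs` keeps the raw side lazy as well, so neither the raw nor the processed dataset has to exist in memory as a whole.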
Transform Categorisation:
Introduced a two-tier transform strategy:
- Heavy transforms: topology and feature liftings, executed during the on-disk preprocessing phase.
- Easy transforms: data manipulation and intrinsic dataset transforms, applied on the fly at load time.
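The split between the two tiers can be illustrated with a small stand-alone sketch. It assumes graphs are plain dicts and uses an in-memory list where the real pipeline would use the on-disk store; `compose` and `TwoTierDataset` are hypothetical names, not classes from this PR.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-ins for graph objects and transforms.
Graph = Dict[str, object]
Transform = Callable[[Graph], Graph]


def compose(transforms: Iterable[Transform]) -> Transform:
    """Chain transforms so they run in the given order."""
    steps = list(transforms)

    def apply(graph: Graph) -> Graph:
        for step in steps:
            graph = step(graph)
        return graph

    return apply


class TwoTierDataset:
    """Heavy transforms run once up front; easy transforms run on every access."""

    def __init__(
        self,
        raw_graphs: Iterable[Graph],
        heavy: List[Transform],
        easy: List[Transform],
    ) -> None:
        heavy_fn = compose(heavy)
        # Assumption: this list stands in for the on-disk store of
        # preprocessed graphs; a real pipeline would write to disk instead.
        self._processed = [heavy_fn(g) for g in raw_graphs]
        self._easy_fn = compose(easy)

    def __len__(self) -> int:
        return len(self._processed)

    def __getitem__(self, idx: int) -> Graph:
        # Shallow-copy so an easy transform cannot overwrite stored keys.
        return self._easy_fn(dict(self._processed[idx]))
```

The design choice this reflects: expensive work such as liftings is paid once per graph, while cheap transforms stay configurable at load time without re-running preprocessing.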
Data Splitting Enhancements:
Updated load_inductive_splits and assign_train_val_test_mask_to_graphs to support lazy lists, minimizing memory use by avoiding in-memory storage of dataset splits.
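As a rough illustration of what a lazy list means here, the sketch below keeps only integer indices per split and loads a graph when it is accessed. `LazySplit` and `make_lazy_splits` are hypothetical names for this example and do not reproduce the signatures of load_inductive_splits or assign_train_val_test_mask_to_graphs.

```python
from collections.abc import Sequence
from typing import Callable, Dict, List


class LazySplit(Sequence):
    """Index-backed view of a split; graphs are loaded only when accessed."""

    def __init__(self, load: Callable[[int], object], indices: List[int]) -> None:
        self._load = load  # hypothetical loader, e.g. a read from disk
        self._indices = list(indices)

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slicing returns another lazy view and loads nothing.
            return LazySplit(self._load, self._indices[i])
        return self._load(self._indices[i])


def make_lazy_splits(
    load: Callable[[int], object],
    split_idx: Dict[str, List[int]],
) -> Dict[str, LazySplit]:
    """Build one lazy view per split from a mapping of split name to indices."""
    return {name: LazySplit(load, idx) for name, idx in split_idx.items()}
```

Because `LazySplit` implements the `Sequence` protocol, code that only iterates over or indexes into a split can use it in place of a list of graphs.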

Testing & Validation

The new on-disk pipeline passes the existing pipeline test suite.
Added a new memory usage test comparing:
- Our new on-disk FakeDataset
- PyG’s original in-memory FakeDataset
The memory usage test ran successfully with the following models:
- graph/gcn
- cell/topotune
- simplicial/topotune
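For context, a peak-memory comparison of this kind can be sketched with Python's built-in tracemalloc module. This is not the test added in this PR: it does not use FakeDataset or any model, and `make_graph`, `peak_bytes`, `consume_eager`, and `consume_lazy` are hypothetical helpers that only contrast holding all graphs at once with streaming them one at a time.

```python
import tracemalloc
from typing import Callable, Dict


def make_graph(i: int, size: int) -> Dict[str, object]:
    """Hypothetical synthetic graph: an id plus a feature list of `size` floats."""
    return {"id": i, "x": [0.0] * size}


def peak_bytes(fn: Callable[[], object]) -> int:
    """Peak traced memory, relative to the starting level, while running `fn`."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak - baseline


def consume_eager(n: int, size: int) -> int:
    """Materialise every graph first, as an in-memory dataset would."""
    graphs = [make_graph(i, size) for i in range(n)]
    return sum(len(g["x"]) for g in graphs)


def consume_lazy(n: int, size: int) -> int:
    """Create and discard graphs one at a time, as a streaming loader would."""
    return sum(len(make_graph(i, size)["x"]) for i in range(n))
```

Calling `peak_bytes(lambda: consume_eager(100, 10_000))` and the same for `consume_lazy` gives two numbers to compare. Note that tracemalloc only tracks allocations made through Python's allocators, so tensor memory allocated by native libraries would need a different profiler.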

Details are available in the tutorial_on_disk_inductive_pipeline.ipynb notebook.

Nov 03 '25 22:11 dleko11

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

Nov 03 '25 22:11 review-notebook-app[bot]

Added unit tests and corrected the docstring format.

Nov 19 '25 20:11 dleko11

Hi @dleko11 and @luka-benic, I noticed that the current description does not reference the notebook (.ipynb file) containing the usage examples. Could you please update the description to include this reference?

Nov 28 '25 10:11 levtelyatnikov

Hi @levtelyatnikov, thanks for letting us know. As requested, I updated the comment to include the reference to the notebook. Best, David

Nov 28 '25 12:11 dleko11