Category: B2; Team name: DLLB; Dataset: PPI
Co-authored-by: luka-benic [email protected]
Co-authored-by: dleko11 [email protected]
Checklist
- [x] My pull request has a clear and explanatory title.
- [x] My pull request passes the Linting test.
- [x] I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
- [x] My PR follows PEP8 guidelines. (refer to comment below)
- [x] My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
- [x] I linked to issues and PRs that are relevant to this PR.
Description
This PR extends TopoBench to support edge-level link prediction on both transductive and inductive graph datasets, and adds a tutorial notebook that illustrates how to use the new functionality.
Concretely, the PR introduces:
- Edge-level split utilities for link prediction (transductive and inductive).
- A dynamic negative sampling transform integrated into the dataloading pipeline.
- A dedicated edge-level readout (`LinkPredictionReadOut`) for link prediction on top of existing GNN backbones (GCN, GAT).
- Example dataset configurations for Cora (transductive), MUTAG (inductive), and PPI (inductive, predefined splits).
- A tutorial notebook showing the full workflow end-to-end.
Key Changes (Code)
- Edge-level splitting
  - `load_edge_transductive_splits` for single-graph / transductive datasets (e.g. Cora).
  - `load_edge_inductive_splits` for multi-graph / inductive datasets (e.g. MUTAG, PPI).
  - Both return `DataloadDataset` objects with:
    - `edge_label_index`, `edge_label` (positive and negative candidate edges),
    - consistent handling of val/test negatives vs train-time negatives.
- Dynamic negative sampling
  - `NegativeSamplingTransform` in `topobench.transforms.data_manipulations`:
    - takes positive edges from `edge_label_index`,
    - samples fresh negatives via `torch_geometric.utils.negative_sampling`,
    - rebuilds `edge_label_index` / `edge_label` each epoch according to `neg_pos_ratio` and `neg_sampling_method`.
- Edge-level readout
  - `LinkPredictionReadOut` in `topobench.nn.readouts`:
    - consumes node embeddings `x_0` from the backbone,
    - scores candidate edges via dot products,
    - outputs 2-class logits (no-edge, edge) and attaches labels for the loss/evaluator.
- PPI dataset support
  - New loader (based on `torch_geometric.datasets.PPI`) that:
    - loads the predefined train/val/test splits from PyG,
    - combines them into a single dataset with a `split_idx` mapping,
    - is compatible with the inductive edge-level splitting utilities.
- Configuration-level support
  - `task_level: edge` and `num_classes: 2` for link prediction.
  - Extended `split_params` for link prediction:
    - `learning_setting` (transductive / inductive),
    - `val_prop`, `test_prop`, `train_prop`,
    - `is_undirected`,
    - `neg_pos_ratio` (dynamic train negatives),
    - `neg_sampling_ratio` (static val/test negatives),
    - `neg_sampling_method`.
These changes plug into the existing TopoBench training pipeline without altering the high-level interface (Hydra configs + run.yaml).
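To make the data-side behaviour concrete, here is a minimal pure-Python sketch of the idea behind the split utilities and the negative sampling transform. It is not the TopoBench implementation: the helper names (`split_edges`, `sample_negatives`, `with_labels`, `train_candidates`), the toy graph, and all parameter values are hypothetical, and plain lists stand in for tensors. Only the `split_params` key names come from this PR. The sketch also assumes that negatives are drawn uniformly and exclude every known positive edge, which may differ from what `neg_sampling_method` selects in the library.

```python
import random

# Hypothetical values; only the key names are taken from this PR.
split_params = {
    "val_prop": 0.1,
    "test_prop": 0.2,
    "neg_pos_ratio": 1.0,       # train negatives, resampled every epoch
    "neg_sampling_ratio": 1.0,  # val/test negatives, sampled once
}


def sample_negatives(num_nodes, known_edges, num_samples, rng):
    """Uniformly sample distinct undirected pairs (u < v) that are not known edges."""
    max_pairs = num_nodes * (num_nodes - 1) // 2
    if num_samples > max_pairs - len(known_edges):
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < num_samples:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        if u != v and pair not in known_edges:
            negatives.add(pair)
    return sorted(negatives)


def split_edges(edges, params, rng):
    """Shuffle the positive edges and cut them into val/test/train lists."""
    edges = list(edges)
    rng.shuffle(edges)
    n_val = int(len(edges) * params["val_prop"])
    n_test = int(len(edges) * params["test_prop"])
    return {
        "val": edges[:n_val],
        "test": edges[n_val:n_val + n_test],
        "train": edges[n_val + n_test:],
    }


def with_labels(positives, negatives):
    """Build (edge_label_index, edge_label): positives first, then negatives."""
    edge_label_index = list(positives) + list(negatives)
    edge_label = [1] * len(positives) + [0] * len(negatives)
    return edge_label_index, edge_label


rng = random.Random(0)
num_nodes = 10
# Toy undirected graph: a 10-node ring plus chords, stored as (u, v) with u < v.
all_edges = set()
for i in range(num_nodes):
    for step in (1, 3):
        j = (i + step) % num_nodes
        all_edges.add((min(i, j), max(i, j)))

splits = split_edges(sorted(all_edges), split_params, rng)

# Static negatives: sampled once and reused for every evaluation.
static = {}
for name in ("val", "test"):
    n_neg = int(len(splits[name]) * split_params["neg_sampling_ratio"])
    negatives = sample_negatives(num_nodes, all_edges, n_neg, rng)
    static[name] = with_labels(splits[name], negatives)


def train_candidates(rng):
    """Rebuild the training candidates with fresh negatives (call once per epoch)."""
    n_neg = int(len(splits["train"]) * split_params["neg_pos_ratio"])
    negatives = sample_negatives(num_nodes, all_edges, n_neg, rng)
    return with_labels(splits["train"], negatives)


epoch_0 = train_candidates(rng)
epoch_1 = train_candidates(rng)
```

In this sketch the positive half of the training candidates is identical across epochs while the negative half is redrawn on each call, whereas the val/test candidates stay fixed so that metrics remain comparable between evaluations.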
Tutorial (Usage Example)
To illustrate the new link prediction support, this PR also adds:
- `tutorials/tutorial_link_prediction.ipynb`
The notebook demonstrates:
- Transductive and inductive link prediction setups on the Cora and MUTAG datasets.
- How the split utilities, negative sampling transform, and `LinkPredictionReadOut` interact in practice.
- Running short GCN/GAT experiments and inspecting basic metrics and visualizations of positive/negative edges in the splits.
The tutorial is an example user guide for the new functionality; all core logic lives in the library code.
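As a rough illustration of how the readout consumes the candidate edges, the following pure-Python sketch scores each pair in `edge_label_index` by the dot product of its endpoint embeddings and turns the score into two logits. This is not the `LinkPredictionReadOut` code: the function name, the toy embeddings, and in particular the `[-score, score]` mapping from a single score to (no-edge, edge) logits are assumptions made for illustration, and nested lists stand in for the `x_0` tensor.

```python
def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def link_prediction_readout(x_0, edge_label_index):
    """Return one [no-edge, edge] logit pair per candidate edge (u, v)."""
    logits = []
    for u, v in edge_label_index:
        score = dot(x_0[u], x_0[v])
        # Assumption: symmetric logits, so softmax(logits)[1] == sigmoid(2 * score).
        logits.append([-score, score])
    return logits


# Toy node embeddings, as a backbone such as GCN or GAT might produce them.
x_0 = [
    [1.0, 0.0],   # node 0
    [0.9, 0.1],   # node 1, similar to node 0
    [-1.0, 0.2],  # node 2, dissimilar to node 0
]
edge_label_index = [(0, 1), (0, 2)]  # one positive and one negative candidate
edge_label = [1, 0]

logits = link_prediction_readout(x_0, edge_label_index)
predictions = [row.index(max(row)) for row in logits]  # argmax over the 2 classes
```

Emitting two logits instead of a single score lets link prediction reuse a standard multi-class loss and evaluator with `num_classes: 2`, which is consistent with the configuration described above.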
Issue
Additional context
We fixed a compatibility issue: the PPI dataset class from torch_geometric version 2.8.0 was not compatible with networkx version 2.8.8.