TopoBench Category: B2; Team name: DLLB; Dataset: PPI

Co-authored-by: luka-benic [email protected]
Co-authored-by: dleko11 [email protected]

Checklist

[x] My pull request has a clear and explanatory title.
[x] My pull request passes the Linting test.
[x] I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
[x] My PR follows PEP8 guidelines. (refer to comment below)
[x] My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
[x] I linked to issues and PRs that are relevant to this PR.

Description

This PR extends TopoBench to support edge-level link prediction on both transductive and inductive graph datasets, and adds a tutorial notebook that illustrates how to use the new functionality.

Concretely, the PR introduces:

- Edge-level split utilities for link prediction (transductive and inductive).
- A dynamic negative sampling transform integrated into the dataloading pipeline.
- A dedicated edge-level readout (LinkPredictionReadOut) for link prediction on top of existing GNN backbones (GCN, GAT).
- Example dataset configurations for Cora (transductive), MUTAG (inductive), and PPI (inductive, predefined splits).
- A tutorial notebook showing the full workflow end-to-end.

Key Changes (Code)

Edge-level splitting
- load_edge_transductive_splits for single-graph / transductive datasets (e.g. Cora).
- load_edge_inductive_splits for multi-graph / inductive datasets (e.g. MUTAG, PPI).
- Both return DataloadDataset objects with:
  - edge_label_index, edge_label (positive and negative candidate edges),
  - consistent handling of val/test negatives vs train-time negatives.
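
As a rough illustration of the transductive case, the sketch below partitions the edges of one undirected graph into train/val/test positives. It is plain Python to stay dependency-free. The helper split_edges, its arguments, and the rounding rule are assumptions made for this example; they are not the signature or behavior of load_edge_transductive_splits.

```python
import random


def split_edges(edges, val_prop=0.1, test_prop=0.2, seed=0):
    """Partition undirected edges into disjoint train/val/test positive sets.

    Hypothetical helper for illustration only.
    """
    # Store each undirected edge once as (min, max) so that (u, v) and
    # (v, u) cannot end up in different splits.
    unique = sorted({(min(u, v), max(u, v)) for u, v in edges})
    rng = random.Random(seed)
    rng.shuffle(unique)

    n_val = int(len(unique) * val_prop)
    n_test = int(len(unique) * test_prop)
    return {
        "val": unique[:n_val],
        "test": unique[n_val:n_val + n_test],
        "train": unique[n_val + n_test:],
    }


# Toy graph: 11 directed entries, 10 distinct undirected edges.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 4), (4, 0),
         (0, 2), (1, 3), (2, 4), (3, 0), (1, 4)]
splits = split_edges(edges, val_prop=0.1, test_prop=0.2)
```

In the inductive setting the same idea would presumably be applied per graph (or to whole graphs), which this sketch does not cover.
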
Dynamic negative sampling
- NegativeSamplingTransform in topobench.transforms.data_manipulations:
  - takes positive edges from edge_label_index,
  - samples fresh negatives via torch_geometric.utils.negative_sampling,
  - rebuilds edge_label_index / edge_label each epoch according to neg_pos_ratio and neg_sampling_method.
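
The per-epoch resampling idea can be sketched without any dependencies, as below. This is a simplified stand-in using rejection sampling, not the transform itself: the PR delegates sampling to torch_geometric.utils.negative_sampling, and the helper names sample_negatives and build_edge_labels are hypothetical. Edges are kept as a list of (u, v) pairs here rather than a [2, E] tensor.

```python
import random


def sample_negatives(num_nodes, pos_edges, neg_pos_ratio=1.0, rng=None):
    """Draw node pairs that are not positive edges (hypothetical helper)."""
    rng = rng or random.Random()
    existing = {(min(u, v), max(u, v)) for u, v in pos_edges}
    # Cap the request by the number of non-edges so the loop terminates.
    max_pairs = num_nodes * (num_nodes - 1) // 2
    target = min(int(len(pos_edges) * neg_pos_ratio),
                 max_pairs - len(existing))
    negatives = set()
    while len(negatives) < target:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct nodes
        pair = (min(u, v), max(u, v))
        if pair not in existing:
            negatives.add(pair)
    return sorted(negatives)


def build_edge_labels(pos_edges, neg_edges):
    """Concatenate positives and negatives with 1/0 labels."""
    edge_label_index = list(pos_edges) + list(neg_edges)
    edge_label = [1] * len(pos_edges) + [0] * len(neg_edges)
    return edge_label_index, edge_label


pos = [(0, 1), (1, 2), (2, 3)]
rng = random.Random(0)
for epoch in range(2):
    # Fresh negatives every epoch; the positives stay fixed.
    neg = sample_negatives(5, pos, neg_pos_ratio=2.0, rng=rng)
    edge_label_index, edge_label = build_edge_labels(pos, neg)
```
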
Edge-level readout
- LinkPredictionReadOut in topobench.nn.readouts:
  - consumes node embeddings x_0 from the backbone,
  - scores candidate edges via dot products,
  - outputs 2-class logits (no-edge, edge) and attaches labels for the loss/evaluator.
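
A minimal sketch of dot-product edge scoring follows, again in plain Python. How LinkPredictionReadOut turns a scalar score into two logits is not stated above, so the mapping used here (a fixed 0 for no-edge, the raw score for edge) is an assumption, as are the function name and the output keys.

```python
def link_prediction_readout(x_0, edge_label_index, edge_label):
    """Score candidate edges by the dot product of their endpoint embeddings.

    x_0 is a list of node embedding vectors; edge_label_index is a list of
    (u, v) pairs. Hypothetical helper for illustration only.
    """
    logits = []
    for u, v in edge_label_index:
        score = sum(a * b for a, b in zip(x_0[u], x_0[v]))
        # Assumed mapping: [no-edge logit, edge logit]. With a fixed 0 for
        # no-edge, the softmax probability of "edge" equals sigmoid(score).
        logits.append([0.0, score])
    return {"logits": logits, "labels": list(edge_label)}


x_0 = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
out = link_prediction_readout(x_0, [(0, 1), (0, 2)], [1, 0])
logits = out["logits"]
# Predicted class per candidate edge: index of the larger logit.
preds = [max(range(2), key=row.__getitem__) for row in logits]
```
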
PPI dataset support
- New loader (based on torch_geometric.datasets.PPI) that:
  - loads the predefined train/val/test splits from PyG,
  - combines them into a single dataset with a split_idx mapping,
  - is compatible with the inductive edge-level splitting utilities.
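
Merging predefined splits into one dataset with an index mapping can be sketched as follows. The graphs are stand-in strings, and the key names train/valid/test and the helper combine_splits are assumptions for this example. The split sizes match the PyG PPI dataset (20 training, 2 validation, and 2 test graphs).

```python
def combine_splits(train_graphs, val_graphs, test_graphs):
    """Concatenate predefined splits and record which indices belong where.

    Hypothetical helper for illustration only.
    """
    data_list = list(train_graphs) + list(val_graphs) + list(test_graphs)
    n_train, n_val = len(train_graphs), len(val_graphs)
    split_idx = {
        "train": list(range(n_train)),
        "valid": list(range(n_train, n_train + n_val)),
        "test": list(range(n_train + n_val, len(data_list))),
    }
    return data_list, split_idx


# Placeholder objects standing in for the PPI graphs.
train = [f"train_graph_{i}" for i in range(20)]
val = [f"val_graph_{i}" for i in range(2)]
test = [f"test_graph_{i}" for i in range(2)]
data_list, split_idx = combine_splits(train, val, test)
```
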
Configuration-level support
- task_level: edge and num_classes: 2 for link prediction.
- Extended split_params for link prediction:
  - learning_setting (transductive / inductive),
  - val_prop, test_prop, train_prop,
  - is_undirected,
  - neg_pos_ratio (dynamic train negatives),
  - neg_sampling_ratio (static val/test negatives),
  - neg_sampling_method.
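
For orientation, the split_params keys listed above could look like this when written out as a Python dict. All values are illustrative assumptions rather than defaults from the PR, and the sanity checks in check_split_params are hypothetical. The value "sparse" is one of the methods accepted by torch_geometric.utils.negative_sampling; whether the PR passes it through unchanged is not confirmed here.

```python
# Illustrative values only; the real configs live in the Hydra YAML files.
split_params = {
    "learning_setting": "transductive",  # or "inductive"
    "val_prop": 0.1,
    "test_prop": 0.2,
    "train_prop": 0.7,
    "is_undirected": True,
    "neg_pos_ratio": 1.0,        # dynamic train negatives per positive
    "neg_sampling_ratio": 1.0,   # static val/test negatives per positive
    "neg_sampling_method": "sparse",
}


def check_split_params(params):
    """Basic sanity checks (hypothetical, not part of TopoBench)."""
    if params["learning_setting"] not in {"transductive", "inductive"}:
        raise ValueError("learning_setting must be transductive or inductive")
    if not 0 < params["val_prop"] + params["test_prop"] < 1:
        raise ValueError("val_prop + test_prop must leave training edges")
    for key in ("neg_pos_ratio", "neg_sampling_ratio"):
        if params[key] <= 0:
            raise ValueError(f"{key} must be positive")
    return True


ok = check_split_params(split_params)
```
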

These changes plug into the existing TopoBench training pipeline without altering the high-level interface (Hydra configs + run.yaml).

Tutorial (Usage Example)

To illustrate the new link prediction support, this PR also adds:

tutorials/tutorial_link_prediction.ipynb

The notebook demonstrates:

- Transductive and inductive link prediction setups on the Cora and MUTAG datasets.
- How the split utilities, the negative sampling transform, and LinkPredictionReadOut interact in practice.
- How to run short GCN/GAT experiments and inspect basic metrics and visualizations of positive/negative edges in the splits.

The tutorial is an example user guide for the new functionality; all core logic lives in the library code.

Issue

Additional context

Nov 22 '25 20:11 dleko11

We fixed a compatibility issue: the PPI dataset class from torch_geometric version 2.8.0 was not compatible with networkx version 2.8.8.

Nov 23 '25 10:11 luka-benic