TopoBench

Added batching in transductive setting

Open Coerulatus opened this issue 1 year ago • 2 comments

Hello everyone,

I have added support for batching the data in the transductive setting. When working with large graphs, selecting a subset of the graph that leaves the model's output unchanged for the desired nodes can drastically reduce the memory requirements during training and inference. In torch_geometric, the NeighborLoader performs neighbor sampling to achieve this. This works because, in the standard message-passing framework, information propagates only as far as the number of message-passing steps performed.
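The receptive-field argument can be checked on a toy example. The sketch below uses plain Python rather than torch_geometric; the helper names (`k_hop_nodes`, `induced_subgraph`, `message_passing`) and the toy layer are assumptions made for illustration, not the library's API. It takes the full k-hop neighborhood instead of a random sample, which is the case where the outputs at the seed nodes should match exactly.

```python
from collections import deque


def k_hop_nodes(adj, seeds, k):
    """Return every node within k hops of the seed nodes (breadth-first search)."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand past the k-th hop
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def induced_subgraph(adj, nodes):
    """Keep only the selected nodes and the edges between them."""
    return {u: [v for v in adj[u] if v in nodes] for u in nodes}


def message_passing(adj, x, num_layers):
    """Toy layer (hypothetical): new state = own state + mean of neighbor states."""
    for _ in range(num_layers):
        new_x = {}
        for u, nbrs in adj.items():
            agg = sum(x[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
            new_x[u] = x[u] + agg
        x = new_x
    return x


# Path graph 0-1-2-3-4-5 with arbitrary scalar features.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
x = {i: float(i * i) for i in range(6)}
seeds, num_layers = [0], 2

# Reference: run the toy model on the full graph.
full = message_passing(adj, x, num_layers)

# Batched: run it on the 2-hop neighborhood of the seed only.
nodes = k_hop_nodes(adj, seeds, num_layers)
batched = message_passing(
    induced_subgraph(adj, nodes), {u: x[u] for u in nodes}, num_layers
)

# Too few hops: the 1-hop neighborhood is not enough for 2 layers.
small = k_hop_nodes(adj, seeds, 1)
too_small = message_passing(
    induced_subgraph(adj, small), {u: x[u] for u in small}, num_layers
)

print(full[0], batched[0], too_small[0])
```

Only the values at the seed nodes are expected to agree; nodes on the boundary of the subgraph lose neighbors and their outputs generally differ from the full-graph ones.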
The newly added NeighborCellsLoader works similarly, but it also selects the relevant higher-order cells by sequentially reducing all the incidences. The loader also lets you specify the rank to consider, so you can batch over the nodes, the edges, or any higher-order cell.
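One way to read "sequentially reducing all the incidences" is sketched below. This is an assumption about the idea, not the code in `neighbor_cells_loader.py`: starting from a set of kept nodes, a cell of rank r is kept only if all of its rank r-1 boundary cells were kept, and each incidence matrix is then restricted to the surviving rows and columns. The function name `reduce_incidences` and the dense list-of-lists representation are hypothetical.

```python
def reduce_incidences(incidences, kept_nodes):
    """Restrict a chain of incidence matrices to the cells supported on kept_nodes.

    incidences[r] has one row per rank-r cell and one column per rank-(r+1) cell.
    Returns the reduced matrices and the kept cell indices for every rank.
    """
    kept = [sorted(kept_nodes)]
    reduced = []
    for B in incidences:
        rows = kept[-1]
        row_set = set(rows)
        n_cols = len(B[0]) if B else 0
        # Keep a higher-rank cell only if every cell on its boundary was kept.
        cols = [
            j
            for j in range(n_cols)
            if all(i in row_set for i in range(len(B)) if B[i][j] != 0)
        ]
        reduced.append([[B[i][j] for j in cols] for i in rows])
        kept.append(cols)
    return reduced, kept


# Nodes 0..3; edges e0=(0,1), e1=(1,2), e2=(0,2), e3=(2,3); triangle t0=(e0,e1,e2).
B1 = [
    [1, 0, 1, 0],  # node 0
    [1, 1, 0, 0],  # node 1
    [0, 1, 1, 0],  # node 2
    [0, 0, 0, 1],  # node 3
]
B2 = [[1], [1], [1], [0]]  # rows: edges e0..e3, column: triangle t0

# Keeping the triangle's nodes keeps its three edges and the triangle itself.
_, kept_a = reduce_incidences([B1, B2], {0, 1, 2})

# Dropping node 0 removes e0 and e2, so the triangle is dropped as well.
reduced_b, kept_b = reduce_incidences([B1, B2], {1, 2, 3})

print(kept_a)
print(kept_b)
```

The sketch always starts from nodes. Batching over edges or higher-order cells, as the loader allows, would need the seed cells to be mapped to a node set first, which is not shown here.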

I have also added a tutorial that shows the basic functionality of NeighborCellsLoader. It also checks that the approach works as expected by comparing the model's outputs on the full graph with its outputs on the batched one. Interestingly, the number of hops needed is not necessarily equal to the number of layers in higher-order networks: in these models, information can in general travel further than the 1-hop neighborhood at each layer.
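A toy example of why one layer can reach beyond the 1-hop neighborhood: if a layer sends messages from nodes up to a 2-cell and back down, then two nodes on opposite sides of that cell exchange information in a single layer even though they are several graph hops apart. The layer below (`cell_layer`) is a hypothetical illustration, not one of the models in the repository.

```python
from collections import deque


def graph_distance(adj, src, dst):
    """Shortest-path distance between two nodes (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # dst is not reachable from src


def cell_layer(x, cells):
    """Toy higher-order layer: nodes -> cell (sum), then cell -> its nodes (add)."""
    cell_feat = [sum(x[v] for v in cell) for cell in cells]
    return {
        u: x[u] + sum(cell_feat[c] for c, cell in enumerate(cells) if u in cell)
        for u in x
    }


# A hexagon: six nodes on a cycle, with a single 2-cell filling the ring.
n = 6
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
cells = [tuple(range(n))]

# Change only the feature of node 3 and observe the output at node 0.
x = {i: 0.0 for i in range(n)}
x_perturbed = dict(x)
x_perturbed[3] = 1.0

hops = graph_distance(adj, 0, 3)
out = cell_layer(x, cells)[0]
out_p = cell_layer(x_perturbed, cells)[0]

print(hops, out, out_p)
```

With this layer, a single application already makes node 0 depend on node 3, so a 1-hop neighborhood would not reproduce the full-graph output at node 0.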

Coerulatus, Dec 18 '24 16:12


Codecov Report

Attention: Patch coverage is 93.18182% with 15 lines in your changes missing coverage. Please review.

Project coverage is 91.55%. Comparing base (99de7cb) to head (9131d58).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| topobenchmark/data/batching/cell_loader.py | 85.18% | 8 Missing :warning: |
| topobenchmark/data/batching/utils.py | 94.95% | 6 Missing :warning: |
| ...pobenchmark/data/batching/neighbor_cells_loader.py | 96.15% | 1 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #128      +/-   ##
==========================================
+ Coverage   91.00%   91.55%   +0.54%     
==========================================
  Files         129      133       +4     
  Lines        3670     3884     +214     
==========================================
+ Hits         3340     3556     +216     
+ Misses        330      328       -2     


codecov[bot], Dec 18 '24 17:12