Category: A1; Team name: DLLB; Dataset: WebKB
Co-authored-by: luka-benic [email protected]
Co-authored-by: dleko11 [email protected]
Checklist
- [x] My pull request has a clear and explanatory title.
- [x] My pull request passes the Linting test.
- [x] I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
- [x] My PR follows PEP8 guidelines. (refer to comment below)
- [x] My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
- [x] I linked to issues and PRs that are relevant to this PR.
Description
This pull request adds support for the WebKB dataset collection from PyTorch Geometric [1]. This contribution is part of the TAG-DS Topological Deep Learning Challenge 2025, under Mission A, Category A.1: Broadening Benchmarks with Graphs & Point Clouds [2].
The WebKB dataset contains web pages from the computer science departments of several universities, originally collected by Carnegie Mellon University. The PyTorch Geometric implementation includes three subsets of the original dataset (Cornell, Texas, and Wisconsin), as used in the Geom-GCN paper [3]. In these graphs, nodes represent individual web pages and edges represent hyperlinks between them. The task is node classification: each page is assigned to one of five categories (student, project, course, staff, or faculty) [3].
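For reviewers who want to inspect the raw graphs, the following is a minimal sketch of loading one subset directly through PyTorch Geometric's `WebKB` class. The helper name `load_webkb`, the constant `WEBKB_SUBSETS`, and the default `root` path are hypothetical and used only for illustration; they are not part of this PR's code.

```python
# Illustrative sketch only: load_webkb, WEBKB_SUBSETS and the default root
# path are assumptions made for this example, not names used in the PR.
WEBKB_SUBSETS = ("Cornell", "Texas", "Wisconsin")


def load_webkb(name, root="data/WebKB"):
    """Return the single graph of one WebKB subset as a PyG Data object.

    The dataset files are downloaded into `root` on first use.
    """
    if name not in WEBKB_SUBSETS:
        raise ValueError(
            f"Unknown WebKB subset {name!r}; expected one of {WEBKB_SUBSETS}"
        )
    # Imported lazily so this module can be imported without torch_geometric.
    from torch_geometric.datasets import WebKB

    dataset = WebKB(root=root, name=name)
    # Each WebKB subset consists of exactly one graph.
    return dataset[0]
```

Calling `load_webkb("Cornell")` should return a `Data` object whose `x`, `edge_index`, and `y` attributes hold the page features, the hyperlink edges, and the five-class labels described above.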
References:
[1] PyTorch Geometric WebKB Dataset Documentation
[2] TAG-DS Topological Deep Learning Challenge 2025
[3] Pei, H., Wei, B., Chang, K. C.-C., Lei, Y., & Yang, B. (2020). Geom-GCN: Geometric Graph Convolutional Networks. ICLR 2020.
Issue
Additional context
This PR adds .yaml configuration files for two additional WebKB subsets, Wisconsin and Cornell (Texas was already included).