Category: A1; Team name: LangDiff; Dataset: Twitch
Checklist
- [x] My pull request has a clear and explanatory title.
- [x] My pull request passes the Linting test.
- [x] I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
- [x] My PR follows PEP8 guidelines. (refer to comment below)
- [x] My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
- [x] I linked to issues and PRs that are relevant to this PR.
Description
Pull request for Twitch Dataset [1] implementation.
The Twitch dataset consists of multiple social network graphs of streamers on the streaming platform Twitch, grouped by the language they speak. Each node is a streamer, and edges correspond to followership between streamers. Node feature embeddings represent the games played. The task is binary node classification: predict whether a streamer streams mature content, based on the games played.
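To make the layout above concrete, here is a minimal toy sketch of one language graph in plain Python. It is not the loader added in this PR: the class name `ToyLanguageGraph`, the field names, the tiny hand-written values, and the choice to treat edges as undirected are all assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyLanguageGraph:
    """Hypothetical container for one language-specific graph (illustration only)."""

    features: list[list[int]]     # one row per streamer, one column per game (1 = played)
    edges: list[tuple[int, int]]  # pairs (u, v) of streamer indices; assumed undirected here
    labels: list[int]             # 1 = streams mature content, 0 = does not

    @property
    def num_nodes(self) -> int:
        # Every streamer has exactly one feature row.
        return len(self.features)

    def neighbors(self, node: int) -> list[int]:
        # Collect the other endpoint of every edge that touches `node`.
        found = []
        for u, v in self.edges:
            if u == node:
                found.append(v)
            elif v == node:
                found.append(u)
        return sorted(found)


# Made-up example: 4 streamers, 3 games, 3 edges.
toy = ToyLanguageGraph(
    features=[[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]],
    edges=[(0, 1), (0, 2), (2, 3)],
    labels=[1, 0, 0, 1],
)
```

A node classifier would take the row `toy.features[i]` together with the rows of `toy.neighbors(i)` as input and predict `toy.labels[i]`.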
[1] Benedek Rozemberczki, Carl Allen, & Rik Sarkar. (2021). Multi-scale Attributed Node Embedding.
Relevant PRs from PyTorch Geometric
The dataset is available in PyTorch Geometric, but it is currently broken (pyg-team/pytorch_geometric#10510), so it is implemented in full here.
There is also a related PR, pyg-team/pytorch_geometric#10415, which I think does not fully fix the issue.
Additional context
Submission by Jonas Müller of Team LangDiff