
PyG Remote Backend Based on GraphScope

Open • LiSu opened this issue 1 year ago • 1 comment

GraphScope leverages the distributed GNN training framework, graphlearn-for-pytorch (GLTorch), to facilitate large-scale distributed GNN training. GLTorch is model-layer compatible with PyG and enables the extension of PyG-based GNN training to large distributed graphs.

To support training GNNs on graphs that exceed the available memory of a single machine, PyG has introduced a pluggable Remote Backend mechanism. Through abstractions such as FeatureStore and GraphStore, this mechanism supports integration with third-party graph storage engines. The FeatureStore gives access to node/edge features stored remotely, while the GraphStore gives access to graph structure information held externally. This project aims to implement a PyG Remote Backend based on GraphScope, so that PyG users have a straightforward way to run distributed GNN training with GraphScope.
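The split between the two abstractions can be illustrated with a dependency-free sketch. Everything below (`FakeRemoteStorage`, `TensorKey`, `SketchFeatureStore`, `SketchGraphStore`, `fetch_neighbor_features`) is a hypothetical name chosen for illustration, not GraphScope or PyG code. A real backend would instead subclass `torch_geometric.data.FeatureStore` and `torch_geometric.data.GraphStore`, implement hooks such as `_get_tensor` and `_get_edge_index`, and forward each call to the storage engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# (source node type, relation, destination node type)
EdgeType = Tuple[str, str, str]


class FakeRemoteStorage:
    """Stand-in for a remote graph engine; a real backend would issue RPCs."""

    def __init__(self) -> None:
        self.features: Dict[Tuple[str, str], List[List[float]]] = {}
        self.edges: Dict[EdgeType, Tuple[List[int], List[int]]] = {}


@dataclass(frozen=True)
class TensorKey:
    """Hypothetical, simplified analogue of PyG's TensorAttr."""
    group_name: str  # node or edge type, e.g. "paper"
    attr_name: str   # feature name, e.g. "x"


class SketchFeatureStore:
    """Serves feature rows addressed by (group, attribute, index)."""

    def __init__(self, remote: FakeRemoteStorage) -> None:
        self._remote = remote

    def put_tensor(self, key: TensorKey, rows: Sequence[Sequence[float]]) -> None:
        self._remote.features[(key.group_name, key.attr_name)] = [list(r) for r in rows]

    def get_tensor(self, key: TensorKey,
                   index: Optional[Sequence[int]] = None) -> List[List[float]]:
        table = self._remote.features[(key.group_name, key.attr_name)]
        if index is None:
            return [list(r) for r in table]
        # Only the requested rows are returned to the trainer.
        return [list(table[i]) for i in index]


class SketchGraphStore:
    """Serves graph structure as COO edge lists addressed by edge type."""

    def __init__(self, remote: FakeRemoteStorage) -> None:
        self._remote = remote

    def put_edge_index(self, edge_type: EdgeType,
                       row: Sequence[int], col: Sequence[int]) -> None:
        if len(row) != len(col):
            raise ValueError("row and col must have the same length")
        self._remote.edges[edge_type] = (list(row), list(col))

    def get_edge_index(self, edge_type: EdgeType) -> Tuple[List[int], List[int]]:
        row, col = self._remote.edges[edge_type]
        return list(row), list(col)

    def out_neighbors(self, edge_type: EdgeType, node: int) -> List[int]:
        row, col = self._remote.edges[edge_type]
        return [dst for src, dst in zip(row, col) if src == node]


def fetch_neighbor_features(graph_store: SketchGraphStore,
                            feature_store: SketchFeatureStore,
                            edge_type: EdgeType, key: TensorKey,
                            seed: int) -> Tuple[List[int], List[List[float]]]:
    """One-hop lookup: structure from the graph store, rows from the feature store."""
    nbrs = graph_store.out_neighbors(edge_type, seed)
    return nbrs, feature_store.get_tensor(key, nbrs)
```

The point of the design is that the training loop only ever asks for the edges and feature rows it needs for the current mini-batch, so the full graph never has to fit in the trainer's memory.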

Deliverables:

  • Implement the PyG FeatureStore and GraphStore abstractions within GraphScope
  • Complete the end-to-end integration of GraphScope and PyG via the Remote Backend

LiSu, Apr 23 '24 02:04

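GraphScope supports large-scale distributed GNN training through the distributed GNN training framework graphlearn-for-pytorch (GLTorch). GLTorch is compatible with PyG at the model layer and supports extending PyG GNN training to large distributed graphs. To support training GNNs on graphs larger than a machine's available memory, PyG introduced a pluggable Remote Backend mechanism: abstractions such as FeatureStore and GraphStore let third-party graph storage engines connect to PyG. FeatureStore lets users work with node/edge features stored remotely, and GraphStore lets users work with graph structure information stored remotely; together they allow GNN training to scale on top of remote storage. By implementing a PyG Remote Backend based on GraphScope, this project aims to further simplify the integration between GraphScope and PyG and to give PyG users a friendly experience for distributed GNN training on GraphScope.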

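Deliverables:

  • Based on the current GLTorch architecture, design an implementation of FeatureStore and GraphStore on GraphScope
  • Complete the overall Remote Backend implementation and provide a distributed training example on GraphScope based on the PyG Remote Backend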

Difficulty: Beginner. Technical requirements: proficiency in Python, familiarity with C++

LiSu, Apr 23 '24 02:04