Graphbolt: Enable separate stream for CUDA memory copy and computation

Open peizhou001 opened this issue 2 years ago • 2 comments

🔨Work Item

IMPORTANT:

This template is only for the dev team to track project progress. For feature requests or bug reports, please use the corresponding issue templates.
DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.

Project tracker: https://github.com/orgs/dmlc/projects/2

Nov 06 '23 02:11 peizhou001

We will probably handle this automatically when we finalize the design of the pipelining and executor logic for the sampling stage.

Jan 07 '24 06:01 mfbalin

This is already implemented in the dataloader with the overlap_feature_fetch switch.

Feb 01 '24 18:02 mfbalin

@frozenbugs is there more to be done for this issue? We already support feature copy overlap when features are pinned.

Apr 06 '24 04:04 mfbalin
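
For readers unfamiliar with the technique discussed in this thread, here is a minimal sketch of overlapping host-to-device copies with computation by using a separate CUDA stream and pinned host memory. It uses plain PyTorch only and is not GraphBolt's implementation of the overlap_feature_fetch switch. The function name overlapped_sums and the toy "compute" step are hypothetical and chosen purely for illustration. When CUDA is unavailable, it falls back to ordinary synchronous copies.

```python
import torch


def overlapped_sums(batches, device):
    """Toy pipeline: copy batch i+1 on a side stream while computing on batch i.

    Hypothetical helper for illustration only; not part of DGL or GraphBolt.
    """
    batches = list(batches)
    if not batches:
        return []

    use_cuda = device.type == "cuda"
    # A dedicated stream for copies, so they do not queue behind compute kernels.
    copy_stream = torch.cuda.Stream(device=device) if use_cuda else None

    def start_copy(cpu_batch):
        if not use_cuda:
            # CPU fallback: nothing to overlap, just move the tensor.
            return cpu_batch.to(device), None
        # Asynchronous copies need pinned (page-locked) host memory.
        pinned = cpu_batch.pin_memory()
        with torch.cuda.stream(copy_stream):
            gpu_batch = pinned.to(device, non_blocking=True)
        done = torch.cuda.Event()
        done.record(copy_stream)  # marks the point where the copy finishes
        return gpu_batch, done

    results = []
    current = start_copy(batches[0])
    for i in range(len(batches)):
        gpu_batch, done = current
        # Launch the next copy before computing on the current batch.
        nxt = start_copy(batches[i + 1]) if i + 1 < len(batches) else None
        if done is not None:
            compute_stream = torch.cuda.current_stream(device)
            # Compute must not start until this batch's copy has completed.
            compute_stream.wait_event(done)
            # Tell the caching allocator the tensor is also used on this stream.
            gpu_batch.record_stream(compute_stream)
        results.append((gpu_batch * 2.0).sum())  # stand-in for real computation
        current = nxt

    # .item() synchronizes with the device and returns Python floats.
    return [r.item() for r in results]


if __name__ == "__main__":
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(overlapped_sums([torch.ones(4), torch.full((4,), 2.0)], dev))
```

The event plus wait_event pair is what keeps the two streams correct: the copy stream runs ahead, and the compute stream blocks only on the specific copy it depends on. Without pinned memory, the non_blocking copy would generally behave synchronously with respect to the host, which matches the remark above that overlap is supported when features are pinned.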