Chang Liu

Results: 32 comments by Chang Liu

Add-ons:
- [ ] Some P0 cases multi-gpu refactor: https://github.com/dmlc/dgl/issues/4409
- [ ] Some P0 cases link prediction refactor: https://github.com/dmlc/dgl/issues/4410
- [ ] Refactor RGAT example: https://github.com/dmlc/dgl/issues/4411

> Therefore, I think putting the example in either examples/pytorch/gin or examples/pytorch/ogb will make it hard for users to find it. I think we should probably create a folder examples/pytorch/multigpu/ and...

Given the small size of this problem (GINDataset/IMDBBINARY) and the very small number of epochs (5) in use, you wouldn't be able to observe the benefits of multi-processing and multi-GPU runs. The...

> @BarclayII @chang-l Do we have examples showing perf benefit of using multi-gpus?

I believe the current multi-GPU examples, such as graphsage and rgcn, can show a decent speed-up by using more GPUs....

@yaox12 Thank you for sharing your code. I tested it on my side (A5000, CUDA 11.7), but I observed different results:

| feat dim | fp32 (ms) | fp16 (ms) |
...
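(The benchmark script shared by @yaox12 is not reproduced in this thread. As a hedged illustration only, the sketch below shows how such fp32-vs-fp16 GPU timings are commonly collected with CUDA events; the matmul workload, tensor shapes, and `feat_dim` value are assumptions, not the actual benchmark.)

```python
import torch

def time_op(fn, warmup=10, iters=100):
    # Warm up to exclude one-time kernel selection / allocator effects.
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    return start.elapsed_time(end) / iters  # average milliseconds per call

feat_dim = 256  # hypothetical feature dimension
x32 = torch.randn(10000, feat_dim, device="cuda")
w32 = torch.randn(feat_dim, feat_dim, device="cuda")
x16, w16 = x32.half(), w32.half()

print("fp32:", time_op(lambda: x32 @ w32), "ms")
print("fp16:", time_op(lambda: x16 @ w16), "ms")
```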

> Is "examples/pytorch/rgcn/entity_sample_multi_gpu.py" ready for review? No, not yet. Thank you for your review. I will address tomorrow.

`entity_sample_multi_gpu.py` is ready for review, with changes similar to `entity_sample.py`. Note that I replaced the host-side collective communication module (implemented with `mp.queue.put()/get()` + `gc.collect()`) with device-side collective communication using NCCL (`dist.reduce`). Updated PR description...
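(The actual change lives in the PR. Purely as a rough sketch of the pattern described above, i.e. aggregating a per-rank tensor on-device with `dist.reduce` instead of shuttling it through `mp.Queue` on the host, assuming a process group has already been initialized with the `nccl` backend; the `reduce_loss` helper name is hypothetical.)

```python
import torch
import torch.distributed as dist

def reduce_loss(loss: torch.Tensor, dst: int = 0) -> torch.Tensor:
    """Sum a per-rank scalar across GPUs via NCCL and average on `dst`.

    Replaces the host-side pattern of pushing tensors through
    mp.Queue.put()/get() (plus gc.collect()) with a single on-device
    collective, avoiding device-to-host copies entirely.
    """
    buf = loss.detach().clone()       # don't mutate the autograd graph
    dist.reduce(buf, dst=dst, op=dist.ReduceOp.SUM)
    if dist.get_rank() == dst:
        buf /= dist.get_world_size()  # only dst holds the valid result
    return buf
```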

> I calculated mean and std based on your tables. You can put them in the PR description instead. Why did you not include the results for AM with entity_sample.py and...

@mufeili I synced up to master. I think it is ready to merge once CI passes.

@jermainewang In fact, this issue has not been resolved because of outdated DGL dataloader APIs. It seems `train_sampling.py` relies on at least two features that have since been deprecated: (1) Set...
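(The list of deprecated features is truncated above, so they are not restated here. For context, the sketch below shows the newer sampling interface that recent DGL releases moved to, `dgl.dataloading.DataLoader` with `NeighborSampler`; the random graph, train-node IDs, fanouts, and batch size are placeholders, not values from `train_sampling.py`.)

```python
import dgl
import torch

# Placeholder graph and training-node IDs for illustration only.
g = dgl.rand_graph(1000, 5000)
train_nids = torch.arange(100)

# Sample 10 neighbors for the first layer and 25 for the second.
sampler = dgl.dataloading.NeighborSampler([10, 25])
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    batch_size=32, shuffle=True, drop_last=False, num_workers=0,
)

for input_nodes, output_nodes, blocks in dataloader:
    pass  # run the model forward/backward on the sampled blocks
```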