DGL Multi-GPU Example CUDA Runtime Error
🐛 Bug
Run examples/multigpu/node_classification_sage.py --mode benchmark --gpu=0,1,2,3.
The error I got:
Training in benchmark mode using 4 GPU(s)
Loading data
Training...
Epoch 00000 | Loss 2.2777 | Accuracy 0.7878 | Time 8.7002
../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
... (the same assertion is repeated for threads [2,0,0] through [31,0,0])
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fa9d9f9e4d7 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fa9d9f6836b in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fa9da03ab58 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1c36b (0x7fa9da00b36b in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x2b930 (0x7fa9da01a930 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #5: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x2b (0x7fa99c9a9419 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/tensoradapter/pytorch/libtensoradapter_pytorch_2.0.0.so)
frame #6: CUDARawDelete + 0x1c (0x7fa99c9a84a6 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/tensoradapter/pytorch/libtensoradapter_pytorch_2.0.0.so)
frame #7: dgl::runtime::NDArray::Internal::DefaultDeleter(dgl::runtime::NDArray::Container*) + 0x25c (0x7fa9aea72afc in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/libdgl.so)
frame #8: dgl::UnitGraph::COO::~COO() + 0xff (0x7fa9aebdb4ef in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/libdgl.so)
frame #9: dgl::UnitGraph::~UnitGraph() + 0x130 (0x7fa9aebdafc0 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/libdgl.so)
frame #10: dgl::HeteroGraph::~HeteroGraph() + 0xb5 (0x7fa9aea90fa5 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/libdgl.so)
frame #11: DGLObjectFree + 0xc5 (0x7fa9aea50ff5 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/libdgl.so)
frame #12: <unknown function> + 0x1171e (0x7fa99c98a71e in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so)
frame #13: <unknown function> + 0x13b421 (0x560a4c17d421 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #14: <unknown function> + 0x126ccb (0x560a4c168ccb in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #15: <unknown function> + 0x114b96 (0x560a4c156b96 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #16: <unknown function> + 0x13b1cc (0x560a4c17d1cc in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #17: <unknown function> + 0x745ea5 (0x7faa40c2bea5 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #18: torch::autograd::deleteNode(torch::autograd::Node*) + 0x54 (0x7faa2ba53d64 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #19: std::_Sp_counted_deleter<torch::autograd::PyNode*, void (*)(torch::autograd::Node*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0xe (0x7faa40c26a0e in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #20: torch::autograd::deleteNode(torch::autograd::Node*) + 0xa9 (0x7faa2ba53db9 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #21: std::_Sp_counted_deleter<torch::autograd::generated::AddBackward0*, void (*)(torch::autograd::Node*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0xe (0x7faa2b1a9d7e in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #22: <unknown function> + 0x4ac2df0 (0x7faa2ba34df0 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #23: c10::TensorImpl::~TensorImpl() + 0x1b5 (0x7fa9d9f7c695 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #24: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fa9d9f7c7b9 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #25: <unknown function> + 0x75acd8 (0x7faa40c40cd8 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #26: THPVariable_subclass_dealloc(_object*) + 0x325 (0x7faa40c41085 in /opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #27: <unknown function> + 0x121dc8 (0x560a4c163dc8 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #28: <unknown function> + 0x133068 (0x560a4c175068 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #29: <unknown function> + 0x133051 (0x560a4c175051 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #30: <unknown function> + 0x133051 (0x560a4c175051 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #31: <unknown function> + 0x132cc3 (0x560a4c174cc3 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #32: <unknown function> + 0x10f928 (0x560a4c151928 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #33: <unknown function> + 0x148e23 (0x560a4c18ae23 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x2584 (0x560a4c15d874 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #35: _PyEval_EvalCodeWithName + 0x2f1 (0x560a4c15a261 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #36: _PyFunction_Vectorcall + 0x19c (0x560a4c16b89c in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #37: _PyEval_EvalFrameDefault + 0x6d5 (0x560a4c15b9c5 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #38: _PyFunction_Vectorcall + 0x106 (0x560a4c16b806 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x3aa (0x560a4c15b69a in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #40: _PyEval_EvalCodeWithName + 0x2f1 (0x560a4c15a261 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #41: _PyFunction_Vectorcall + 0x19c (0x560a4c16b89c in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x11bb (0x560a4c15c4ab in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #43: _PyEval_EvalCodeWithName + 0x2f1 (0x560a4c15a261 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #44: PyEval_EvalCodeEx + 0x39 (0x560a4c20cfd9 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #45: PyEval_EvalCode + 0x1b (0x560a4c20cf9b in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #46: <unknown function> + 0x1eb929 (0x560a4c22d929 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #47: <unknown function> + 0x1ea923 (0x560a4c22c923 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #48: PyRun_StringFlags + 0x7d (0x560a4c22a30d in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #49: PyRun_SimpleStringFlags + 0x3d (0x560a4c0cff56 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #50: Py_RunMain + 0x27e (0x560a4c2297fe in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #51: Py_BytesMain + 0x39 (0x560a4c2007b9 in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
frame #52: __libc_start_main + 0xf3 (0x7faa71c50083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #53: <unknown function> + 0x1be6bd (0x560a4c2006bd in /opt/conda/envs/dgl-dev-gpu-118/bin/python)
LIBXSMM_VERSION: main-1.17-3659 (25693771)
LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz]
Registry and code: 13 MB
Command: /opt/conda/envs/dgl-dev-gpu-118/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=37, pipe_handle=67) --multiprocessing-fork
Uptime: 16.470712 s
Traceback (most recent call last):
File "error.py", line 380, in <module>
mp.spawn(
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/ubuntu/Work/benchmark/error.py", line 288, in run
train(
File "/home/ubuntu/Work/benchmark/error.py", line 214, in train
y_hat = model(blocks, x)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/Work/benchmark/error.py", line 76, in forward
h = layer(block, h)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/nn/pytorch/conv/sageconv.py", line 237, in forward
graph.update_all(msg_fn, fn.mean("m", "neigh"))
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/heterograph.py", line 5110, in update_all
ndata = core.message_passing(
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/core.py", line 398, in message_passing
ndata = invoke_gspmm(g, mfunc, rfunc)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/core.py", line 368, in invoke_gspmm
z = op(graph, x)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/ops/spmm.py", line 215, in func
return gspmm(g, "copy_lhs", reduce_op, x, None)
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/ops/spmm.py", line 112, in gspmm
deg = F.astype(F.clamp(deg, 1, max(g.num_edges(), 1)), F.dtype(ret))
File "/opt/conda/envs/dgl-dev-gpu-118/lib/python3.8/site-packages/dgl-1.2-py3.8-linux-x86_64.egg/dgl/backend/pytorch/tensor.py", line 126, in astype
return input.type(ty)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
- The traceback of the Python part is not accurate because CUDA kernel errors may be reported asynchronously at some other API call, but the error appears to occur when computing the loss inside cross_entropy().
- I tried to check y and y_hat to see whether they are valid, but as soon as I print them, assert on them, or otherwise inspect them, the error disappears.
- If I set CUDA_LAUNCH_BLOCKING=1, there is no error and the entire example works fine.
- If I comment out lines 166-167 (prefetch_node_feats and prefetch_labels in NeighborSampler), there is no error and the entire example works fine.
- Upgrading the torch version doesn't help.
- The same problem occurs with the master branch as of September 2023.
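For context, the device-side assertion above (`t >= 0 && t < n_classes`) fires when a label handed to the loss lies outside [0, n_classes). A minimal CPU sketch (the tensors below are made up, not taken from the example) shows the same bounds check surfacing as a catchable Python error, which can help confirm whether out-of-range labels are the culprit:

```python
import torch
import torch.nn.functional as F

n_classes = 3
logits = torch.randn(4, n_classes)

# Valid labels: every entry is in [0, n_classes).
good = torch.tensor([0, 1, 2, 1])
loss = F.cross_entropy(logits, good)  # computes normally

# An out-of-range label trips the same bounds check the CUDA kernel
# asserts on (t >= 0 && t < n_classes); on CPU it surfaces as a
# regular IndexError instead of a device-side assert.
bad = torch.tensor([0, 1, n_classes, 1])
try:
    F.cross_entropy(logits, bad)
except IndexError as e:
    print("caught:", e)
```

Running the GPU path with `CUDA_LAUNCH_BLOCKING=1` (or moving the offending tensors to CPU) is the usual way to turn the opaque device-side assert into an error like this one.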
To Reproduce
Run examples/multigpu/node_classification_sage.py --mode benchmark --gpu=0,1,2,3.
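To localize the failing kernel, the example can be re-run with synchronous error reporting (a command sketch, assuming it is run from the DGL repository root; note the reporter observed that the error disappears under this flag):

```shell
# Force synchronous kernel launches so the Python traceback points at
# the op that actually failed, rather than at a later API call.
CUDA_LAUNCH_BLOCKING=1 python examples/multigpu/node_classification_sage.py \
    --mode benchmark --gpu=0,1,2,3
```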
Expected behavior
The example should run without any error.
Environment
- DGL Version (e.g., 1.0): current master branch (nightly build)
- Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 2.0.0+cu118
- OS (e.g., Linux): Linux
- How you installed DGL (conda, pip, source): conda
- Build command you used (if compiling from source): bash ./script/build_dgl.sh -g -e '-DBUILD_GRAPHBOLT=ON'
- Python version: 3.8.17
- CUDA/cuDNN version (if applicable): 11.8
- GPU models and configuration (e.g. V100): EC2 g4dn.metal
- Any other relevant information: torch version 2.0.0+cu118
Additional context
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you
It runs OK in the DGL NGC container 24.01. Will try to reproduce without the container.
root@ecfeb6f27748:/opt/dgl/dgl-source/examples/multigpu# python node_classification_sage.py --gpu 0,1,2,3
Training in mixed mode using 4 GPU(s)
Loading data
This will download 1.38GB. Will you proceed? (y/N)
y
Downloading http://snap.stanford.edu/ogb/data/nodeproppred/products.zip
Downloaded 1.38 GB: 100%|| 1414/1414 [00:20<00:00, 67.44it/s]
Extracting dataset/products.zip
Loading necessary files...
This might take a while.
Processing graphs...
100%|1/1 [00:01<00:00, 1.66s/it]
Converting graphs into DGL objects...
100%|1/1 [00:00<00:00, 4.29it/s]
Saving...
Training...
Epoch 00000 | Loss 2.4287 | Accuracy 0.7714 | Time 5.3175
Epoch 00001 | Loss 0.9772 | Accuracy 0.8369 | Time 4.5307
Epoch 00002 | Loss 0.7459 | Accuracy 0.8538 | Time 4.5606
Epoch 00003 | Loss 0.6612 | Accuracy 0.8641 | Time 4.5573
Epoch 00004 | Loss 0.5897 | Accuracy 0.8713 | Time 4.5261
Epoch 00005 | Loss 0.5837 | Accuracy 0.8748 | Time 4.5167
Epoch 00006 | Loss 0.5278 | Accuracy 0.8786 | Time 4.5288
Epoch 00007 | Loss 0.4970 | Accuracy 0.8826 | Time 4.5237
Epoch 00008 | Loss 0.5000 | Accuracy 0.8828 | Time 4.5237
Epoch 00009 | Loss 0.4745 | Accuracy 0.8863 | Time 4.5328
Testing...
Test accuracy 0.7296