
"You can't change --num_classes on the small and medium datasets."

Open yichuan-w opened this issue 2 years ago • 0 comments

I think this might be a bug. Could someone take a look when convenient? When I modify the `--num_classes` argument at https://github.com/IllinoisGraphBenchmark/IGB-Datasets/blob/main/igb/train_single_gpu.py#L150 on the small and medium datasets, training always fails with the error below. The same change works on the tiny dataset.


../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
  0%|          | 0/20 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "/home/yw8143/GNN/GNN_acceleration/example/dglexample/IGB-Datasets/igb/default.py", line 187, in <module>
    track_acc(g, args, device)
  File "/home/yw8143/GNN/GNN_acceleration/example/dglexample/IGB-Datasets/igb/default.py", line 85, in track_acc
    loss.backward()
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/backend/pytorch/sparse.py", line 211, in backward
    dX = gspmm(g_rev, "mul", "sum", dZ, Y)
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/backend/pytorch/sparse.py", line 1032, in gspmm
    return GSpMM.apply(*args)
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/backend/pytorch/sparse.py", line 165, in forward
    out, (argX, argY) = _gspmm(gidx, op, reduce_op, X, Y)
  File "/scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/_sparse_ops.py", line 239, in _gspmm
    _CAPI_DGLKernelSpMM(
  File "dgl/_ffi/_cython/./function.pxi", line 295, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 241, in dgl._ffi._cy3.core.FuncCall
dgl._ffi.base.DGLError: [17:57:42] /opt/dgl/src/array/cuda/./spmm.cuh:724: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: device-side assert triggered
Stack trace:
  [bt] (0) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/libdgl.so(+0x11ea17f) [0x7f66122f517f]
  [bt] (1) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/libdgl.so(void dgl::aten::cuda::SpMMCsr<int, float, dgl::aten::cuda::binary::Mul<float>, dgl::aten::cuda::reduce::Sum<int, float, false> >(dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray)+0x790) [0x7f661231db80]
  [bt] (2) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/libdgl.so(void dgl::aten::SpMMCsr<2, int, float>(std::string const&, std::string const&, dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >)+0x45d) [0x7f661237abcd]
  [bt] (3) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/libdgl.so(dgl::aten::SpMM(std::string const&, std::string const&, std::shared_ptr<dgl::BaseHeteroGraph>, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >)+0x1253) [0x7f66117b4f43]
  [bt] (4) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/libdgl.so(+0x6cbad9) [0x7f66117d6ad9]
  [bt] (5) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/libdgl.so(+0x6cc1a1) [0x7f66117d71a1]
  [bt] (6) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7f66118193f8]
  [bt] (7) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so(+0x15603) [0x7f6610cf7603]
  [bt] (8) /scratch/yw8143/miniconda3/envs/mariusGNNenv/lib/python3.10/site-packages/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so(+0x15c2b) [0x7f6610cf7c2b]
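
The first line of the log is the informative one: the `nll_loss` assertion `t >= 0 && t < n_classes` fires when some target label `t` falls outside the range the model's output layer covers. Because CUDA errors are reported asynchronously, the Python traceback can surface later, here inside DGL's `gspmm` during the backward pass; running with `CUDA_LAUNCH_BLOCKING=1` usually makes the trace point at the real call site. A minimal sketch of a pre-training sanity check follows. The helper name `check_label_range` and the sample values are hypothetical and not part of IGB-Datasets; with a PyTorch label tensor one would presumably pass `labels.tolist()`.

```python
def check_label_range(labels, num_classes):
    """Return the sorted, distinct labels that fall outside [0, num_classes)."""
    return sorted({int(t) for t in labels if not 0 <= int(t) < num_classes})


# Hypothetical labels, for illustration only (not taken from any IGB dataset).
labels = [0, 3, 18, 7, 18, 12]

# An empty list means every label fits the chosen number of classes.
print(check_label_range(labels, num_classes=19))

# A non-empty list names the labels that would trip the nll_loss assertion.
print(check_label_range(labels, num_classes=10))
```

If the check reports out-of-range labels on the small or medium dataset but not on the tiny one, the loaded labels and the `--num_classes` value likely disagree for those sizes.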

yichuan-w, Oct 02 '23 22:10