CUDALibrarySamples

Error on einsum_test.py

Open iyupan opened this issue 3 years ago • 2 comments

Hello, I compiled cuTENSOR with CUDA 11.3.1 without any errors. However, the tests in einsum_test.py do not pass, and I don't know how to proceed. Could you please help me?

$ python cutensor/torch/einsum_test.py
.......FF..EEEE.
======================================================================
ERROR: test_einsum_general_equivalent_results_1_test_1 (__main__.EinsumTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/xxx/download/CUDALibrarySamples/cuTENSOR/python/cutensor/torch/einsum_test.py", line 227, in test_einsum_general_equivalent_results
    cutensor_rslt.backward(torch.ones_like(cutensor_rslt))
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/einsum.py", line 70, in backward
    d_input_0 = einsum(modeC + ',' + modeB + '->' + modeA, grad_output,
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 8]
Exception raised from check_size_nonnegative at /pytorch/aten/src/ATen/Utils.h:118 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fd1c5b17d62 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fd1c5b1468b in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: at::native::empty_cuda(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x8a3 (0x7fd222aa8323 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: <unknown function> + 0x25aab3e (0x7fd1c8308b3e in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so)
frame #4: <unknown function> + 0x25aabba (0x7fd1c8308bba in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cu.so)
frame #5: <unknown function> + 0x1d1334e (0x7fd20b11934e in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::_ops::empty_memory_format::call(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x1c0 (0x7fd20ae217b0 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #7: at::empty(c10::ArrayRef<long>, c10::TensorOptions, c10::optional<c10::MemoryFormat>) + 0x134 (0x7fd11eaaa974 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/binding.cpython-39-x86_64-linux-gnu.so)
frame #8: torch::empty(c10::ArrayRef<long>, c10::TensorOptions, c10::optional<c10::MemoryFormat>) + 0xd2 (0x7fd11eaaf172 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/binding.cpython-39-x86_64-linux-gnu.so)
frame #9: einsum(std::string, at::Tensor, at::Tensor, bool, bool) + 0xe06 (0x7fd11eaa87d6 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/binding.cpython-39-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0xfc66 (0x7fd11eaadc66 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/binding.cpython-39-x86_64-linux-gnu.so)
frame #11: <unknown function> + 0x19613 (0x7fd11eab7613 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/binding.cpython-39-x86_64-linux-gnu.so)
<omitting python frames>
frame #22: torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x193 (0x7fd28cd08b53 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #23: <unknown function> + 0x3896817 (0x7fd20cc9c817 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #24: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x145b (0x7fd20cc97a7b in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x57a (0x7fd20cc987aa in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #26: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7fd20cc901c9 in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #27: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x5c (0x7fd28cd032bc in /home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #28: <unknown function> + 0xd6de4 (0x7fd28db9ede4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #29: <unknown function> + 0x8609 (0x7fd28f767609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #30: clone + 0x43 (0x7fd28f68c133 in /lib/x86_64-linux-gnu/libc.so.6)


======================================================================
ERROR: test_einsum_general_equivalent_results_2_test_2 (__main__.EinsumTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/xxx/download/CUDALibrarySamples/cuTENSOR/python/cutensor/torch/einsum_test.py", line 227, in test_einsum_general_equivalent_results
    cutensor_rslt.backward(torch.ones_like(cutensor_rslt))
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/einsum.py", line 70, in backward
    d_input_0 = einsum(modeC + ',' + modeB + '->' + modeA, grad_output,
RuntimeError: cutensor: Launch failed.

======================================================================
ERROR: test_einsum_general_equivalent_results_3_test_3 (__main__.EinsumTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/xxx/download/CUDALibrarySamples/cuTENSOR/python/cutensor/torch/einsum_test.py", line 227, in test_einsum_general_equivalent_results
    cutensor_rslt.backward(torch.ones_like(cutensor_rslt))
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/einsum.py", line 70, in backward
    d_input_0 = einsum(modeC + ',' + modeB + '->' + modeA, grad_output,
RuntimeError: cutensor: Launch failed.

======================================================================
ERROR: test_einsum_general_equivalent_results_4_test_3 (__main__.EinsumTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/xxx/download/CUDALibrarySamples/cuTENSOR/python/cutensor/torch/einsum_test.py", line 227, in test_einsum_general_equivalent_results
    cutensor_rslt.backward(torch.ones_like(cutensor_rslt))
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/autograd/function.py", line 199, in apply
    return user_fn(self, *args)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/cutensor/torch/einsum.py", line 70, in backward
    d_input_0 = einsum(modeC + ',' + modeB + '->' + modeA, grad_output,
RuntimeError: cutensor: Launch failed.

======================================================================
FAIL: test_einsum_equivalent_results_7_test_6 (__main__.EinsumTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/xxx/download/CUDALibrarySamples/cuTENSOR/python/cutensor/torch/einsum_test.py", line 162, in test_einsum_equivalent_results
    torch.testing.assert_allclose(cutensor_rslt, torch_rslt, rtol=5e-3, atol=6e-3)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/testing/_deprecated.py", line 78, in assert_allclose
    torch.testing.assert_close(
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/testing/_asserts.py", line 971, in assert_close
    raise error_meta.to_error()
AssertionError: Tensor-likes are not close!

Mismatched elements: 138 / 125000 (0.1%)
Greatest absolute difference: 0.015625 at index (19, 9, 9) (up to 0.006 allowed)
Greatest relative difference: 10.6752655538695 at index (18, 9, 34) (up to 0.005 allowed)

======================================================================
FAIL: test_einsum_equivalent_results_8_test_7 (__main__.EinsumTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/home/xxx/download/CUDALibrarySamples/cuTENSOR/python/cutensor/torch/einsum_test.py", line 162, in test_einsum_equivalent_results
    torch.testing.assert_allclose(cutensor_rslt, torch_rslt, rtol=5e-3, atol=6e-3)
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/testing/_deprecated.py", line 78, in assert_allclose
    torch.testing.assert_close(
  File "/home/xxx/miniconda3/envs/py39tf27pt110/lib/python3.9/site-packages/torch/testing/_asserts.py", line 971, in assert_close
    raise error_meta.to_error()
AssertionError: Tensor-likes are not close!

Mismatched elements: 104 / 125000 (0.1%)
Greatest absolute difference: 0.015625 at index (11, 11, 28) (up to 0.006 allowed)
Greatest relative difference: 1.9678456591639872 at index (35, 16, 24) (up to 0.005 allowed)

----------------------------------------------------------------------
Ran 16 tests in 34.551s

FAILED (failures=2, errors=4)
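
For readers following the tracebacks: all four errors are raised at the same line in einsum.py, where the gradient for the first input is computed by calling einsum again with the spec `modeC + ',' + modeB + '->' + modeA`. The sketch below illustrates that rule with a tiny pure-Python einsum. The helper `einsum2`, the dict-based tensors, and the mode strings `ik`, `kj`, `ij` are assumptions made for illustration; this is not the cuTENSOR binding, and it assumes every mode of A also appears in B or C.

```python
from itertools import product

def einsum2(spec, x, y, sizes):
    """Tiny two-operand einsum over dict-based tensors (illustration only).

    x and y map index tuples to floats; sizes maps each mode letter to its extent.
    """
    inputs, m_out = spec.split("->")
    m_x, m_y = inputs.split(",")
    modes = sorted(set(m_x + m_y))
    out = {}
    # Visit every combination of indices and accumulate into the output entry.
    for values in product(*(range(sizes[m]) for m in modes)):
        idx = dict(zip(modes, values))
        key_x = tuple(idx[m] for m in m_x)
        key_y = tuple(idx[m] for m in m_y)
        key_out = tuple(idx[m] for m in m_out)
        out[key_out] = out.get(key_out, 0.0) + x[key_x] * y[key_y]
    return out

# Hypothetical example: a plain matrix product, C[i,j] = sum_k A[i,k] * B[k,j].
mode_a, mode_b, mode_c = "ik", "kj", "ij"
sizes = {"i": 2, "k": 3, "j": 2}
a = {(i, k): 1.0 for i in range(2) for k in range(3)}
b = {(k, j): float(2 * k + j + 1) for k in range(3) for j in range(2)}

c = einsum2(mode_a + "," + mode_b + "->" + mode_c, a, b, sizes)

# Backward for the first input, using the same spec construction as the traceback.
grad_output = {key: 1.0 for key in c}
d_input_0 = einsum2(mode_c + "," + mode_b + "->" + mode_a, grad_output, b, sizes)
print(d_input_0)
```

With an all-ones `grad_output`, each `d_input_0[(i, k)]` is the sum of row `k` of `b`, which is what the chain rule gives for a matrix product.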

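
The two FAIL entries come from the tolerance check rather than from a crash. `torch.testing.assert_close` documents its elementwise criterion as `|actual - expected| <= atol + rtol * |expected|`. The sketch below applies that formula to the greatest absolute difference reported in the log (0.015625) with the test's `rtol=5e-3` and `atol=6e-3`. The element values 1.0 and 4.0 are hypothetical, since the log does not show the actual tensor entries.

```python
def is_close(actual, expected, rtol=5e-3, atol=6e-3):
    """Elementwise criterion documented for torch.testing.assert_close."""
    return abs(actual - expected) <= atol + rtol * abs(expected)

def min_expected_magnitude(diff, rtol=5e-3, atol=6e-3):
    """Smallest |expected| at which an absolute error of `diff` still passes."""
    return max(0.0, (diff - atol) / rtol)

diff = 0.015625  # greatest absolute difference reported in the log
threshold = min_expected_magnitude(diff)
print(threshold)

# Hypothetical element values on either side of the threshold.
small_passes = is_close(1.0 + diff, 1.0)
large_passes = is_close(4.0 + diff, 4.0)
print(small_passes, large_passes)
```

Under these tolerances, an error of 0.015625 is accepted only where the reference value has magnitude of roughly 1.925 or more, so the same absolute error can fail on small entries and pass on large ones.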

iyupan, Aug 18 '22 14:08

Hi @iyupan, this issue should be fixed now.

v0i0, Jul 25 '23 20:07

> Hi @iyupan, this issue should be fixed now.

Wow, thanks for your hard work! ^_^

iyupan, Jul 27 '23 03:07