
Floating point exception in mxnet.ndarray.op.SequenceReverse

Open leeyeetonn opened this issue 5 years ago • 4 comments

Description

mxnet.ndarray.op.SequenceReverse crashes with a floating point exception when the input data has a 0 in its shape. See the reproduction code below.

Error Message


Floating point exception (core dumped)

To Reproduce


import mxnet
import numpy as np
data = mxnet.nd.array(np.random.rand(0,1,1))
mxnet.ndarray.op.SequenceReverse(data)

Steps to reproduce


  1. Run the provided code in the Python interpreter or as a script.
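Because the crash kills the whole interpreter, it can help to run the reproduction in a child process and inspect the exit status. The following is a minimal sketch, assuming a POSIX system where a "Floating point exception" corresponds to the SIGFPE signal; `REPRO`, `run_isolated`, and `died_from_sigfpe` are hypothetical names introduced here, not part of MXNet.

```python
import signal
import subprocess
import sys

# The reproduction script from this issue, to be run in a child interpreter.
REPRO = """
import mxnet
import numpy as np
data = mxnet.nd.array(np.random.rand(0, 1, 1))
mxnet.ndarray.op.SequenceReverse(data)
"""


def run_isolated(code):
    """Run `code` in a separate Python process and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode


def died_from_sigfpe(returncode):
    """On POSIX, subprocess reports death by signal N as a return code of -N."""
    return returncode == -signal.SIGFPE
```

On an affected build, `died_from_sigfpe(run_isolated(REPRO))` would be expected to report the crash without taking down the calling process.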

What have you tried to solve it?

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

# paste outputs here

Got 404 when trying to get the script.

Some environment information:

  • OS: ubuntu 18.04
  • Python: 3.7.6
  • pip: 20.0.2
  • numpy: 1.18.5
  • mxnet: 1.6.0

leeyeetonn • Aug 16 '20 04:08

So here's the problem:

% DMLC_LOG_STACK_TRACE_DEPTH=150 MXNET_ENGINE_TYPE=NaiveEngine lldb python3.7 -- test_18940.py
(lldb) target create "python3.7"
Current executable set to 'python3.7' (x86_64).
(lldb) settings set -- target.run-args  "test_18940.py"
(lldb) run
Process 78879 launched: '/usr/local/bin/python3.7' (x86_64)
Process 78879 stopped
* thread #2, stop reason = exec
    frame #0: 0x0000000100006000 dyld`_dyld_start
dyld`_dyld_start:
->  0x100006000 <+0>: popq   %rdi
    0x100006001 <+1>: pushq  $0x0
    0x100006003 <+3>: movq   %rsp, %rbp
    0x100006006 <+6>: andq   $-0x10, %rsp
(lldb) cont
Process 78879 resuming
[23:52:34] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
Process 78879 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)
    frame #0: 0x0000000115861a13 libmxnet.dylib`mxnet::op::SequenceReverseOp<mshadow::cpu, float, float>::Forward(this=0x0000000123da0db0, ctx=0x00007ffeefbfc2b0, in_data=0x0000000127b7a2f8, req=0x00007ffeefbfc310, out_data=0x0000000127b7a340, aux_args=0x0000000127b7a328) at sequence_reverse-inl.h:139
   136 	    auto max_seq_len = in_data[seq_reverse::kData].size(0);
   137 	    auto n = in_data[seq_reverse::kData].size(1);
   138 	    auto total_size = in_data[seq_reverse::kData].Size();
-> 139 	    auto rest_dim = static_cast<int>(total_size / n / max_seq_len);
   140
   141 	    Shape<3> s3 = Shape3(max_seq_len, n, rest_dim);
   142 	    Tensor<xpu, 3, DType> data =

https://github.com/apache/incubator-mxnet/blob/9bdd4d6347c284770ee5bfe5ae98f1dabc283829/src/operator/sequence_reverse-inl.h#L139

The code needs to guard against a zero-size array: with a 0 in the input shape, a right operand of / on line 139 (n or max_seq_len) is zero. We should also add a smoke test to guard against this problem in the op, similar to https://github.com/apache/incubator-mxnet/pull/18972/files
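To make the failing arithmetic concrete, here is a minimal Python model of the shape computation on lines 136 to 139. It is a sketch of the idea only, not the actual C++ fix; `size`, `rest_dim_unguarded`, and `rest_dim_guarded` are hypothetical names, and returning 0 for an empty input is an assumption about what a guard might do.

```python
def size(shape):
    """Total number of elements for a shape tuple."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def rest_dim_unguarded(shape):
    """Mirrors line 139: total_size / n / max_seq_len with no zero check."""
    max_seq_len, n = shape[0], shape[1]
    total_size = size(shape)
    # Raises ZeroDivisionError here when n or max_seq_len is 0;
    # the C++ integer division traps with SIGFPE instead.
    return total_size // n // max_seq_len


def rest_dim_guarded(shape):
    """Same computation, but skips the division for zero-size inputs."""
    max_seq_len, n = shape[0], shape[1]
    total_size = size(shape)
    if total_size == 0:
        # Any 0 in the shape makes total_size 0, which covers both divisors.
        return 0
    return total_size // n // max_seq_len
```

With the shape from the report, `rest_dim_unguarded((0, 1, 1))` divides by `max_seq_len == 0`, while `rest_dim_guarded((0, 1, 1))` returns early. Checking `total_size` rather than each divisor separately keeps the guard to a single condition.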

szha • Aug 21 '20 06:08

Could I get this assigned?

pranjii • Oct 13 '21 04:10

Hello, I see a PR merged for this issue. Is this still open?

It seems like there is an arithmetic error in the code where the right operand of the division operator is zero, causing an EXC_ARITHMETIC exception. One way to guard against this error is to add a conditional statement that checks if the right operand is zero before executing the division operation.

Can I work on it if it's still open?

Pheewww • Feb 25 '23 16:02