
[BUGFIX] Fix nms kernel's out of range access issue

Open TristonC opened this issue 3 years ago • 7 comments

Description

This PR fixes an error found in an object detection use case, which failed with the following traceback:

Traceback (most recent call last):
  File "test_with_network.py", line 27, in <module>
    print(arr)
  File "/opt/mxnet/python/mxnet/gluon/block.py", line 825, in __call__
    out = self.forward(*args)
  File "/opt/mxnet/python/mxnet/gluon/block.py", line 1684, in forward
    return self._call_cached_op(x, *args)
  File "/opt/mxnet/python/mxnet/gluon/block.py", line 1233, in _call_cached_op
    out = self._cached_op(*cargs)
  File "/opt/mxnet/python/mxnet/_ctypes/ndarray.py", line 148, in __call__
    check_call(_LIB.MXInvokeCachedOpEx(
  File "/opt/mxnet/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "../include/mshadow/././././cuda/tensor_gpu-inl.cuh", line 147
Name: Check failed: err == cudaSuccess (700 vs. 0) : MapPlanKernel ErrStr:an illegal memory access was encountered
[16:13:24] ../src/resource.cc:306: Ignore CUDA Error [16:13:24] ../src/storage/././storage_manager_helpers.h:135: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: an illegal memory access was encountered

Checklist

Essentials

  • [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • [x] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage
  • [x] Code is well-documented

Changes

  • [x] Remove the element_width limitation of 20 in CalculateGreedyNMSResultsKernel

Comments

  • Credit for this fix goes to Przemyslaw Tredak

TristonC, May 06 '22 21:05
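The PR text does not include the kernel source, so what follows is only a hypothetical, host-side C++ illustration of the general bug class behind the Changes item. Per-row scratch storage sized by a hard-coded cap (here a made-up `kMaxElementWidth = 20`) overruns when `element_width` exceeds that cap. All names below are assumptions for illustration and are not taken from MXNet's `CalculateGreedyNMSResultsKernel`.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical illustration only; this is NOT MXNet's CalculateGreedyNMSResultsKernel.
// kMaxElementWidth stands in for a hard-coded cap like the "20" mentioned above.
constexpr std::size_t kMaxElementWidth = 20;

// Capped pattern: per-row scratch space is a fixed-size array. Copying
// element_width values into it WITHOUT a check would write past the end
// whenever element_width > kMaxElementWidth (undefined behavior; on a GPU
// such an overrun can surface as "an illegal memory access was encountered").
// Here the cap is enforced explicitly, so the overrun cannot happen.
std::array<float, kMaxElementWidth> CopyRowCapped(const float* row,
                                                  std::size_t element_width) {
  if (element_width > kMaxElementWidth) {
    throw std::length_error("element_width exceeds kMaxElementWidth");
  }
  std::array<float, kMaxElementWidth> buf{};  // zero-initialized
  for (std::size_t i = 0; i < element_width; ++i) {
    buf[i] = row[i];
  }
  return buf;
}

// Uncapped pattern: scratch space is sized from the actual row width,
// so rows of any element_width are handled.
std::vector<float> CopyRowUncapped(const float* row, std::size_t element_width) {
  return std::vector<float>(row, row + element_width);
}
```

Enforcing the cap turns silent memory corruption into a clear error. Sizing storage from the real width, which is closer in spirit to removing the limitation, handles any width at the cost of not using a fixed-size buffer.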

Hey @TristonC, thanks for submitting the PR. All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-cpu, unix-gpu, centos-cpu, windows-gpu, centos-gpu, clang, unix-cpu, sanity, edge, website, miscellaneous]


Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.

mxnet-bot, May 06 '22 21:05

@ptrendx @josephevans

TristonC, May 06 '22 21:05

@mxnet-bot run ci [centos-gpu, unix-cpu, unix-gpu, windows-gpu]

TristonC, May 11 '22 21:05

Jenkins CI successfully triggered : [unix-cpu, windows-gpu, centos-gpu, unix-gpu]

mxnet-bot, May 11 '22 21:05

This is a legitimate failure. We are using C++17, in which static_assert does not require a message, but earlier standards do (and 1.x uses an older C++ standard), so you need to add a message to the static_assert. This is the error message:

error: expected a comma (the one-argument version of static_assert is not enabled in this mode)

ptrendx, May 12 '22 18:05
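The static_assert point above can be illustrated with a small sketch. The two-argument form `static_assert(condition, "message")` is valid since C++11; the message only became optional in C++17, so code that must also build under the older standard used by 1.x needs the two-argument form. The constants below are hypothetical and not taken from the MXNet sources.

```cpp
#include <cstddef>

// Hypothetical constants for illustration; not taken from the MXNet sources.
constexpr std::size_t kThreadsPerBlock = 256;
constexpr std::size_t kWarpSize = 32;

// Portable form: valid in C++11, C++14 and C++17.
static_assert(kThreadsPerBlock % kWarpSize == 0,
              "kThreadsPerBlock must be a multiple of kWarpSize");

// C++17-only form, left commented out: it is not valid before C++17.
// Some compilers reject it outright (as in the CI error quoted in this
// thread), others only warn.
// static_assert(kThreadsPerBlock % kWarpSize == 0);

// Small helper (single return statement, so it is a valid C++11 constexpr
// function) to observe the constants.
constexpr std::size_t WarpsPerBlock() { return kThreadsPerBlock / kWarpSize; }
```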

@mxnet-bot run ci [unix-gpu]

TristonC, May 13 '22 17:05

Jenkins CI successfully triggered : [unix-gpu]

mxnet-bot, May 13 '22 17:05