MIOpen Error: /MIOpen/src/tensor.cpp:67: Invalid length. Length must be greater than 0
Hi, while running an LSTM-based model using ROCm, I get the following error, whereas with CUDA on an NVIDIA GPU it works fine. I checked that the size of the tensor is not 0 (the size is torch.Size([2000, 1536, 128])). Is this not supported in ROCm at the moment?
MIOpen Error: /MIOpen/src/tensor.cpp:67: Invalid length. Length must be greater than 0.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "~/tools/ont/bonito/multiprocessing.py", line 110, in run
    for item in self.iterator:
  File "~/tools/ont/bonito/crf/basecall.py", line 69, in
@singagan Please provide clear reproduction instructions for your problem: ROCm version, base OS version, GPU type, which software needs to be installed, etc. The best option is to provide a Docker image so our engineers can quickly reproduce the issue without affecting their test systems.
Without this information, we will be unable to help you.
/CC @junliume @JehandadKhan @shurale-nkn
@junliume https://github.com/ROCmSoftwarePlatform/MIOpen/labels/ON_HOLD
@atamazov Thanks for the quick reply. I will try to share a Docker image. Meanwhile, I am attaching the log I generated with MIOPEN_ENABLE_LOGGING enabled: run.log
If I reduce the batch size from 1536 to 768, I do not see this error. There seems to be an issue with the buffer sizing.
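For a rough sense of scale, here is a small sketch that computes the byte size of the [2000, 1536, 128] tensor at both batch sizes and compares it with the largest signed 32-bit value. It assumes fp32 (4 bytes per element) and a (seq_len, batch, features) layout, and it only covers this one input tensor; the buffer that actually fails is internal to MIOpen and its size is not shown in this thread.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold

def tensor_bytes(shape, bytes_per_elem=4):
    """Return the size in bytes of a dense tensor with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem

for batch in (1536, 768):
    shape = (2000, batch, 128)  # assumed (seq_len, batch, features) layout
    size = tensor_bytes(shape)
    print(batch, size, f"{size / INT32_MAX:.2f}x INT32_MAX")
# 1536 1572864000 0.73x INT32_MAX
# 768 786432000 0.37x INT32_MAX
```

In fp32 the input alone already uses about 73% of the signed 32-bit range, so if the failing quantity is a byte count, any internal buffer more than about 1.37 times the size of the input would not fit, while halving the batch halves every such buffer.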
HIP version: 5.4.22804-474e8620
AMD clang version: 15.0.0
OS: Ubuntu 20.04.3 LTS (Focal Fossa)
GPU: MI210
@shurale-nkn could you also take a look at the log above?
@singagan Thanks for providing the log and the additional information about batch size, but I am afraid this is not enough for us to understand what is happening. If you can provide logs with more information, we can try again (though a Docker image plus reproduction instructions would of course be better). Recommended environment settings for generating the log:
export MIOPEN_ENABLE_LOGGING=1 ;\
export MIOPEN_ENABLE_LOGGING_CMD=1 ;\
export MIOPEN_LOG_LEVEL=7 ;\
export MIOPEN_ENABLE_LOGGING_MPMT=1
Note: the log will be huge.
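If the model is launched from a Python driver script rather than a shell, the same settings can be handed to a child process through its environment. A minimal sketch; the helper name `with_miopen_logging` is hypothetical, and only the four variables recommended above are set.

```python
import os

# Logging switches recommended above; environment values are always strings.
MIOPEN_LOG_SETTINGS = {
    "MIOPEN_ENABLE_LOGGING": "1",
    "MIOPEN_ENABLE_LOGGING_CMD": "1",
    "MIOPEN_LOG_LEVEL": "7",
    "MIOPEN_ENABLE_LOGGING_MPMT": "1",
}

def with_miopen_logging(base=None):
    """Return a copy of `base` (default: os.environ) with MIOpen logging enabled."""
    env = dict(os.environ if base is None else base)
    env.update(MIOPEN_LOG_SETTINGS)
    return env
```

Usage would look like `subprocess.run(["python", "your_script.py"], env=with_miopen_logging(), check=True)`, where `your_script.py` is a placeholder. Starting a fresh process with the variables already set avoids any question of whether MIOpen read them before they were changed in the current process.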
A month ago I discussed this problem with singagan in chat. After the requested logs (generated with MIOPEN_ENABLE_LOGGING=1) were passed to me, I offered two options for how Gagandeep could quickly work around it: decrease either batch_size or seq_length, and run the operation twice. This is necessary because in this configuration MIOpen has to work with a buffer whose size exceeds the range of the INT type, and our implementation can't handle that.
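The batch-size workaround can be sketched as splitting the batch into chunks that stay under the 32-bit limit and concatenating the results. This is a hedged sketch, not MIOpen's or bonito's code: the helper names are hypothetical, and because the real size of MIOpen's internal buffer is not known here, `overhead` is a guessed multiplier on the input size that the reader has to choose.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value

def max_batch_chunk(seq_len, features, bytes_per_elem=4,
                    limit=INT32_MAX, overhead=1.0):
    """Largest batch chunk whose (seq_len, chunk, features) buffer, scaled by
    a guessed `overhead` factor, fits within `limit` bytes."""
    per_sample = seq_len * features * bytes_per_elem * overhead
    chunk = int(limit // per_sample)
    if chunk < 1:
        raise ValueError("a single sample already exceeds the limit")
    return chunk

def batch_slices(batch, chunk):
    """Yield slices that cover range(batch) in pieces of at most `chunk`."""
    for start in range(0, batch, chunk):
        yield slice(start, min(start + chunk, batch))

# Hypothetical PyTorch usage, assuming `x` has shape (seq_len, batch, features)
# and `model` returns a single tensor with the batch on dim 1:
#   chunk = max_batch_chunk(x.shape[0], x.shape[2], overhead=4.0)
#   out = torch.cat([model(x[:, s]) for s in batch_slices(x.shape[1], chunk)], dim=1)
```

Splitting along the batch is the simpler of the two options, because samples in a batch are processed independently. Splitting along seq_length instead means passing the final (h, c) state of the first call into the second, and it does not give identical results this simply for a bidirectional LSTM.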
@singagan Has this issue been resolved for you? If so, please close the ticket. Thanks!