Scribbles - MONAI CRF raises illegal memory access error when run on a cuda device
Describe the bug
In scribbles, one of the optimisation methods used is MONAI's CRF. When it is run on a CUDA device, it raises an illegal memory access error with the message:
RuntimeError: CUDA error: an illegal memory access was encountered
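CUDA reports kernel errors asynchronously, so the Python stack trace for this error may point at a later, unrelated call. Below is a minimal sketch for narrowing down the failing call. It assumes the snippet is placed at the very top of the entry script, before torch or monai is imported; that placement is an assumption and has not been verified in this app.

```python
import os

# Force synchronous kernel launches so the error is raised at the call that
# actually triggered it. The variable has to be set before CUDA is
# initialised; setting it before importing torch is the safe option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Prefixing the server command with CUDA_LAUNCH_BLOCKING=1 in the shell should have the same effect.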
To Reproduce
Steps to reproduce the behavior:
- Go to https://github.com/Project-MONAI/MONAILabel/blob/ba1ced34ec449d0787d2bb0ba7f35cf5f28bf511/sample-apps/segmentation_spleen_scribbles/lib/scribbles.py#L98 and change "cpu" to "cuda"
- Run the scribbles app:
PYTHONPATH=../. python main.py start_server --app ../sample-apps/segmentation_spleen_scribbles --studies /data/KCLData/Datasets/MSDDataset/Task09_Spleen/imagesTrSmall/
- Click on Next sample to fetch a sample, then select ISeg + CRF in scribbles and click update.
Expected behavior
This should run MONAI's CRF-based optimisation on the GPU and return the updated label to the client. Instead, it raises an illegal memory access error. The error occurs most often when training is running in the background, i.e. when limited GPU memory is available.
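As a possible stopgap until the underlying error is fixed, the CRF step could be retried on the CPU when the GPU attempt fails. The following is a minimal sketch, not the app's actual code: `run_with_device_fallback` and the `optimise` callable are hypothetical names, and `optimise` is assumed to build its tensors on the given device from CPU-resident inputs. An illegal memory access may leave the CUDA context unusable for the rest of the process, so the retry is only likely to be reliable if it touches no GPU tensors.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_device_fallback(
    optimise: Callable[[str], T],
    devices: Sequence[str] = ("cuda", "cpu"),
) -> T:
    """Call optimise(device) for each device in order; return the first result.

    `optimise` is a hypothetical wrapper around the CRF step (an assumption
    for illustration, not part of MONAI or MONAI Label).
    """
    last_error = None
    for device in devices:
        try:
            return optimise(device)
        except RuntimeError as err:
            # PyTorch surfaces CUDA errors as RuntimeError; remember it and
            # move on to the next device in the list.
            last_error = err
    raise RuntimeError(
        f"optimisation failed on all devices: {list(devices)}"
    ) from last_error
```

With the defaults, a failure on "cuda" leads to one further attempt on "cpu". If every device fails, the last error is chained onto the raised exception.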
Environment
Ensuring you use the relevant python executable, please paste the output of:
python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 0.6.0+17.g830f8bb3
Numpy version: 1.21.0
Pytorch version: 1.9.0+cu111
MONAI flags: HAS_EXT = True, USE_COMPILED = False
MONAI rev id: 830f8bb34532f63e188b43738f804fff6f059d5b
Optional dependencies:
Pytorch Ignite version: 0.4.5
Nibabel version: 3.2.1
scikit-image version: 0.18.2
Pillow version: 8.3.1
Tensorboard version: 2.5.0
gdown version: 3.13.0
TorchVision version: 0.10.0+cu111
ITK version: 5.1.2
tqdm version: 4.61.2
lmdb version: 1.2.1
psutil version: 5.8.0
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 20.04.2 LTS
Platform: Linux-5.11.0-25-generic-x86_64-with-glibc2.29
Processor: x86_64
Machine: x86_64
Python version: 3.8.10
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='/usr/share/code/v8_context_snapshot.bin', fd=20, position=0, mode='r', flags=32768), popenfile(path='/proc/124085/statm', fd=31, position=36, mode='r', flags=32768), popenfile(path='/proc/124085/status', fd=32, position=1448, mode='r', flags=32768), popenfile(path='/usr/share/code/resources/app/node_modules.asar', fd=52, position=58252, mode='r', flags=32768), popenfile(path='/usr/share/fonts/truetype/liberation2/LiberationSerif-Regular.ttf', fd=55, position=0, mode='r', flags=32768), popenfile(path='/usr/share/code/resources/app/node_modules.asar', fd=56, position=58252, mode='r', flags=32768), popenfile(path='/usr/share/code/v8_context_snapshot.bin', fd=103, position=0, mode='r', flags=32768)]
Num physical CPUs: 6
Num logical CPUs: 12
Num usable CPUs: 12
CPU usage (%): [21.2, 19.6, 20.0, 20.0, 20.0, 20.0, 19.6, 21.6, 18.0, 20.0, 20.0, 100.0]
CPU freq. (MHz): 922
Load avg. in last 1, 5, 15 mins (%): [3.2, 5.0, 4.2]
Disk usage (%): 38.3
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.0
Available memory (GB): 24.6
Used memory (GB): 4.9
================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.1
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: Quadro RTX 3000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 30
GPU 0 Total memory (GB): 5.8
GPU 0 CUDA capability (maj.min): 7.5
Additional context
The same error was reported in the MONAI core repository a while back: https://github.com/Project-MONAI/MONAI/issues/2098. That issue may need to be reopened to resolve this and get the scribbles app working as expected.
For now we have removed the MONAI CRF option from all scribbles apps. We may revisit this once the issue referenced in the MONAI core repository is looked into.
Perhaps for now this issue can be marked as blocked/not achievable for the 0.2 release.
Any updates on this issue? Is it still a problem?
It is still a problem. The original MONAI contributor may look into it, but possibly at a later time, so perhaps we can remove this from the 0.3 release for now and keep the issue open.
I guess we can close this issue. Feel free to reopen it if the problem still exists.