May I ask why this error often occurs when I run your code on ruptured aneurysm data: an illegal memory access was encountered

Open Fjch-Ryan opened this issue 1 year ago • 1 comments

Below is the error message:

2024-08-17 01:24:07 [MainThread] INFO [TaskAneurysmSegTrainer] - (Time per iter: 0.92s)train iter 3010. epoch 1/200. total_loss: 0.2283 local_loss: 0.1790 global_loss: 0.0493 ap: 0.3750 auc: 0.7029 precision: 0.5000 recall: 0.5000 dsc: 0.5000 hd95: 1.0000 per_target_precision: 0.5000 per_target_recall: 0.0000

meta('170', '170')

target.shape：torch.Size([2, 96, 96, 96])

/home/dluser/anaconda3/envs/CCC/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:149: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.

warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)

2024-08-17 01:24:08 [MainThread] ERROR [TaskAneurysmSegTrainer] - CUDA error: an illegal memory access was encountered

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Aug 17 '24 03:08 Fjch-Ryan

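I'm not sure why this happens. It might be caused by an illegal operation or a memory problem. To locate the failing code, rerun the training with the environment variable CUDA_LAUNCH_BLOCKING=1 set, so that the error is reported at the call that actually triggers it rather than at a later one.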
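A minimal sketch of that debugging step, assuming the training run is started as a separate process. The helper names `blocking_env` and `run_blocking` are hypothetical and not part of this repository; the only real setting used is the `CUDA_LAUNCH_BLOCKING` environment variable mentioned in the error message. The variable has to be in place before CUDA is initialized, so the sketch sets it in the child process environment rather than inside an already running script.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA kernel launches enabled."""
    env = dict(os.environ if base is None else base)
    # With this set, a failing kernel raises at its own call site,
    # so the Python stack trace points at the offending operation.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(cmd):
    """Run cmd (a list of arguments) with CUDA_LAUNCH_BLOCKING=1 and return its exit code."""
    return subprocess.run(cmd, env=blocking_env()).returncode


# Hypothetical usage; replace the arguments with your actual training command:
# exit_code = run_blocking([sys.executable, "train.py"])
```

On a shell, prefixing the usual training command with `CUDA_LAUNCH_BLOCKING=1` has the same effect. Expect training to run slower while the variable is set, since kernel launches are no longer asynchronous.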

Aug 19 '24 02:08 MeteorsHub