vision_transformers

RuntimeError: CUDA error: device-side assert triggered

Open kawaiiGTR opened this issue 1 year ago • 5 comments

I'm trying to run DETR on a custom dataset. When I execute the launch command:

  • python tools/train_detector.py --epochs 20 --batch 2 --data data/aquarium.yaml --model detr_resnet50 --name detr_resnet50

The output is:

RuntimeError: CUDA error: device-side assert triggered

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
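The hint in the message above can be applied by relaunching the same command with `CUDA_LAUNCH_BLOCKING=1` in the environment, so that the failing kernel is reported synchronously and the stack trace points at the real call site. A minimal sketch using only the standard library; the helper names `blocking_env` and `run_training` are hypothetical, and the command is the one from this issue:

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Launch command copied from this issue (sys.executable stands in for `python`).
TRAIN_CMD = [
    sys.executable, "tools/train_detector.py",
    "--epochs", "20", "--batch", "2",
    "--data", "data/aquarium.yaml",
    "--model", "detr_resnet50", "--name", "detr_resnet50",
]


def run_training():
    """Run the training script with synchronous CUDA error reporting."""
    # Assumes the current working directory is the repository root.
    return subprocess.run(TRAIN_CMD, env=blocking_env(), check=False)
```

Calling `run_training()` from the repository root should reproduce the failure with a more trustworthy traceback. Setting the variable in the shell before the `python` command has the same effect.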

After some digging, the error points to either a class mismatch or a problematic activation function. Either way, it doesn't seem to work!! :(

Any advice on how to get this working? Cheers!

kawaiiGTR avatar May 29 '24 18:05 kawaiiGTR

Hello @kawaiiGTR. Can you please let me know how many classes you have, and can you share your custom dataset YAML file here?

sovit-123 avatar May 30 '24 00:05 sovit-123

I think there is a mismatch in the Linear output layer between the input shape [2, 100, 92] and the output shape [2, 100, 114].
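If the classification head really emits 92 scores per query while the dataset has 114 classes, any label index of 92 or above is out of range for the loss, which is a common cause of a device-side assert. A minimal, framework-free sketch of that check; the function name and the sample labels are made up for illustration, and whether one output slot is reserved for "no object" depends on the model implementation:

```python
def out_of_range_labels(labels_per_image, num_outputs):
    """Return (image_index, label) pairs whose label is outside [0, num_outputs)."""
    bad = []
    for image_index, labels in enumerate(labels_per_image):
        for label in labels:
            if not 0 <= label < num_outputs:
                bad.append((image_index, label))
    return bad


# Made-up labels for illustration: a 92-way head cannot score class 113.
example = [[0, 5, 91], [113, 7]]
print(out_of_range_labels(example, num_outputs=92))  # [(1, 113)]
```

Running a check like this over the dataset's annotations before training can confirm whether the labels and the head size disagree, without needing a GPU.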

kawaiiGTR avatar May 30 '24 10:05 kawaiiGTR

Were you able to solve it?

sovit-123 avatar May 30 '24 14:05 sovit-123

I don't know where (i.e., which file) I need to edit to change the size of the input layer to match the output layer. Could you kindly advise? I have 114 classes, not 92. Cheers!

kawaiiGTR avatar May 30 '24 22:05 kawaiiGTR

Is it possible for you to provide me a link to the dataset? I will be able to debug it if I have the dataset.

sovit-123 avatar May 31 '24 00:05 sovit-123