Can I perform multi-GPU training? Why does the training stop early when I set the total number of epochs to 300 with 4 GPUs?

Open xiarencunzhang opened this issue 10 months ago • 1 comments

Can I perform multi-GPU training? Why does the training stop early when I set the total number of epochs to 300 with 4 GPUs?

xiarencunzhang avatar Mar 19 '25 12:03 xiarencunzhang

Hi! I am also training DEIM in a multi-GPU setup (4x NVIDIA T4). When you say that the training stops early, are you getting the following error? If so, did you manage to fix it?

[rank1]: File "DEIM/src/deimkit/engine/deim/box_ops.py", line 53, in generalized_box_iou
[rank1]:     assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
[rank1]: AssertionError

Moreover, the comments in box_ops.py say:

# degenerate boxes gives inf / nan results
# so do an early check

So I believe that the failing assert cannot be removed.
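
For anyone trying to narrow this down: the assert fails when any box has x2 < x1 or y2 < y1, and it also fails when a coordinate is NaN, because any comparison with NaN evaluates to false. Below is a minimal sketch of a check that tells the two cases apart. It assumes boxes in (x1, y1, x2, y2) format as plain Python tuples, and `find_invalid_boxes` is a hypothetical helper for illustration, not part of DEIM. Whether NaN predictions or inverted boxes are the actual cause here is not confirmed in this thread.

```python
import math

def find_invalid_boxes(boxes):
    """Return (index, reason) pairs for suspicious boxes.

    `boxes` is a list of (x1, y1, x2, y2) tuples.
    Hypothetical diagnostic helper, not part of DEIM.
    """
    invalid = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # NaN makes `x2 >= x1` false, so it trips the assert.
        # Inf is flagged too, although it may not trip the assert itself.
        if not all(math.isfinite(v) for v in (x1, y1, x2, y2)):
            invalid.append((i, "non-finite"))
        # Inverted corners: bottom-right lies before top-left.
        elif x2 < x1 or y2 < y1:
            invalid.append((i, "inverted"))
    return invalid

boxes = [
    (0.1, 0.1, 0.5, 0.5),           # valid
    (0.6, 0.2, 0.4, 0.8),           # x2 < x1
    (float("nan"), 0.0, 1.0, 1.0),  # NaN coordinate
]
report = find_invalid_boxes(boxes)
print(report)  # [(1, 'inverted'), (2, 'non-finite')]
```

Running an equivalent check on the boxes just before the call to generalized_box_iou should show whether the failure comes from non-finite values or from inverted corners.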

EnriqueGlv avatar Mar 25 '25 09:03 EnriqueGlv