
When I use the SAN model to train on my dataset, I get ValueError: matrix contains invalid numeric entries

Open 641346231 opened this issue 1 year ago • 3 comments


641346231 avatar Aug 12 '24 12:08 641346231

Please can you solve this?

shenxiangkei avatar Nov 30 '24 09:11 shenxiangkei

Me too. In mmsegmentation-main/mmseg/models/assigners/hungarian_assigner.py, the scores of pred_instances are: scores: tensor([[ nan, nan, nan, ..., nan, nan, 0.4062], [ nan, nan, nan, ..., nan, nan, 0.3939], [ nan, nan, nan, ..., nan, nan, 0.4263], ..., [ nan, nan, nan, ..., nan, nan, 0.4180], [ nan, nan, nan, ..., nan, nan, 0.4103], [ nan, nan, nan, ..., nan, nan, 0.4028]], device='cuda:0', grad_fn=<SelectBackward0>) It's amazing.
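The error text matches what SciPy's linear_sum_assignment raises when the cost matrix contains NaN or inf, so it may help to see which score columns are entirely NaN before the assigner runs. Below is a minimal sketch, not code from mmsegmentation: nan_columns is a hypothetical helper, and it assumes the scores have been converted to nested Python lists first (for example with scores.detach().cpu().tolist()).

```python
import math


def nan_columns(scores):
    """Return the indices of columns in which every entry is NaN.

    `scores` is a list of rows (lists of floats), e.g. a 2-D tensor
    converted with .tolist(). Hypothetical helper for debugging only.
    """
    if not scores:
        return []
    n_cols = len(scores[0])
    return [
        col for col in range(n_cols)
        if all(math.isnan(row[col]) for row in scores)
    ]


# Small stand-in for the tensor printed above (values taken from it).
nan = float("nan")
scores = [
    [nan, nan, 0.4062],
    [nan, nan, 0.3939],
    [nan, nan, 0.4263],
]
print(nan_columns(scores))  # -> [0, 1]
```

If every class column is NaN and only the last column is finite, the NaNs were probably produced upstream of the assigner (for instance in the loaded weights), not by the matching step itself.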

ymyc avatar Mar 22 '25 09:03 ymyc

Please can you solve this?

Hey, listen, I fixed the problem and got the code running, but I don't know why it works. The fix has two steps:

  1. First, you can't use the pretrained weights from the SAN repository (https://github.com/MendelXu/SAN?tab=readme-ov-file). You must use the pretrained weights provided by OpenMMLab (https://download.openmmlab.com/mmsegmentation/v0.5/san/clip_vit-base-patch16-224_3rdparty-d08f8887.pth). After downloading, set the pretrained parameter in your config to the path of the downloaded file.
  2. This step is the most bizarre and incomprehensible. After the previous step, the model no longer outputs NaN values, but the loss computation that follows fails with a shape mismatch. To avoid the error, you must delete the class_weight in the first loss function configuration in configs/_base_/models/san_vit-b16.py. Then the model works, and from what I can see, the training process and results are not seriously harmed (just slightly less accurate). XD
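The second step can be sketched as a small dictionary edit. This is only an illustration under assumptions: the nested layout (decode_head, loss_decode as a list of loss dicts), the loss names, and the values below are hypothetical stand-ins, not the actual contents of san_vit-b16.py, and drop_first_class_weight is a made-up helper. One plausible explanation for the shape mismatch, not confirmed in this thread, is that the class_weight list has a fixed length that does not match the number of classes in a custom dataset.

```python
import copy


def drop_first_class_weight(model_cfg):
    """Return a copy of model_cfg whose first decode loss has no class_weight.

    Hypothetical helper; assumes model_cfg["decode_head"]["loss_decode"]
    is a list of loss-config dicts.
    """
    cfg = copy.deepcopy(model_cfg)
    first_loss = cfg["decode_head"]["loss_decode"][0]
    first_loss.pop("class_weight", None)  # no error if the key is absent
    return cfg


# Illustrative stand-in for the model config (all values are assumptions).
model = dict(
    pretrained="path/to/clip_vit-base-patch16-224_3rdparty-d08f8887.pth",
    decode_head=dict(
        num_classes=3,
        loss_decode=[
            dict(type="CrossEntropyLoss", loss_name="loss_cls_ce",
                 class_weight=[1.0, 1.0, 1.0, 0.1]),
            dict(type="DiceLoss", loss_name="loss_mask_dice"),
        ],
    ),
)

patched = drop_first_class_weight(model)
print(patched["decode_head"]["loss_decode"][0])
```

Removing class_weight makes the loss treat all classes equally, which may be why accuracy drops slightly; supplying a class_weight list whose length fits your own dataset might be an alternative worth trying.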

I'm puzzled and hope someone more knowledgeable who passes by can explain it.

ymyc avatar Mar 23 '25 03:03 ymyc