When I use the SAN model to train on my own dataset, I get: ValueError: matrix contains invalid numeric entries
Please can you solve this?
Me too. In mmsegmentation-main/mmseg/models/assigners/hungarian_assigner.py, the scores of pred_instances are: scores: tensor([[ nan, nan, nan, ..., nan, nan, 0.4062], [ nan, nan, nan, ..., nan, nan, 0.3939], [ nan, nan, nan, ..., nan, nan, 0.4263], ..., [ nan, nan, nan, ..., nan, nan, 0.4180], [ nan, nan, nan, ..., nan, nan, 0.4103], [ nan, nan, nan, ..., nan, nan, 0.4028]], device='cuda:0', grad_fn=<SelectBackward0>). It's amazing: everything is NaN except the last column.
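To see how widespread the NaN values are before the assigner raises the ValueError, a small check like the one below may help. This is only a sketch: `nan_report` is a hypothetical helper name, and it works on plain nested lists so it runs without a GPU. For a real PyTorch tensor you could pass `scores.tolist()`, or simply test `torch.isnan(scores).any()`.

```python
import math

def nan_report(rows):
    """Return (nan_count, total, clean_columns) for a 2-D nested list of floats."""
    total = 0
    nan_count = 0
    n_cols = len(rows[0]) if rows else 0
    col_has_nan = [False] * n_cols
    for row in rows:
        for j, value in enumerate(row):
            total += 1
            if math.isnan(value):
                nan_count += 1
                col_has_nan[j] = True
    # Columns in which no entry was NaN.
    clean_cols = [j for j, bad in enumerate(col_has_nan) if not bad]
    return nan_count, total, clean_cols

# Toy stand-in for the dump above: every column is NaN except the last one.
scores = [
    [float("nan"), float("nan"), 0.4062],
    [float("nan"), float("nan"), 0.3939],
]
nan_count, total, clean_cols = nan_report(scores)
print(f"{nan_count}/{total} entries are NaN; clean columns: {clean_cols}")
```

If only some columns are clean, as in the dump above, that hints the NaN values appear upstream of the assigner rather than inside it.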
Please can you solve this?
Hey, listen, I fixed the problem and got the code running, but I don't know why it works. The method has two steps, as follows:
- First, you can't use the trained weight files from the SAN repository (https://github.com/MendelXu/SAN?tab=readme-ov-file). You must use the pretrained weights provided by OpenMMLab (https://download.openmmlab.com/mmsegmentation/v0.5/san/clip_vit-base-patch16-224_3rdparty-d08f8887.pth). After downloading, set the pretrained path in your config to point to this file.
- Second, and this is the most bizarre and incomprehensible step: after the previous step the model no longer outputs NaN values, but the subsequent loss calculation fails with a shape mismatch error. To avoid it, you must delete class_weight from the first loss function's configuration in configs/_base_/models/san_vit-b16.py. After that the model trains, and from what I can see the training process and results have no fatal problems (just slightly less accurate). XD
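The two steps above can be sketched as a small edit to the model config. This is an illustration under assumptions, not the actual contents of san_vit-b16.py: the key names `pretrained`, `decode_head` and `loss_decode`, the toy loss entries, and the local path are all hypothetical stand-ins; only `class_weight` in the first loss comes from the description above.

```python
import copy

def apply_workaround(model_cfg, pretrained_path):
    """Return a copy of model_cfg with the two-step workaround applied.

    Assumes (hypothetically) that the config is a plain dict with a
    'pretrained' key and a list of loss dicts under decode_head.loss_decode.
    """
    cfg = copy.deepcopy(model_cfg)
    # Step 1: point the model at the OpenMMLab-provided pretrained weights.
    cfg["pretrained"] = pretrained_path
    # Step 2: drop class_weight from the first loss function only.
    cfg["decode_head"]["loss_decode"][0].pop("class_weight", None)
    return cfg

# Toy config standing in for the real one (values are made up).
toy_cfg = {
    "pretrained": "path/to/san_repo_weights.pth",
    "decode_head": {
        "loss_decode": [
            {"type": "CrossEntropyLoss", "class_weight": [1.0, 1.0, 0.1]},
            {"type": "DiceLoss"},
        ]
    },
}

patched = apply_workaround(toy_cfg, "path/to/downloaded_openmmlab_weights.pth")
print(patched["decode_head"]["loss_decode"][0])
```

If the shape mismatch happens because class_weight has a fixed length that doesn't match the number of classes in your dataset (a guess, not verified here), then rebuilding the list to the right length instead of deleting it might avoid the small accuracy drop.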
I'm puzzled and hope an expert passing by can explain it.