RuntimeError: The size of tensor a (4) must match the size of tensor b (400) at non-singleton dimension 1
I encountered this bug with the latest code.
DEIM/engine/deim/hybrid_encoder.py", line 243, in with_pos_embed
[rank0]: return tensor if pos_embed is None else tensor + pos_embed
RuntimeError: The size of tensor a (4) must match the size of tensor b (400) at non-singleton dimension 1
It occurs for me when I change the input size during training and then run inference with my finetuned model. Here are my changes to get it working:
diff --git a/tools/inference/torch_inf.py b/tools/inference/torch_inf.py
index 5103ad8..4aa5c73 100644
--- a/tools/inference/torch_inf.py
+++ b/tools/inference/torch_inf.py
@@ -33,13 +33,13 @@ def draw(images, labels, boxes, scores, thrh=0.4):
im.save('torch_results.jpg')
-def process_image(model, device, file_path):
+def process_image(model, device, file_path, input_size):
im_pil = Image.open(file_path).convert('RGB')
w, h = im_pil.size
orig_size = torch.tensor([[w, h]]).to(device)
transforms = T.Compose([
- T.Resize((640, 640)),
+ T.Resize(input_size),
T.ToTensor(),
])
im_data = transforms(im_pil).unsqueeze(0).to(device)
@@ -50,7 +50,7 @@ def process_image(model, device, file_path):
draw([im_pil], labels, boxes, scores)
-def process_video(model, device, file_path):
+def process_video(model, device, file_path, input_size):
cap = cv2.VideoCapture(file_path)
# Get video properties
@@ -63,7 +63,7 @@ def process_video(model, device, file_path):
out = cv2.VideoWriter('torch_results.mp4', fourcc, fps, (orig_w, orig_h))
transforms = T.Compose([
- T.Resize((640, 640)),
+ T.Resize(input_size),
T.ToTensor(),
])
@@ -140,11 +140,11 @@ def main(args):
file_path = args.input
if os.path.splitext(file_path)[-1].lower() in ['.jpg', '.jpeg', '.png', '.bmp']:
# Process as image
- process_image(model, device, file_path)
+ process_image(model, device, file_path, input_size=cfg.global_cfg["eval_spatial_size"])
print("Image processing complete.")
else:
# Process as video
- process_video(model, device, file_path)
+ process_video(model, device, file_path, input_size=cfg.global_cfg["eval_spatial_size"])
if __name__ == '__main__':
Hope it helps
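
For context, the mismatch itself is just a broadcast failure between the encoder features and a positional embedding that was precomputed for the default 640×640 input (a 20×20 feature grid at stride 32 gives the 400 tokens in the error). A minimal sketch reproducing it, with the tensor shapes assumed for illustration:

```python
import torch

# features from a non-default input size: 4 tokens along dim 1 (illustrative)
feat = torch.randn(1, 4, 256)
# pos embedding built for 640x640: 640/32 = 20, so 20*20 = 400 tokens
pos_embed = torch.randn(1, 400, 256)

try:
    out = feat + pos_embed  # mirrors `tensor + pos_embed` in with_pos_embed
except RuntimeError as e:
    print(e)  # size of tensor a (4) must match the size of tensor b (400) ...
```

So any place the model caches a positional embedding for the default resolution has to be kept in sync with the new input size, which is why resizing to `eval_spatial_size` at inference fixes it.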
I get the same error after adjusting the input size and finetuning the network. Is there more code that needs to be modified to change the input size?
I encountered this error during training. Did you finally solve this problem?
I wrote a wrapper package as a fork of DEIM to make it easier to finetune on custom datasets. One-line installation, and models are configured in Python instead of config files.
Check it out - https://github.com/dnth/DEIMKit
I've successfully trained on various custom datasets, but I've yet to verify whether the results are competitive with other models like YOLO, RT-DETR, etc.
Thanks, I'll try that out!