
Unable to reproduce result reported in paper

Open csampat-a opened this issue 5 months ago • 5 comments

I trained the DEIM-D-FINE-X model on 8 A100 GPUs (40GB each), and at epoch 50 I see an AP[0.5:0.95] of 55.8, while the paper reports 56.5 (Table 1). What explains this discrepancy? Has anyone else noticed it?

Also, training stopped at epoch 50 with the following error:

[rank2]: Traceback (most recent call last):
[rank2]:   File "<path>/DEIM/train.py", line 84, in <module>
[rank2]:     main(args)
[rank2]:   File "<path>/DEIM/train.py", line 54, in main
[rank2]:     solver.fit()
[rank2]:   File "<path>/DEIM/engine/solver/det_solver.py", line 72, in fit
[rank2]:     self.load_resume_state(str(self.output_dir / 'best_stg1.pth'))
[rank2]:   File "<path>/DEIM/engine/solver/_solver.py", line 159, in load_resume_state
[rank2]:     state = torch.load(path, map_location='cpu')
[rank2]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]:   File "/opt/conda/envs/deim_p312/lib/python3.12/site-packages/torch/serialization.py", line 1384, in load
[rank2]:     return _legacy_load(
[rank2]:            ^^^^^^^^^^^^^
[rank2]:   File "/opt/conda/envs/deim_p312/lib/python3.12/site-packages/torch/serialization.py", line 1628, in _legacy_load
[rank2]:     magic_number = pickle_module.load(f, **pickle_load_args)
[rank2]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: EOFError: Ran out of input
[rank5]: Traceback (most recent call last):
[rank5]:   File "<path>/DEIM/train.py", line 84, in <module>
[rank5]:     main(args)
[rank5]:   File "<path>/DEIM/train.py", line 54, in main
[rank5]:     solver.fit()
[rank5]:   File "<path>/DEIM/engine/solver/det_solver.py", line 72, in fit
[rank5]:     self.load_resume_state(str(self.output_dir / 'best_stg1.pth'))
[rank5]:   File "<path>/DEIM/engine/solver/_solver.py", line 159, in load_resume_state
[rank5]:     state = torch.load(path, map_location='cpu')
[rank5]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank5]:   File "/opt/conda/envs/deim_p312/lib/python3.12/site-packages/torch/serialization.py", line 1360, in load
[rank5]:     return _load(
[rank5]:            ^^^^^^
[rank5]:   File "/opt/conda/envs/deim_p312/lib/python3.12/site-packages/torch/serialization.py", line 1848, in _load
[rank5]:     result = unpickler.load()
[rank5]:              ^^^^^^^^^^^^^^^^
[rank5]:   File "/opt/conda/envs/deim_p312/lib/python3.12/site-packages/torch/serialization.py", line 1812, in persistent_load
[rank5]:     typed_storage = load_tensor(
[rank5]:                     ^^^^^^^^^^^^
[rank5]:   File "/opt/conda/envs/deim_p312/lib/python3.12/site-packages/torch/serialization.py", line 1772, in load_tensor
[rank5]:     zip_file.get_storage_from_record(name, numel, torch.UntypedStorage)
[rank5]: RuntimeError: PytorchStreamReader failed reading file data/37: file read failed

I suspect this is because of the following config in configs/base/deim.yml:

collate_fn:
    mixup_prob: 0.5
    mixup_epochs: [4, 29]
    stop_epoch: 50    # epoch in [72, ~) stop `multiscales`

But I'm not sure how to fix this yet. Any suggestions?
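For what it's worth, both errors (EOFError: Ran out of input on rank 2 and PytorchStreamReader ... file read failed on rank 5) look like best_stg1.pth was empty or still being written when those ranks tried to load it, i.e. a truncated checkpoint. A minimal check, sketched below with a placeholder path in place of my actual output_dir, would be to try deserializing the file on a single process before resuming the multi-GPU run:

# Sketch: verify best_stg1.pth is readable before a multi-GPU resume.
# The path is a placeholder -- point it at your own output_dir.
import torch

ckpt_path = "outputs/deim_hgnetv2_x_coco/best_stg1.pth"  # placeholder

try:
    state = torch.load(ckpt_path, map_location="cpu")
    print("checkpoint OK, keys:", list(state.keys()))
except (EOFError, RuntimeError) as err:
    # A truncated or partially written file raises exactly the errors
    # shown in the traceback above.
    print("checkpoint looks corrupted or incomplete:", err)

If that also fails on a single process, the file on disk itself is incomplete.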

csampat-a avatar Aug 22 '25 18:08 csampat-a

Did you train on Linux? I encountered the same problem when training DEIM (deim_hgnetv2_s_coco.yml) with a custom dataset on Windows 11. I worked around it by adding 1 to the dataset class count (in configs/dataset/custom_detection.yml); see the sketch below. As for the lower mAP, I think the authors trained multiple times and selected the best run.
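Concretely, the workaround is just bumping the class count in that file. A sketch, where 21 stands for an example dataset with 20 foreground classes (use your own count + 1):

# configs/dataset/custom_detection.yml -- sketch, only the relevant key shown.
# Example: a dataset with 20 foreground classes -> set num_classes to 20 + 1.
num_classes: 21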

xianbeisukisu avatar Aug 26 '25 13:08 xianbeisukisu

Hi @csampat-a Take a look here https://github.com/Intellindust-AI-Lab/DEIM/blob/bc11dfefc08d79756508c7f8b56c29feb909a4f0/configs/deim_dfine/deim_hgnetv2_x_coco.yml#L22-L37 Technically, to reproduce the exact same results as in the paper, you should first train for 50 epochs with augmentation, and then an additional 8 epochs for the optimal EMA search.
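Roughly, the schedule there looks like the sketch below. The key names and the total epoch count are written from memory, and the mixup/stop_epoch part matches what was quoted above; the linked lines in deim_hgnetv2_x_coco.yml are the authoritative reference:

# Sketch of the 50 + 8 schedule (check the linked config for exact values).
epoches: 58               # 50 epochs with augmentation + 8 epochs without

train_dataloader:
  collate_fn:
    mixup_prob: 0.5
    mixup_epochs: [4, 29] # mixup is only applied between epochs 4 and 29
    stop_epoch: 50        # strong/multi-scale augmentation stops here; the
                          # remaining 8 epochs run clean for the EMA search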

SebastianJanampa avatar Aug 26 '25 16:08 SebastianJanampa

@xianbeisukisu I am seeing the above error at the 50th epoch. I guess that's when the training run tries to load the best_stg1.pth checkpoint. I was able to resume training from the 50th epoch by restarting training with --resume.

@SebastianJanampa Table 1 of the paper states that the performance is achieved at 50 epochs. After training for 58 epochs, I was able to get to 56.16 (using 8 A100 GPUs). I wonder whether the numbers reported in the paper used a different training setup. I probably need to adjust the LR for my batch size.
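If I do adjust it, the usual linear-scaling rule of thumb is what I'd try first. This is a generic heuristic, not something the DEIM configs apply automatically, and all the numbers below are placeholders:

# Linear LR scaling heuristic (generic rule of thumb, not DEIM-specific).
# Replace base_lr / base_batch with the values from the reference config.
base_lr = 2.5e-4      # lr the reference config was tuned for (placeholder)
base_batch = 32       # total batch size behind that lr (placeholder)
my_batch = 8 * 8      # e.g. 8 GPUs x per-GPU batch of 8 (placeholder)

scaled_lr = base_lr * my_batch / base_batch
print(f"scaled lr: {scaled_lr:.2e}")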

csampat-a avatar Aug 28 '25 16:08 csampat-a

@csampat-a ,

Did you use the configurations provided by this repo? Do not worry about the slight difference (0.3 points); that's normal. If you trained again, you would get a score that differs from 56.16 by a small margin. If your goal is to submit a paper, you should use the officially reported results from the DEIM paper.

SebastianJanampa avatar Aug 28 '25 17:08 SebastianJanampa

Thank you very much for your interest in and attention to our work. You should strictly follow the official configuration — for example, DEIM-D-FINE-X should be set to 50 + 8 epochs. The extra 8 epochs are introduced in D-FINE for performing a better decay search. Meanwhile, it’s normal to observe small fluctuations within ±0.1 AP.

ShihuaHuang95 avatar Nov 01 '25 00:11 ShihuaHuang95