How long does it take to evaluate on MS-MARCO?
Evaluating on MS-MARCO seems to take significantly longer than on NQ or HotpotQA. In fact, it just hangs here:
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:27<00:27, 27.73s/it]
Have the authors (or anyone else running the repo) encountered a similar issue? Can it be resolved by simply waiting it out?
Hi, thanks for pointing this out. How is everything going now? Has it been resolved? Based on the information you provided, I think you are stuck at loading the model checkpoint, which is not related to the MS-MARCO dataset.
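To confirm where the run is actually stuck, one option is Python's standard `faulthandler` module, which can print the current stack trace at a fixed interval while the process appears frozen. This is only a minimal sketch and not part of this repo: `watch_for_hang`, `stop_watching`, and the 10-minute default are hypothetical choices for illustration.

```python
import faulthandler
import sys


def watch_for_hang(interval_s=600, file=None):
    """Dump the stack of every thread to `file` each `interval_s` seconds.

    Hypothetical helper: call it once at the top of the evaluation script.
    `file` must be a real file object with a file descriptor (stderr by default).
    """
    if file is None:
        file = sys.stderr
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=file)


def stop_watching():
    """Cancel the periodic dump, e.g. once the model has finished loading."""
    faulthandler.cancel_dump_traceback_later()


# Example usage (assumed placement, before the checkpoint is loaded):
#   watch_for_hang(600)
#   ... load the model and run the evaluation ...
#   stop_watching()
```

If the repeated traces keep pointing at the checkpoint-loading call, the hang is in model loading; if they point into dataset or index code, the dataset is the more likely cause.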
No, unfortunately the problem persists. I made several attempts, and every time it ran for more than 10 hours on an A6000 and terminated while still loading (as shown above).
That looks strange. Do NQ and HotpotQA run fine? Do you have this issue only with MS-MARCO?
Yes, NQ and HotpotQA both finished running. Only MS-MARCO didn't work.
Hi, I reran the MS-MARCO experiments and they complete without problems. My machine has a single A100 with 80 GB of VRAM and 1 TB of RAM. The issue may come from the higher memory usage of MS-MARCO.
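To compare machines, a quick check of total RAM and of the peak memory the process has used so far may help. The snippet below is a rough sketch using only the Python standard library, assuming a Linux or macOS host; the function names are hypothetical and it is not part of this repo.

```python
import os
import resource
import sys


def total_ram_gib():
    """Total physical RAM in GiB (POSIX systems exposing these sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30


def peak_rss_gib():
    """Peak resident memory of the current process in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return peak * scale / 2**30


if __name__ == "__main__":
    print(f"total RAM: {total_ram_gib():.1f} GiB")
    print(f"peak RSS so far: {peak_rss_gib():.2f} GiB")
```

Calling `peak_rss_gib()` right before and after the MS-MARCO loading step would show whether memory climbs toward the machine's limit.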
Yes, that might be the case. Thank you for looking into this!
I do not have access to an A100, so I cannot test it myself. Maybe someone else working with MS-MARCO could verify?
Yeah, I think we can leave this issue open to see if someone can help.