
wall-clock time / GPU-hours to reproduce (GRPO on Qwen2.5-3B/7B)

Open · mrT333 opened this issue 5 months ago · 1 comment

Hi, thanks for releasing the code and paper!

Could you share a ballpark training duration so we can better budget the costs that would be necessary to reproduce those results?

From the paper I see you used 4 × A100 (80 GB), and Graph-R1 was trained with GRPO for 3 epochs, batch size 128, max length 4096 (per Appendix G). Also, step 4 in this repo's README says:

“Run GRPO/REINFORCE++/PPO training with Qwen2.5-3B-Instruct (Need 4 × 48 GB GPUs)”

Roughly how long should we expect training to take on 4 × 48 GB GPUs (e.g., A40s on RunPod) to reach your reported results on, say, HotpotQA? Is this closer to a few hours, tens of GPU-hours, or hundreds of GPU-hours?

If possible, any of the following would help a ton:

  • Wall-clock time per epoch (and total) on your 4 × A100 80 GB setup for Qwen2.5-3B and 7B
  • Throughput you observed (tokens/sec or samples/sec), plus grad-accum and precision settings
  • Approximate number of training steps actually run (did you early-stop before 3 epochs, or go beyond?)
  • Any dataset-specific runtime notes for HotpotQA or any other dataset (e.g., typical effective sequence lengths)

Totally fine to answer at order-of-magnitude level; we're just trying to size the experiment budget. Thanks!

mrT333 · Aug 18 '25 08:08

Thanks for your interest and support!

As stated in step 4 of the codebase, all experiments were conducted under the same training setup across all datasets. The total training time was approximately half a day (around 12 hours) on 4× A100 (80GB) GPUs.
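For anyone budgeting a reproduction, the reported figure converts to GPU-hours in one line: 12 wall-clock hours × 4 GPUs ≈ 48 A100 GPU-hours. A minimal back-of-envelope sketch, where the A40 slowdown factors and hourly rental rate are assumptions (placeholders, not from this thread):

```python
# Reported in this thread: ~12 wall-clock hours on 4x A100 (80 GB).
WALL_CLOCK_HOURS = 12
NUM_GPUS = 4

gpu_hours_a100 = WALL_CLOCK_HOURS * NUM_GPUS  # ~48 A100 GPU-hours

# Assumptions below are hypothetical: pick your own rate and slowdown.
assumed_a40_rate_usd = 0.40          # USD per A40 GPU-hour (placeholder)
for slowdown in (1.5, 2.0):          # rough A40-vs-A100 slowdown guesses
    est_gpu_hours = gpu_hours_a100 * slowdown
    cost = est_gpu_hours * assumed_a40_rate_usd
    print(f"slowdown {slowdown}x: ~{est_gpu_hours:.0f} GPU-hours, ~${cost:.0f}")
```

So under these assumptions the run lands in the tens of GPU-hours, not hundreds.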

LHRLAB · Aug 18 '25 08:08