Question About Dataset Construction Cost
Hi, thank you for sharing your impressive work on Graph-R1! I was impressed by the reported efficiency for constructing the graph. Could you share an estimate of the total cost involved in building the full dataset used in your experiments? If someone wanted to replicate this work or apply it in another domain, what kind of budget range would you recommend? Thanks again for your great work—looking forward to your insights!
Hi, thanks for your interest! Our corpus per dataset was sampled to about 1M tokens, so the cost is roughly unit price (shown as 2.81 $ per 1 M) × 1M. For replication, usually a few USD is enough, and for larger domains you can scale linearly by token size.
Thanks for the timely response! That helps a lot.