TorchMetrics for better reproducibility!
Dear @RJT1990,
Awesome project!
I looked through the internals and saw that metrics are implemented manually, without any tests.
This worries me quite a bit in terms of reproducibility and accurate reporting.
I think you should consider using https://github.com/PytorchLightning/metrics to compute the metrics when benchmarking runs.
It provides extremely well-tested metrics that work out of the box both in distributed settings and in plain PyTorch.
Best, T.C
I smell @tchaton is volunteering to make it for you guys :rabbit:
Heya,
As discussed yesterday, we are not maintaining sotabench (and associated tools) at this stage, and our focus is elsewhere: particularly on lighter forms of capturing results for the main Papers with Code website.
On testing: this was an experimental product. As such, the emphasis was on gathering user signal rather than committing fully to a particular implementation, i.e. "manual implementation" was sufficient for our objectives at the time :).
Thanks!
Ross