lightning-thunder

Automated benchmarking and reporting

Open riccardofelluga opened this issue 1 year ago • 5 comments

🚀 Feature

I would like to have automated benchmarks for selected models to allow for performance tracking.

Work items

Automated benchmarking in this context means two things:

  • [ ] #224
  • [ ] #226

The benchmarking suite should enable developers to write benchmark scripts and report metrics. More specifically, automation here means that the benchmarking suite can be run from CI and produces a summary of the benchmark results. As output, I would like an easy-to-read summary highlighting differences in metrics such as iteration time or memory usage.
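To make the requested summary concrete, here is a minimal sketch of how two benchmark runs could be compared. It is only an illustration, not an existing part of the suite: the names `MetricDiff`, `compare_runs`, `format_summary`, the metric keys, the model name, and the 5% threshold are all assumptions, and it treats every metric as "lower is better".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDiff:
    """One metric of one benchmark, measured in a baseline and a current run."""

    benchmark: str
    metric: str
    baseline: float
    current: float

    @property
    def relative_change(self) -> float:
        # Positive means the current run is higher (worse, for time and memory).
        if self.baseline == 0:
            return 0.0 if self.current == 0 else float("inf")
        return (self.current - self.baseline) / self.baseline


def compare_runs(baseline: dict, current: dict) -> list:
    """Pair up the metrics present in both runs, in a stable sorted order."""
    diffs = []
    for bench in sorted(baseline.keys() & current.keys()):
        shared_metrics = baseline[bench].keys() & current[bench].keys()
        for metric in sorted(shared_metrics):
            diffs.append(
                MetricDiff(bench, metric, baseline[bench][metric], current[bench][metric])
            )
    return diffs


def format_summary(diffs: list, threshold: float = 0.05) -> str:
    """Render one line per metric and flag increases above the threshold."""
    lines = []
    for d in diffs:
        flag = "REGRESSION" if d.relative_change > threshold else "ok"
        lines.append(
            f"{d.benchmark:<20} {d.metric:<20} "
            f"{d.baseline:>10.2f} -> {d.current:>10.2f} "
            f"({d.relative_change:+.1%}) {flag}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    baseline_run = {"hypothetical_model": {"iteration_time_ms": 100.0, "peak_memory_mb": 2000.0}}
    current_run = {"hypothetical_model": {"iteration_time_ms": 110.0, "peak_memory_mb": 1900.0}}
    print(format_summary(compare_runs(baseline_run, current_run)))
```

A CI job could store the baseline dictionary as an artifact from the main branch and post the formatted summary as a comment on the pull request, but how results are stored and reported is still open in this issue.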

cc @crcrpar

Apr 15 '24 14:04 riccardofelluga

Automation is always interesting. Could you please expand on what you mean by "automated" specifically? What are the manual steps that you'd like to see automated?

Apr 15 '24 17:04 IvanYashchuk

Sure! I've updated the description with more info

Apr 16 '24 08:04 riccardofelluga

How is this different from running

pytest thunder/benchmarks/targets.py
python thunder/benchmarks/distributed.py

, which we already run nightly, with the benchmark data being collected?

cc @crcrpar @tfogal

May 02 '24 19:05 xwang233

@riccardofelluga, can you answer the question above from Xiao? Providing detailed information on what's on your mind and what you would like to achieve would be very helpful here.

May 07 '24 11:05 IvanYashchuk

@IvanYashchuk At the moment we are still in the pre-design-review phase, so the ideas around this issue are still being consolidated. I would have preferred to reply once there was a more concrete idea or proposal. For the time being, I am sorry for the late reply, @xwang233. Currently, some benchmarks are not being reported yet, so one of the objectives of this issue is to add those. Another objective is to explore what is and is not being benchmarked, and then take action based on that. I will add more comments with more information once the OKR is sorted out.

May 07 '24 12:05 riccardofelluga