finn
Throughput and Latency reports from RTL sim
Hello, everyone!
It may not be a real issue, but I would like to call attention to it anyway. Currently, the performance report is generated with a single call to throughput_test_rtlsim. For batch sizes greater than one, however, latency_cycles will be wrong, as far as I understand, because latency_cycles is assigned from cycles, which covers the whole batch rather than a single input.
In step_measure_rtlsim_performance inside src/builder/build_dataflow_steps.py, we have:
rtlsim_bs = int(cfg.rtlsim_batch_size) # which defaults to 1
rtlsim_perf_dict = throughput_test_rtlsim(rtlsim_model, rtlsim_bs)
rtlsim_latency = rtlsim_perf_dict["cycles"]
rtlsim_perf_dict["latency_cycles"] = rtlsim_latency
I believe it should instead be something like:
rtlsim_bs = len(rtlsim_model.graph.node)
rtl_single_run = throughput_test_rtlsim(rtlsim_model, 1)
# or even just save the cycle in which the first output is produced,
# which would not require calling throughput_test_rtlsim twice
rtlsim_perf_dict = throughput_test_rtlsim(rtlsim_model, rtlsim_bs)
rtlsim_perf_dict["latency_cycles"] = rtl_single_run["cycles"]
So, in the end, "latency_cycles" gives the number of cycles for a single input (i.e., the latency), and "cycles" gives the number of cycles needed to process a batch large enough to assess the throughput with the pipeline filled.
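To illustrate the difference, here is a minimal sketch that does not use FINN at all. The function fake_throughput_test is a made-up stand-in for throughput_test_rtlsim. It assumes an idealized pipeline in which the first output appears after a fixed latency and each further output follows after a fixed interval. Both numbers, and the measure helper, are hypothetical and only meant to show the two-call pattern.

```python
from typing import Callable, Dict


def fake_throughput_test(batch_size: int, latency: int = 100, interval: int = 10) -> Dict[str, int]:
    """Hypothetical stand-in for throughput_test_rtlsim (idealized pipeline model)."""
    # First output after `latency` cycles, then one more output every `interval` cycles.
    return {"cycles": latency + (batch_size - 1) * interval}


def measure(run_sim: Callable[[int], Dict[str, int]], batch_size: int) -> Dict[str, int]:
    """Report batch cycles and single-input latency separately."""
    single_run = run_sim(1)            # one input only: its cycle count is the latency
    perf = dict(run_sim(batch_size))   # full batch: used to assess throughput
    perf["latency_cycles"] = single_run["cycles"]
    return perf


perf = measure(fake_throughput_test, batch_size=8)
# Under this model the batch takes 100 + 7 * 10 = 170 cycles,
# while the latency of a single input is 100 cycles.
print(perf)
```

With the current assignment, "latency_cycles" would be reported as 170 in this toy model instead of 100, and the two values only agree when the batch size is 1.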
Thanks,