Remove use of get_row() in hot loops
get_row() constructs an AllValues from a single row across a table of polynomials viewed as columns. When used in a hot loop, e.g. over the entire length of the circuit, this amounts to making a full copy of all polynomials (one row at a time which probably is even worse?). Its currently used in the log derivative library (used by ECCVM and AVM) and in PG. At last check the PG usage accounts for 1s x86. ECCVM usage appears negligible but could be relevant for AVM. (Edit: AVM has fixed this by introducing a version of get_row that returns const refs instead of copying data).
Note: method compute_grand_product() in grand_product_library.hpp uses a manual construction instead of get_row but I'm fairly sure that it's equivalent and should potentially be addressed.
As we were discussing in the slack thread, get_row seems to amount for 80%+ of the work done for logderivatives in the AVM. Accessing the rows directly seems to massively improve proving a public token transfer from 60s to 15s. Luke and I will chat about how to get this fixed.
Update: The AVM team solved this by introducing a version of get_row that returns const refs instead of copying data. Apparently this was as good as the more involved solution of only extracting the exact data that was needed at each step. This a great solution except that it relies on some uncoupled duplication in the Flavors that would be very brittle. (Not such a problem for AVM since their Flavors are auto-generated). A half measure solution for the log-derivative inverse use case is to still use get_row but only when the relation in question is active. This results in only calling get_row as many times as the relation is active vs at every row. get_row is still used in a hot loop in PG: compute_full_honk_evaluations(). This case could use a const ref version of get_row if one existed. This would likely result in a significant speedup.
Just looking at this - would anybody have concerns about changing the method operation_exists_at_row to take as an input argument a reference to the full set of polynomials and a size_t row_index member? That way the relation code directly can access required polynomial members without a copy being invoked. Seems like a small change?
A few comments on that
- First of all, this is not a problem anymore for the AVM since: https://github.com/AztecProtocol/aztec-packages/pull/11605 . But of course feel free to avoid or improve
get_row(as long as it's still compatible with current code). - About the idea of changing
operation_exists_at_rowto get the polys and a row_index: observe that the row resulting fromget_row()here is later used inoperation_exists_at_rowbut also when computing the read and write terms https://github.com/AztecProtocol/aztec-packages/blob/master/barretenberg/cpp/src/barretenberg/honk/proof_system/logderivative_library.hpp#L52 . Moreover, those functions are used both by the logderiv library directly, and by the accumulate function (which expects a full row as well, see same file). While you can provide a row_index for the calls coming from the logderiv library, it might be more difficult to do that foraccumulate(), unless you change its callers, and... that is a big change that might not be worth it.
@fcarreiro I was hoping that changing operation_exists_at_row would help, even if downstream fns aren't changed, because get_row would only need to be called for rows where a lookup occurs. I was thinking this would be relevant particularly for the cccvm.
However I just ran a quick test, and changing get_row to take in a row index did not appear to appreciably improve Prover times when running benchmark_client_ivc.sh, so yeah not worth.
Ah yes that makes sense! True that it doesn't seem to change a lot if you have just a handful of columns.* As you get more columns it does seem to make a huge difference, for the AVM actually it made a 500x difference as it started growing.
*I guess tweaking just operation_exists_at_row would not gain much if you expect it to return true for most rows, is that the case for the eccvm? How many columns do you expect for this flavor? Just to consider if a bigger change would speed up things, I have some in-between not so disruptive options if needed.
it actually returns false for most rows - I noticed the function calling get_row was not multithreaded though - multithreading sped things up a lot - honestly I was surprised not calling get_row for non-lookup rows didn't have much of an effect but I literally couldn't measure a difference in my tests (~10ms difference)