Difference between compute_accumulated and compute_preloaded?
Hello,
From the documentation, they seem to perform exactly the same operation in the WS dataflow. Is there a difference here that I am not seeing? Thank you!
The spatial array in Gemmini is double-buffered: you can load new weights into one set of registers while continuing to perform matmuls with the old weights in the other set of registers.
In WS mode, compute_accumulated and compute_preloaded determine whether your new matmul will use the new weights that you just loaded in, or whether it will keep using the old weights. Tbh, these types of decisions probably shouldn't have been exposed to the programmer; it would have been better if the hardware had handled them under the hood instead.
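A toy Python model may make the double-buffering easier to picture. This is only a sketch under stated assumptions: `ToyWSArray`, `matmul`, and the method names are hypothetical and are not Gemmini's API. It assumes that compute_preloaded switches to the most recently preloaded weights while compute_accumulated keeps using the weights already in the array, and it ignores scratchpad addresses, the bias input, and output accumulation.

```python
def matmul(a, b):
    """Plain nested-list matrix multiply, used only for illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


class ToyWSArray:
    """Hypothetical model of a weight-stationary array with two weight register sets."""

    def __init__(self):
        self.active = None  # weights that current matmuls use
        self.shadow = None  # weights loaded by preload, not yet in use

    def preload(self, weights):
        # Load new weights into the inactive register set.
        # Matmuls on the active set are unaffected.
        self.shadow = weights

    def compute_preloaded(self, a):
        # Assumption: switch to the newly preloaded weights, then multiply.
        if self.shadow is None:
            raise RuntimeError("no preloaded weights to switch to")
        self.active, self.shadow = self.shadow, None
        return matmul(a, self.active)

    def compute_accumulated(self, a):
        # Assumption: keep using the weights already in the active set.
        if self.active is None:
            raise RuntimeError("no weights in the array yet")
        return matmul(a, self.active)


arr = ToyWSArray()
a = [[1, 2], [3, 4]]

arr.preload([[1, 0], [0, 1]])       # identity weights
r1 = arr.compute_preloaded(a)       # uses identity: [[1, 2], [3, 4]]

arr.preload([[2, 0], [0, 2]])       # new weights go to the inactive set
r2 = arr.compute_accumulated(a)     # still identity: [[1, 2], [3, 4]]
r3 = arr.compute_preloaded(a)       # now the new weights: [[2, 4], [6, 8]]
```

In this model the second preload has no effect on results until a compute_preloaded is issued, which is the distinction between the two instructions described above.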
So you compute on Register A until a compute_accumulated instruction is encountered, after which you compute on Register B?
And the preload instruction will load into the currently inactive register?