kronos icon indicating copy to clipboard operation
kronos copied to clipboard

Bug report: stale register value in case of double write

Open flaviens opened this issue 2 years ago • 2 comments

Hi there!

I found a bug in Kronos that in some cases ignores the second consecutive write to a given register in case of a WAW dependency. A problematic example is the following:

  • The first instruction writes to, say, ra, for example add ra, t0, zero.
  • The second instruction writes to ra again and has no dependency with the previous instruction, for example add ra, t1, zero, where we assume t0 != t1.
  • The third instruction depends on ra, say add t2, ra, zero. Then, given the behavior of the current HCU, will not stall because it sees a write-back to ra. However, this is the write-back from the first instruction and forwards stale data. The final state of this scenario is t2=t0 instead of the expected t2=t1.

Kind regards, Flavien

flaviens avatar Feb 23 '23 16:02 flaviens

@flaviens, thanks for digging into this. I'm off the grid until March 2nd. I'll need to parse and issue the study your fix, once I'm back. If all seems well, I'll pull your fix.

SonalPinto avatar Feb 24 '23 14:02 SonalPinto

Sounds good! Some more info to help you: the bug is essentially caused by the misuse of the written-back register ID to identify whether the write-back belongs to the instruction currently in the EX stage, if my interpretation of this equality test is correct: https://github.com/SonalPinto/kronos/blob/13678d47ccae4f889690ec1e0ba648299c7027d8/rtl/core/kronos_hcu.sv#L89 This assumption is incorrect if the instruction preceding the current instruction in the EX stage writes to the same destination register. In that case, regwr_pending is unset after the first EX cycle, but in fact, there is still a register write pending for that instruction in the EX stage. The effect of unsetting regwr_pending is that the control logic misses the information that a register hazard happens. The solution that I suggested in the PR is to rely on the fact that a write-back happening in the first cycle in which an instruction resides in the EX stage cannot belong to that instruction in the EX stage (because it requires at least one cycle to generate the EX result). I also suggest in the PR moving the logic for computing regwr_pending into the EX stage, as it feels more understandable. I remain at your disposal!

flaviens avatar Feb 24 '23 20:02 flaviens