In prog: harris_sch2_fourbuf compute unit: hcompute_lgxx_stencil_1 has a mismatch between C++ and coreir
@jeffsetter another compute unit inconsistency in a different harris variant:
ERROR in compute unit: hcompute_lgxx_stencil_1
in0_lgxx_stencil[0] -> 236
in0_lgxx_stencil[0] -> 0000000011101100
in1_padded16_global_wrapper_stencil[0] -> 41
in1_padded16_global_wrapper_stencil[0] -> 0000000000101001
in1_padded16_global_wrapper_stencil[1] -> 205
in1_padded16_global_wrapper_stencil[1] -> 0000000011001101
in1_padded16_global_wrapper_stencil[2] -> 186
in1_padded16_global_wrapper_stencil[2] -> 0000000010111010
in1_padded16_global_wrapper_stencil[3] -> 171
in1_padded16_global_wrapper_stencil[3] -> 0000000010101011
in1_padded16_global_wrapper_stencil[4] -> 242
in1_padded16_global_wrapper_stencil[4] -> 0000000011110010
in1_padded16_global_wrapper_stencil[5] -> 251
in1_padded16_global_wrapper_stencil[5] -> 0000000011111011
coreir_result: 744
cpp_result : 232
Could you please reproduce this on your end and fix it?
@jeffsetter the testbench is here: https://github.com/dillonhuff/clockwork/tree/master/compute_unit_regressions/harris_sch2_fourbuf/hcompute_lgxx_stencil_1
I think I figured out the issue with the compute unit, which has to do with signed numbers and shifting. For this application, I never intended numbers to overflow during the multiplication, so I will change the clamp from 255 to 180.
Regarding your testbench, the cpp result is accounting for the overflow, while your coreir result is not. I believe the verilog is incorrect, resulting in the bitshift occurring on an unsigned rather than a signed number. To fix this, I changed the output assignment to:
assign out_lgxx_stencil = 16'($signed(in0_lgxx_stencil[0]) + (($signed(smax_289_290_291_out * smax_289_290_291_out)) >>> 16'h0007));
Note the $signed cast on the input.
Another solution could be using wire signed when applicable.
@jeffsetter that makes sense. LMK when you have updated the compute units and I'll retry the regression.
To clarify, the reason why it is failing right now is that the verilog generation is not correct. I would fix that first before we change the clamp values. I suspect this is also why sch1 is not working.
@jeffsetter I cannot fix the verilog generation since that is done by coreir.
@rdaly525 can you take a look at this example?
@dillonhuff Just ran the testbench locally and I cannot reproduce. The randomized verilator test seems to pass
Also the generated coreir and verilog visually looks correct to me.