clockwork In prog: harris_sch2_fourbuf compute unit: hcompute_lgxx_stencil

@jeffsetter another compute unit inconsistency in a different harris variant:

ERROR in compute unit: hcompute_lgxx_stencil_1
in0_lgxx_stencil[0] -> 236
in0_lgxx_stencil[0] -> 0000000011101100
in1_padded16_global_wrapper_stencil[0] -> 41
in1_padded16_global_wrapper_stencil[0] -> 0000000000101001
in1_padded16_global_wrapper_stencil[1] -> 205
in1_padded16_global_wrapper_stencil[1] -> 0000000011001101
in1_padded16_global_wrapper_stencil[2] -> 186
in1_padded16_global_wrapper_stencil[2] -> 0000000010111010
in1_padded16_global_wrapper_stencil[3] -> 171
in1_padded16_global_wrapper_stencil[3] -> 0000000010101011
in1_padded16_global_wrapper_stencil[4] -> 242
in1_padded16_global_wrapper_stencil[4] -> 0000000011110010
in1_padded16_global_wrapper_stencil[5] -> 251
in1_padded16_global_wrapper_stencil[5] -> 0000000011111011
	coreir_result: 744
	cpp_result   : 232

Could you please reproduce this on your end and fix it?

Nov 11 '20 22:11 dillonhuff

@jeffsetter the testbench is here: https://github.com/dillonhuff/clockwork/tree/master/compute_unit_regressions/harris_sch2_fourbuf/hcompute_lgxx_stencil_1

Nov 11 '20 22:11 dillonhuff

I think I figured out the issue with the compute unit, which has to do with signed numbers and shifting. For this application, I never intended numbers to overflow during the multiplication, so I will change the clamp from 255 to 180.

Regarding your testbench, the cpp result is accounting for the overflow, while your coreir result is not. I believe the verilog is incorrect, resulting in the bitshift occurring on an unsigned rather than a signed number. To fix this, I changed the output assignment to: assign out_lgxx_stencil = 16'($signed(in0_lgxx_stencil[0]) + (($signed(smax_289_290_291_out * smax_289_290_291_out)) >>> 16'h0007));

Note the $signed cast on the input.

Nov 12 '20 19:11 jeffsetter

Another solution could be using wire signed when applicable.

Nov 12 '20 19:11 jeffsetter

@jeffsetter that makes sense. LMK when you have updated the compute units and I'll retry the regression.

Nov 12 '20 19:11 dillonhuff

To clarify, the reason why it is failing right now is that the verilog generation is not correct. I would fix that first before we change the clamp values. I suspect this is also why sch1 is not working.

Nov 12 '20 21:11 jeffsetter

@jeffsetter I cannot fix the verilog generation since that is done by coreir.

@rdaly525 can you take a look at this example?

Nov 12 '20 22:11 dillonhuff

@dillonhuff Just ran the testbench locally and I cannot reproduce. The randomized verilator test seems to pass

Nov 17 '20 23:11 rdaly525

Also the generated coreir and verilog visually looks correct to me.

Nov 17 '20 23:11 rdaly525

In prog: harris_sch2_fourbuf compute unit: hcompute_lgxx_stencil_1 has a mismatch between C++ and coreir