
Unbalanced buffer insertion on high fanout designs

Open Dolu1990 opened this issue 2 years ago • 3 comments

Description

Hi,

My setup: I had a design with a high-fanout part, where I had to read a few register-based memory arrays (~2530 muxes to drive from one address).

Symptoms: in this case, the buffer insertion done by the flow seems very unbalanced. The critical path goes through a chain of 13 buffers (with a typical fanout of 10), while a utopian balanced fanout tree of that depth would be able to reach 10^13 gates.
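To put numbers on that, here is a minimal sketch of the depth an ideally balanced tree would need. It assumes every buffer can drive exactly `max_fanout` loads and ignores wire length and capacitance limits; the function name is made up for illustration.

```python
def min_buffer_levels(sinks: int, max_fanout: int) -> int:
    """Smallest number of levels such that max_fanout**levels >= sinks."""
    levels = 0
    reach = 1  # number of loads reachable with the current depth
    while reach < sinks:
        reach *= max_fanout
        levels += 1
    return levels


# ~2530 muxes driven from one address, fanout of 10 per buffer
print(min_buffer_levels(2530, 10))
# What a perfectly balanced 13-level tree could reach
print(10 ** 13)
```

Under those idealized assumptions, about 4 levels would be enough for ~2530 loads, compared with the 13 buffers seen on the critical path.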

Here is an image to illustrate, where the buffer chain for the high-fanout net is shown in pink: image

For reference, here is the critical path:

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock clk (rise edge)
   0.00    0.00   clock network delay (ideal)
   0.00    0.00 ^ EU0_ExecutionUnitBase_pipeline_execute_0_Frontend_MICRO_OP[8]_sky130_fd_sc_hd__mux2_2_A0_A1_sky130_fd_sc_hd__a221o_2_A1_X_sky130_fd_sc_hd__o21a_2_B1_X_sky130_fd_sc_hd__dfxtp_2_D/CLK (sky130_fd_sc_hd__dfxtp_4)
   0.46    0.46 v EU0_ExecutionUnitBase_pipeline_execute_0_Frontend_MICRO_OP[8]_sky130_fd_sc_hd__mux2_2_A0_A1_sky130_fd_sc_hd__a221o_2_A1_X_sky130_fd_sc_hd__o21a_2_B1_X_sky130_fd_sc_hd__dfxtp_2_D/Q (sky130_fd_sc_hd__dfxtp_4)
   0.28    0.75 v EU0_ExecutionUnitBase_pipeline_execute_0_SrcStageables_SRC2[1]_sky130_fd_sc_hd__or2_2_B/X (sky130_fd_sc_hd__or2_1)
   0.24    0.98 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[1]_sky130_fd_sc_hd__and4_2_A/X (sky130_fd_sc_hd__and4_1)
   0.21    1.19 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[1]_sky130_fd_sc_hd__a31oi_2_B1_Y_sky130_fd_sc_hd__o21bai_2_A1/Y (sky130_fd_sc_hd__o21bai_4)
   0.26    1.45 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[2]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.23    1.69 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[3]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.25    1.94 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[4]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.29    2.23 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[5]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_2)
   0.25    2.48 v EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[6]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21o_2_B1/X (sky130_fd_sc_hd__a21o_1)
   0.29    2.77 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[7]_sky130_fd_sc_hd__and4_2_A_X_sky130_fd_sc_hd__a21oi_2_B1/Y (sky130_fd_sc_hd__a21oi_4)
   0.22    2.99 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[8]_sky130_fd_sc_hd__nand2_2_A_Y_sky130_fd_sc_hd__o211a_2_C1/X (sky130_fd_sc_hd__o211a_1)
   0.24    3.23 ^ EU0_ExecutionUnitBase_pipeline_execute_0_MUL_SRC1[10]_sky130_fd_sc_hd__nand2_2_A_Y_sky130_fd_sc_hd__o31a_2_B1/X (sky130_fd_sc_hd__o31a_2)
   0.20    3.43 ^ Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1_A2_sky130_fd_sc_hd__and3_2_X_B_sky130_fd_sc_hd__a211o_2_X/X (sky130_fd_sc_hd__a211o_1)
   0.15    3.59 v Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1_A3_sky130_fd_sc_hd__a21oi_2_Y/Y (sky130_fd_sc_hd__a21oi_2)
   0.29    3.88 v wire486/X (sky130_fd_sc_hd__buf_4)
   0.43    4.31 v Lsu2Plugin_logic_sq_mem_addressPre[0][13]_sky130_fd_sc_hd__mux4_2_A0_X_sky130_fd_sc_hd__mux2_2_A0_X_sky130_fd_sc_hd__mux2_2_A1_X_sky130_fd_sc_hd__nand2_2_B_Y_sky130_fd_sc_hd__o31a_2_B1/X (sky130_fd_sc_hd__o31a_2)
   0.20    4.51 v wire479/X (sky130_fd_sc_hd__buf_12)
   0.44    4.95 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[8][4]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__inv_2_Y/Y (sky130_fd_sc_hd__inv_2)
######### chain start here
   0.30    5.24 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[8][4]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_4)
   0.32    5.57 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[0][5]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A/X (sky130_fd_sc_hd__buf_4)
   0.31    5.87 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[4][7]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__clkbuf_8)
   0.32    6.19 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[0][9]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_6)
   0.36    6.55 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_2[4][9]_sky130_fd_sc_hd__mux4_2_A0_S1_sky130_fd_sc_hd__buf_1_A_1/X (sky130_fd_sc_hd__buf_8)
   0.37    6.92 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][9]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__clkbuf_8)
   0.31    7.23 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][9]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_2_A/X (sky130_fd_sc_hd__clkbuf_8)
   0.35    7.58 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][16]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__buf_6)
   0.33    7.91 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][16]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_1_A/X (sky130_fd_sc_hd__buf_4)
   0.32    8.23 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[4][17]_sky130_fd_sc_hd__mux2_2_A0_S_sky130_fd_sc_hd__buf_2_A/X (sky130_fd_sc_hd__buf_12)
   0.45    8.68 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_3[29][18]_sky130_fd_sc_hd__or2_2_A_B_sky130_fd_sc_hd__buf_1_X/X (sky130_fd_sc_hd__clkbuf_16)
   0.39    9.07 ^ Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_3[29][18]_sky130_fd_sc_hd__or2_2_A_B_sky130_fd_sc_hd__buf_1_A_4/X (sky130_fd_sc_hd__buf_12)
   0.44    9.51 ^ Lsu2Plugin_logic_sharedPip_stages_0_ADDRESS_PRE_TRANSLATION[13]_sky130_fd_sc_hd__buf_2_X/X (sky130_fd_sc_hd__buf_12)
######### chain end here
   0.66   10.17 v Lsu2Plugin_setup_translationStorage_logic_sl_0_ways_0[0][19]_sky130_fd_sc_hd__mux4_2_A0/X (sky130_fd_sc_hd__mux4_2)
   0.42   10.59 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__a31o_2_X_A2_sky130_fd_sc_hd__a211o_2_X/X (sky130_fd_sc_hd__a211o_1)
   0.21   10.80 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__a31o_2_X/X (sky130_fd_sc_hd__a31o_1)
   0.00   10.80 v Lsu2Plugin_logic_sharedPip_stages_0_MMU_L0_ENTRIES_0_physicalAddress[1]_sky130_fd_sc_hd__dfxtp_2_D/D (sky130_fd_sc_hd__dfxtp_1)
          10.80   data arrival time
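
In the report above, the chain between the two markers runs from 4.95 ns to 9.51 ns, so roughly 4.56 ns of the 10.80 ns arrival time is spent in those 13 buffers. A rough sketch for finding such chains automatically in a report of this shape follows. The regular expressions are assumptions based on the lines shown here (sky130 buffer cells named `..._buf_N` or `..._clkbuf_N`), and the sample uses shortened placeholder instance names with most of the chain omitted.

```python
import re

# Path lines of the form: delay, arrival time, edge (^ or v), pin, (cell).
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+[\^v]\s+(\S+)\s+\((\S+)\)\s*$")
# Assumption: buffer cells end in __buf_<N> or __clkbuf_<N>.
BUF_RE = re.compile(r"__(clk)?buf_\d+$")


def longest_buffer_chain(report: str):
    """Return (stage count, summed delay) of the longest run of consecutive buffers."""
    best = (0, 0.0)
    count, delay = 0, 0.0
    for line in report.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # headers, comments, clock lines
        if BUF_RE.search(m.group(4)):
            count += 1
            delay += float(m.group(1))
            if count > best[0]:
                best = (count, delay)
        else:
            count, delay = 0, 0.0  # a non-buffer cell ends the run
    return best


# Trimmed excerpt with placeholder instance names (not the real ones).
sample = """\
   0.20    4.51 v wire479/X (sky130_fd_sc_hd__buf_12)
   0.44    4.95 ^ inv_a/Y (sky130_fd_sc_hd__inv_2)
   0.30    5.24 ^ buf_a/X (sky130_fd_sc_hd__buf_4)
   0.32    5.57 ^ buf_b/X (sky130_fd_sc_hd__buf_4)
   0.31    5.87 ^ buf_c/X (sky130_fd_sc_hd__clkbuf_8)
   0.66   10.17 v mux_a/X (sky130_fd_sc_hd__mux4_2)
"""
count, delay = longest_buffer_chain(sample)
print(count, round(delay, 2))
```

Summing the per-stage delays can differ from the arrival-time difference by a rounding step, since the report prints two decimals.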

So, I don't know if you have seen similar issues or cases? For example: clock enable, reset tree, ...

Proposal

No response

Dolu1990, Feb 06 '24 08:02

I got another case, with a very long chain of buffers (16 buffers) on another part of the design, but this time the average fanout is quite different. Below, the buffer chain path is shown in pink. image

I would say there is no good reason for so many buffers, but I don't know much. Let me know if you have an idea :)

Dolu1990, Feb 07 '24 13:02

Hi, in the second example, most of the buffers are there either for long wires or for max cap. Typically, the names of fanout buffers start with fanout. Also, in the first example, all the buffer names are different. Are you sure these buffers were inserted by OpenROAD in the Design Optimizations? Also, can you send a reproducible, or the configuration and timing constraints used? Thanks

mo-hosni, Feb 22 '24 11:02

Hi,

> In the second example, most of the buffers are either for long wires or max cap. Typically, the names of fanout buffers start with fanout.

I lost the exact setup I used for the screenshot, sorry, but I still have the flow output. I took a video where I show the paths a bit. https://drive.google.com/file/d/1WWhCPqjMZksxn_hHWuh5DWztjdk9ewGj/view?usp=drive_link It seems to me that the first part of the buffer chain is achieving very little (neither traveling far nor driving much, especially compared to other paths in the design).

> Also, in the first example, all the buffer names are different. Are you sure these buffers are inserted by OpenROAD in the Design Optimizations?

In the Verilog I feed OpenLane with, there are no handwritten buffer insertions, so those buffers come from somewhere in the OpenLane flow; I don't know more.

> Also, can you send a reproducible or the configuration and timing constraints used?

Here is a design which can be used to recreate issues similar to the first case: design_nax.zip

And a video of case 1: case1.mkv.zip Also, one thing to notice in case 1: I dug a bit more to see what was connected to the buffers, and in that long chain of buffers, each layer generally drives "~8 gates + ~2 buffers". So each layer scales the path very little.

For case 2, I do not have the original Verilog file, but here is the synthesized netlist (just in case; it may be fine to simply swap it into design_nax.zip): nax.v.zip

Thanks :)

Dolu1990, Feb 25 '24 00:02