OpenROAD-flow-scripts icon indicating copy to clipboard operation
OpenROAD-flow-scripts copied to clipboard

Add coralnpu on asap7

Open maliberty opened this issue 3 months ago • 14 comments

Description

Add https://github.com/google-coral/coralnpu to ORFS on asap7

Suggested Solution

Use fakeram as needed.

Additional Context

No response

maliberty avatar Oct 20 '25 15:10 maliberty

From Peter Gadfort:

first pass, yosys gets stuck in a techmapping pass, so will require some debugging to figure out why

I'll share something later today, but it's before the PDK is even referenced, so I don't think that would matter (it's something in the RTL from the Chisel subsystem)

test case: https://drive.google.com/file/d/1Glx6R5E9fm3ldPyH_hJKmE0mLV123_we/view?usp=sharing

jeffng-or avatar Oct 21 '25 15:10 jeffng-or

How about adding a PR for coralnpu with bazel-orfs?

oharboe avatar Oct 22 '25 05:10 oharboe

In ORFS? We have no bazel-orfs test there.

maliberty avatar Oct 22 '25 06:10 maliberty

@gadfort , which steps did you follow to generate the SV with the CoralNPUChiselSubsystem top module?

I was looking at the Integration Guide and it gives instructions on how to generate the SV for top modules with the names CoreMiniAxi or RvvCoreMiniAxi.

jeffng-or avatar Oct 22 '25 15:10 jeffng-or

The issue with yosys stems from the Chisel generated code, so once I isolated that thats the only file included. It was generated by building just the chisel part isolation, not sure if something went wrong when I ran it, but yosys still chokes on it.

I was following some of the FPGA integration since it has a top level with a reasonable number of ports (ie. integrated external memory).

gadfort avatar Oct 23 '25 12:10 gadfort

FYI, Chisel allows a lot of control over how code is generated via firtool, such as memories, verilog style, disabling verification layers, disable randomization of registers, etc. This used to be more important before slang as yosys without slang is much more finicky about what Verilog constructs that it supports.

oharboe avatar Oct 23 '25 12:10 oharboe

Note that slang is not yet on by default

maliberty avatar Oct 23 '25 16:10 maliberty

Note that slang is not yet on by default

Should it ever be?

It is easily enabled per project...

oharboe avatar Oct 23 '25 16:10 oharboe

I don't see why not as the sv support in yosys is inferior.

maliberty avatar Oct 23 '25 18:10 maliberty

I don't see why not as the sv support in yosys is inferior.

Noted. Timeframe to make slang default? "asymptotic goal"?

oharboe avatar Oct 23 '25 18:10 oharboe

More just a matter of getting around to testing it and making sure nothing breaks.

maliberty avatar Oct 23 '25 21:10 maliberty

The issue with yosys stems from the Chisel generated code, so once I isolated that thats the only file included. It was generated by building just the chisel part isolation, not sure if something went wrong when I ran it, but yosys still chokes on it.

I was following some of the FPGA integration since it has a top level with a reasonable number of ports (ie. integrated external memory).

Thanks for the clarification. I wasn't looking at the FPGA code before. I'm able to export essentially the same SV as you included in your tarball (only changes appeared to be the ifdef's and some newlines).

So, now the question is what version of the design should be added to ORFS:

  1. CoralNPUChiselSubsystem (FPGA)
  2. CoreMiniAxi (scalar-only Coral NPU configuration is provided that can integrate with an AXI-based system)
  3. RvvCoreMiniAxi (RISC-V vector (RV32IMF_Zve32x) version of Coral NPU)

jeffng-or avatar Oct 24 '25 16:10 jeffng-or

@maliberty Perhaps it is a bridge too far to try to fake RAM for this design in asap7 initially?

There's interesting stuff to be learned from this design, other than how to set up SRAMs?

Given that the SRAMs will have a mismatch in behavioral model (number of ports, read clock, etc.), wrapper/interfaced logic would have to be written to combine one or more fake RAMs to a single RAM. This is what barstools does in Chipyard and SRAM inference support in FPGA.

I find that setting number of rows in SRAMs to 1 row and then looking at what else there is to learn from a big design first is a good first step.

Realism in SRAM is dessert...

oharboe avatar Oct 26 '25 07:10 oharboe

W.r.t. interesting issues to investigate with coralnpu, I think memory instantiation and placement is the least novel/interestin, there are already designs that investigate that.

It would be neat with a feature that could put a pin in memory instantiation and placement to look for other interesting aspects with coralnpu:

https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/3621

oharboe avatar Oct 27 '25 09:10 oharboe

@jeffng-or See https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/3621#issuecomment-3508852354, slang is stopping memories from being inferred, which means that the ORFS flow doesn't error out early when it encounters large memories that haven't been dealt with.

oharboe avatar Nov 09 '25 21:11 oharboe

@jeffng-or FYI https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/issues/3645

oharboe avatar Nov 11 '25 09:11 oharboe

We are already using fakerams so there is no inference needed.

maliberty avatar Nov 11 '25 16:11 maliberty

We are already using fakerams so there is no inference needed.

But yosys is still stuck?

My thinking is that the memories are a red herring w.r.t. yosys "running forever".

oharboe avatar Nov 11 '25 17:11 oharboe

They save some time but not the bulk.

maliberty avatar Nov 11 '25 17:11 maliberty

They save some time but not the bulk.

When you have something you can share and it gets past synthesis in a sensible time, I am curious to understand what "glue verilog" you had to write to make the behavioral model compatible with CoralNPU and what the behavioral model is of the fakeram. actual timing in .lib file is a little bit less interesting to me, I care about how the timing paths terminate correctly in a register.

For instance, if CoralNPU uses a read clock, and fakeram doesnt have a read clock, how is that accounted for?

Specifically https://github.com/google-coral/coralnpu/blob/main/hdl%2Fchisel%2Fsrc%2Fbus%2FSpi2TLUL.scala

// val write_data_buffer = SyncReadMem(kBufferDepth, UInt(kBufferWidth.W))

oharboe avatar Nov 11 '25 17:11 oharboe

None they already have support for RAMs, just only setup for private PDKs. We just added support for asap7 fakerams.

maliberty avatar Nov 11 '25 17:11 maliberty

None they already have support for RAMs, just only setup for private PDKs. We just added support for asap7 fakerams.

Yes... how do you match up asap7 to SyncReadRam() above?

oharboe avatar Nov 11 '25 18:11 oharboe

@jeffng-or please comment on your ram changes

maliberty avatar Nov 11 '25 18:11 maliberty

Got through synthesis on CoreAxiMini. Something went wrong with constraints.sdc because I'm not getting a Endpoint Slack Histogram in synthesis, even if I side-stepped the top-level clock gating shenanigans.

https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/issues/3645#issuecomment-3518393304

oharboe avatar Nov 11 '25 19:11 oharboe

Some more luck with CoreAxiMini https://github.com/The-OpenROAD-Project/OpenROAD-flow-scripts/pull/3652#issuecomment-3520202567

oharboe avatar Nov 12 '25 06:11 oharboe

@jeffng-or please comment on your ram changes

I only instantiated and hooked up the ports that the FakeRAM had:

`elsif USE_ASAP7
///////////////////////////
//////// ASAP7 SRAM ////////
///////////////////////////
    wire [127:0] nwmask;
    genvar i;
    generate
      for (i = 0; i < 16; i++) begin
        assign nwmask[8*i +: 8] = {8{wmask[i]}};
      end
    endgenerate

    fakeram_512x128 u_asap7_sram (
      .rd_out(rdata),
      .addr_in(addr),
      .wd_in(wdata),
      .we_in(write),
      .ce_in(enable),
      .clk(clock)
    );
`else

Note that this is in CoreMiniAxi.sv, not in CoralNPUChiselSubsystem.sv (see still-open question above on which version of coralnpu to add to ORFS).

jeffng-or avatar Nov 12 '25 22:11 jeffng-or

There isnt any behavioral Verilog for FakeRAM, is there?

I tried several of the CoralNPU configurations and CoralNPU has many configuration options besides. OpenROAD seems capable of dealing with the various configurations.

To my mind what is missing are clear mission parameters that makes the choice unambigious.... An integration test to test what? A demo? PPA study?

Certainly, like MegaBoom, turnaround times will be long unless the flow is carefully curated.

oharboe avatar Nov 12 '25 23:11 oharboe

The FakeRAM is only a black box;

(* blackbox *)
module fakeram_512x128 (
   output reg [127:0] rd_out,
   input [8:0] addr_in,
   input we_in,
   input [127:0] wd_in,
   input clk,
   input ce_in
);
endmodule

jeffng-or avatar Nov 12 '25 23:11 jeffng-or

And there is a .lib file too, right?

That .lib file will reflect a behavioral model, e.g. whether there is a read clock.

oharboe avatar Nov 12 '25 23:11 oharboe

And there is a .lib file too, right?

That .lib file will reflect a behavioral model, e.g. whether there is a read clock.

Yes, there's a Liberty, but FakeRAM only has the one clock.

jeffng-or avatar Nov 12 '25 23:11 jeffng-or