lowrisc-chip Change in bus width for NastiIO

Would it be possible to change the bus width size of NastiIO fron 64 bits to 128bits? That would apply to nasti.r.bits.data and nasti.w.bits.data specifically. The purpose is being to perform read and write to DDR in 128bits instead of 64. Any advice?

Dec 11 '17 07:12 clare7

Dear Clare,

The interface to the DDR on LowRISC is 16-bits (if you are talking about the Nexys4-DDR implementation). The DDR interface IP from Xilinx

certainly supports a wider word but I should imagine this is more appropriate to high-end boards that have a PC style DDR SIM module, rather

than an individual mobile DDR. If you want higher throughput you could experiment with degrees of freedom in the clock converter ratio and the

DLLs that control the DDR, always bearing in mind that the FPU is a limiting factor in the clock rate to the Rocket subsystem.

I'm ashamed to say I don't know how to alter the Chisel/Scala part of the design. Maybe someone else can jump in here.

A comprehensive improvement would separate out the FPU pipeline from the L2-cache line-fill/write-back function which would appear to be

the bottleneck in this case. Undoubtedly the design could benefit from a performance improvement, but widening the DDR interface is only part

of the solution.

On 11/12/17 07:54, clare7 wrote:

Would it be possible to change the bus width size of NastiIO fron 64 bits to 128bits? That would apply to nasti.r.bits.data and nasti.w.bits.data specifically. The purpose is being to perform read and write to DDR in 128bits instead of 64. Any advice?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lowRISC/lowrisc-chip/issues/81, or mute the thread https://github.com/notifications/unsubscribe-auth/AAgF13aC3H_cSy-m21p8n--w-dTtw1INks5s_N-wgaJpZM4Q8_VX.

Dec 11 '17 08:12 jrrk

In theory, yes, you can change the data width. In reality, it is complicated.

Would you please clarify your intention?

You want to have a 128-bit bus between LLC (L2) and the DDR controller? (Does your DDR controller support 128-bit bus?)
Which version of lowRISC are you talking about here? (v0.1 or v0.2 is a No, may be for v0.3 and v0.4)

Dec 11 '17 16:12 wsong83

Yes I want to have a 128-bit bus between LL2 and to DDR controller and the DDR controller has the support for 128-bit bus.

Right now I am using v0.4

Dec 12 '17 00:12 clare7

OK, so here is what you might need to do if you really want a 128-bit LLC/DDR interconnect.

Since we are not planned to provide this configuration, you need to change the parameters directly. In src/main/scala/Configs.scala, you need to change dataBeats from 8 to 4:

      case TLKey("L2toMem") =>
        TileLinkParameters(
          coherencePolicy = new MICoherence(new NullRepresentation(site(NBanks))),
          nManagers = 1,
          nCachingClients = site(NBanks),
          nCachelessClients = 0,
          maxClientXacts = site(NAcquireTransactors) + 2,
          maxClientsPerPort = site(NAcquireTransactors) + 2,
          maxManagerXacts = 1,
          dataBits = site(CacheBlockBytes)*8,
          dataBeats = 8
        )

You might get a better chance if you disable tagged memory for this. Make sure the LLC and the tagged memory (if enabled) do generate the right circuit with this change of parameter. An easy way to know, is to compile it, run some bare-metal tests. Fix bugs if there are any.
Hopefully the generated ROCKET_MEM_DAT_WIDTH is set to 128 automatically. If not, something is wrong with the Chisel code, you need to fix that as well.
On the Verilog side, I think there are a couple of component assume the memory bus to be 64-bit by default (not really follow the aforementioned macro). You need to fix them as well (pay special attention to the AXI crossbars on the memory path).

Good luck.

Dec 12 '17 03:12 wsong83

I have made necessary changes and the compilation seems alright. But I faced the followed error during testing:

/lowrisc-chip/src/test/cxx/common/dpi_ram_behav.cpp:180: bool Memory32::write(uint32_t, const uint32_t&, const uint32_t&): Assertion (addr & 0x3) == 0' failed. ` what are the changes I need to make on the test file in order to accommodate the data width changes?

Dec 12 '17 09:12 clare7

Did this error jump out at the very beginning of the simulation? It might be a reset error (the behavioural bram model received a random request before the circuit is actually reset).

Can you specify which simulation were you running with this error? Preferably with the full command line history and log file (more information will help me to specify the error).

Dec 12 '17 15:12 wsong83

Yes the error jumpout at the beginning of the simulation. This happen only after I changed the bandwidth

This is what I run on the command line: TestMem-sim-debug +vcd +vcd_name=test.vcd +load=./mem_ddr.riscv

and then I get the printout from the printf:

memory32 write [4000a720] = 00000000
memory32 write [4000a724] = 00000000
memory32 write [4000a728] = 00000000
memory32 write [4000a72c] = 00000000
memory32 write [4000a730] = 00000000
memory32 write [4000a734] = 00000000
memory32 write [4000a738] = 00000000
memory32 write [4000a73c] = 00000000
memory32 write [4000a740] = 00000000
Glip TCP DPI listening on port 23000 and 23001
Client connected
Write Mem: [Enc]_: 00000093305290736d82b28300007297, 
TestMem-sim-debug : /home/mmwong/Documents/lowrisc-chip/src/test/cxx/common/dpi_ram_behav.cpp:180: bool Memory32::write(uint32_t, const uint32_t&, const uint32_t&): Assertion `(addr & 0x3) == 0' failed.
Aborted (core dumped)

I put in some printf in the dpi_ram_behv.cpp of which I get to see the series of memory32write as shown above.

The Write Mem: [Enc]_: 00000093305290736d82b28300007297 was the printf of the nasti.io.r.bits.data from the /uncore/src/main/converter.scala file.

Dec 12 '17 16:12 clare7

So this does not look like a reset issue any more. Clearly the error occurs after the reset is withdrawn.

If this error occurs only when the bus width is doubled to 128-bit, I am afraid there is something wrong with the design that must be found and fixed. I do not think I can tell you what is wrong as I have not done this before. You probably need to try debug yourself.

Something for your consideration:

If you changes only the bus width of the DDR bus, why it would affect the behavioural model for BRAM (0x40000000). I suppose the DDR is located to 0x80000000? Unless you have mapped the DDR to 0x40000000 and used the behavioural model for DDR space. I think this will probably need some changes in the behavioural model (I mean all the code related to the dpi_ram_behav)
Why would some printf on nasti.io.r.bits.data be reported as write? r is the response channel for a read, isn't it?
You probably want to add more printf statements in the dpi_ram_behav to see the address that caused this error. Basically the assertion is to guarantee that the memory access is aligned.

Dec 13 '17 09:12 wsong83

Oh, to simulate your design, probably you need to make the behavioural ram model working with the 128-bit setting anyway.

Dec 13 '17 10:12 wsong83

After adding moreprintfI get the following, I observed that the address involved was addr[4000000f] and that is why the offset is nonzero hence violated the assertion checking. And from the dev.h file, I get the address for memory as: #define DEV_MAP__mem__BASE 0x40000000llu

Now I suspect the error is resulted from the elfLoader in from dpi_ram_beha.cpp where the full function is written in loadelf.cpp

// load initial memory
void MemoryController::load_mem(const string& filename) {
  using namespace std::placeholders;
  std::function<void(uint32_t, uint32_t, const uint8_t*)> f =
    std::bind(&Memory32::write_block, &mem, _1, _2, _3);
  elfLoader loader = elfLoader(f);
  loader(filename);
}

and how do i make the behavioural ram model working with 128-bit setting?? any guideline to begin with?

Dec 13 '17 15:12 clare7

It just so happens that the block ram in the FPGA build is 128 bits.

So it should be very easy to interface this to your new design. It will no longer be

behavioural, but will probably not be the performance limiting factor.

You can launch this simulation in the board specific directory (fpga/board/nexys4ddr).

Just remove the width conversion and alter the block ram generator in Vivado for your new width.

You can also disable peripherals such as DDR/Minions/VGA/PS2 which are not relevant to your purpose.

You can use iSim (part of Vivado) or VCSMX if you have it from your institute or company.

Needless to say the DDR controller, if included in your simulation, needs to have its width settings modified, with the eventual aim of measuring the benefit. This is for the reason that the PHY runs at 200MHz, but there is a 4:1 speed conversion between that and the controller, and a 2:1 conversion between the controller and the L2-cache. Each of those conversions will be a bottleneck for performance on the slow side of the converter.

If you really want to boost performance, you should figure out how to make the L2-cache run at 50MHz, so that it matches the speed of the DDR controller. This would give an equivalent or better gain, but without the implications for real FPGA occupancy that a 128-bit bus implies. This perhaps should wait until experience has been gained of more straightforward tasks.

On 12/13/2017 03:38 PM, clare7 wrote:

After adding more|printf|I get the following, I observed that the address involved was |addr[4000000f]| and that is why the offset is nonzero hence violated the assertion checking. And from the |dev.h file|, I get the address for memory as: |#define DEV_MAP__mem__BASE 0x40000000llu|

Now I suspect the error is resulted from the |elfLoader| in from |dpi_ram_beha.cpp|where the full function is written in|loadelf.cpp|

|// load initial memory void MemoryController::load_mem(const string& filename) { using namespace std::placeholders; std::function<void(uint32_t, uint32_t, const uint8_t*)> f = std::bind(&Memory32::write_block, &mem, _1, _2, _3); elfLoader loader = elfLoader(f); loader(filename); } |

and how do i make the behavioural ram model working with 128-bit setting?? any guideline to begin with?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lowRISC/lowrisc-chip/issues/81#issuecomment-351428355, or mute the thread https://github.com/notifications/unsubscribe-auth/AAgF1zzPh_YWirys6lR-1KJxkB4OCJkaks5s_-99gaJpZM4Q8_VX.

Dec 13 '17 16:12 jrrk

The address 0x4000000f actually means the address is indeed unaligned, which means the assertion is indeed catching an error. So the question then is why the behavioural ram receives such a memory access?

You probably need to trace it back to see who exactly initiate such unaligned request. Generating a VCD waveform may help you finding it.

Dec 14 '17 15:12 wsong83

Nothing helpful from the VCD so far.At the moment I could only be sure error occurred right at the point when the first line of the executable file been written to the memory. Seems like the offset somehow is set to F as such the first read is jumped to 0x4000000f instead to be aligned accordingly.

By the way, could it be anything to do with the following line from Config.scala file? val memAlign = BigInt(1L << 30)

what does that line do to the memory?

Dec 18 '17 08:12 clare7