How to generate only one block RAM with a byte write mask

Open balanx opened this issue 6 months ago • 7 comments

Creating a memory with a byte write mask synthesizes more than one block RAM. That seems like a waste of FPGA resources.

The code below occupies 4 block RAMs.

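import spinal.core._

case class Test() extends Component {

  val E = in Bool()            // access enable
  val W = in Bits(4 bits)      // per-byte write mask; all zeros means read
  val A = in UInt(10 bits)     // word address
  val D = in UInt(32 bits)     // write data
  val Q = out UInt(32 bits)    // read data

  val mem = Mem(UInt(32 bits), 1024)

  val wen = (W =/= 0 & E)      // write when any mask bit is set
  val ren = (W === 0 & E)      // read when no mask bit is set

  mem.write(
    enable  = wen,
    mask    = W,
    address = A,
    data    = D
  )

  Q := mem.readSync(
    enable  = ren,
    address = A
  )

}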

The Verilog example linked below synthesizes to only 1 block RAM. But how can I do this in SpinalHDL without going the blackbox way? Verilog Example

balanx avatar Oct 23 '25 13:10 balanx

Currently, a Mem write looks like this:

mem(address) := data

Will Spinal support a partial write like the one below?

mem(address)(offset, width) := data

balanx avatar Oct 25 '25 02:10 balanx

Hi,

But how can I do this in SpinalHDL without going the blackbox way?

Hmmm, you could enable blackboxing, but then provide an inline Verilog implementation. For instance, in your SpinalConfig:

mySpinalConfig.memBlackBoxers += new PhaseNetlist {
  override def impl(pc: PhaseContext) = {
    pc.walkComponents {
      // Give each 1-write / 1-sync-read RAM blackbox an inline Verilog body
      case bb: Ram_1w_1rs => bb.setInlineVerilog(Ram_1w_1rs.efinix)
      case _ =>
    }
  }
}
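A minimal sketch of the "enable blackboxing" half of this suggestion, assuming the `Test` component from the first post and a SpinalHDL version that provides the `blackboxByteEnables` policy. The object name `TestVerilog` is hypothetical. With this policy, memories that use a write mask should come out as `Ram_*` blackboxes instead of inferred arrays, which is what gives the phase above something to match on.

```scala
import spinal.core._

// Same interface and memory as the Test component in the first post.
case class Test() extends Component {
  val E = in Bool()
  val W = in Bits(4 bits)
  val A = in UInt(10 bits)
  val D = in UInt(32 bits)
  val Q = out UInt(32 bits)

  val mem = Mem(UInt(32 bits), 1024)
  val wen = (W =/= 0 & E)
  val ren = (W === 0 & E)

  mem.write(address = A, data = D, enable = wen, mask = W)
  Q := mem.readSync(address = A, enable = ren)
}

object TestVerilog extends App {
  // Assumption: the blackboxByteEnables policy exists in your SpinalHDL version.
  val mySpinalConfig = SpinalConfig()
    .addStandardMemBlackboxing(blackboxByteEnables)

  // The memBlackBoxers phase from the snippet above would be registered here,
  // before generation, to attach the inline Verilog body.
  mySpinalConfig.generateVerilog(Test())
}
```

Without an inline body or an external module definition, the generated Verilog only instantiates the blackbox, so the RAM module still has to be supplied to the synthesizer.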

Dolu1990 avatar Oct 27 '25 12:10 Dolu1990

Thanks @Dolu1990. Ummm... I tried Chisel and it has the same result.

balanx avatar Oct 27 '25 13:10 balanx

I tried Chisel and it has the same result.

RAM inference is a real pain in the ass in general ^^ Byte masks, mixed widths, read-during-write ordering, all a pain XD

Dolu1990 avatar Oct 30 '25 09:10 Dolu1990

https://github.com/chipsalliance/chisel/issues/1289
https://github.com/llvm/circt/issues/4274
https://github.com/llvm/circt/pull/4275

Maybe... Chisel has solved this problem.

balanx avatar Oct 31 '25 11:10 balanx

Did they really solve the problem? Or did they pick one way of emitting the Verilog, which will only work with a subset of synthesizers?

Dolu1990 avatar Oct 31 '25 12:10 Dolu1990

This might be a compromise, and it is also an equivalent Verilog implementation. On Xilinx FPGAs, if byte masking is required, each BRAM18K has to be configured as 8 bits wide by 2048 deep, so a 32-bit word requires four blocks. If you use the AXI BRAM Controller IP core, you will also find that its minimum depth is 2048.

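  val mem = Mem(UInt(32 bits), 1024)

  val wen = (W =/= 0 & E)
  val ren = (W === 0 & E)
  // Expand each bit of W into a full byte of ones or zeros (W(0) maps to bits 7:0)
  val M = Vec(
    for (i <- 0 until 4) yield {
      W(i) ? U(0xFF, 8 bits) | U(0x00, 8 bits)
    }
  ).asBits.asUInt

  // Full-word write with no mask port: the memory stores D & M
  mem.write(
    enable  = wen,
    address = A,
    data    = D & M
  )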
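One thing worth checking before adopting the snippet above: a full-word write of `D & M` is not the same operation as a write with a mask port. The plain-Scala model below (no SpinalHDL involved; `ByteMaskModel` and its function names are hypothetical, made up for illustration) compares the two on a single 32-bit word. It assumes mask bit `i` selects byte `i`, counting from the least significant byte.

```scala
object ByteMaskModel {
  /** Expand a 4-bit byte enable into a 32-bit bit mask (bit i selects byte i). */
  def expandMask(w: Int): Long =
    (0 until 4)
      .map(i => if (((w >> i) & 1) == 1) 0xFFL << (8 * i) else 0L)
      .reduce(_ | _)

  /** Write through a mask port: unselected bytes keep their old value. */
  def maskedWrite(old: Long, d: Long, w: Int): Long = {
    val m = expandMask(w)
    (old & ~m & 0xFFFFFFFFL) | (d & m)
  }

  /** Full-word write of D & M: unselected bytes are overwritten with zero. */
  def andWrite(old: Long, d: Long, w: Int): Long =
    d & expandMask(w)
}
```

For an old word `0xAABBCCDD`, data `0x11223344` and mask `0x1`, `maskedWrite` gives `0xAABBCC44` while `andWrite` gives `0x00000044`. The two only agree when all four mask bits are set, so the `D & M` form fits designs that can tolerate the unselected bytes being cleared.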

buloruabutata avatar Dec 08 '25 09:12 buloruabutata