
Weight Pruning on FPGA

[Open] Smitashree-code opened this issue 8 months ago · 3 comments

I have applied weight pruning using the sparsity API and synthesized both the pruned and unpruned models with hls4ml, targeting a Xilinx FPGA. The pruned model achieved an 11% reduction in DSP48 usage, indicating fewer multiply-accumulate (MAC) operations.

However, the overall reduction in flip-flops (FFs) and look-up tables (LUTs) was minimal (<2%), and BRAM usage slightly increased.
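For reference, a minimal sketch of the flow described above, assuming the pruning was done with the TensorFlow Model Optimization (tfmot) sparsity API and the standard hls4ml Keras conversion path. The model architecture, target sparsity, and FPGA part below are illustrative placeholders, not the actual ones from this issue:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot
import hls4ml

# Illustrative model; the actual architecture in this issue is not shown
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(16,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(5, activation='softmax'),
])

# Prune to a fixed target sparsity with the tfmot sparsity API
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(0.5, begin_step=0)
}
pruned = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
pruned.compile(optimizer='adam', loss='categorical_crossentropy')
# ... train here with the tfmot.sparsity.keras.UpdatePruningStep() callback ...

# Strip pruning wrappers so hls4ml sees a plain Keras model with zeroed weights
final_model = tfmot.sparsity.keras.strip_pruning(pruned)

config = hls4ml.utils.config_from_keras_model(final_model, granularity='model')
hls_model = hls4ml.converters.convert_from_keras_model(
    final_model, hls_config=config, output_dir='hls_pruned',
    part='xcvu9p-flga2104-2-e'  # placeholder Xilinx part
)
hls_model.build(csim=False, synth=True)  # requires Vivado/Vitis HLS on PATH
```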

Smitashree-code · May 10 '25 17:05

It's hard to say why the reductions are so small without knowing the specific parameters, such as model type, size, reuse factor, etc. Could you please share some details on this? Also, can you confirm that you used the DenseUnrolled strategy when synthesising the hardware?
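For context, this is roughly how the strategy and reuse factor are selected in a Python-driven flow. The strategy string 'Unrolled' is assumed from recent hls4ml releases and should be checked against your installed version; `final_model` is a placeholder for the pruned, stripped Keras model:

```python
import tensorflow as tf
import hls4ml

# Placeholder for the pruned-and-stripped Keras model discussed above
final_model = tf.keras.Sequential([tf.keras.layers.Dense(32, input_shape=(16,))])

config = hls4ml.utils.config_from_keras_model(final_model, granularity='model')
config['Model']['Strategy'] = 'Unrolled'  # strategy name assumed; check your hls4ml version
config['Model']['ReuseFactor'] = 1        # fully unrolled: maximum DSP savings from zero weights

hls_model = hls4ml.converters.convert_from_keras_model(
    final_model, hls_config=config, output_dir='hls_unrolled'
)
```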

In general, the higher the reuse factor, the lower the savings in DSPs. The reason is explained in this paper: https://arxiv.org/abs/2308.05170. BRAM savings are generally only visible after the final place-and-route, and may be smaller than expected: the unrolled dense implementation can sometimes have trouble closing timing, so Vivado may replicate certain instances to improve routing and timing closure.
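To make the reuse-factor effect concrete, here is a back-of-the-envelope estimate (a simplification, assuming unstructured sparsity with zeros placed uniformly at random): with reuse factor R, one multiplier serves R weights across R clock cycles, so it can only be eliminated when all R of its weights are zero.

```python
def expected_dsp_savings(sparsity, reuse_factor):
    """Fraction of multipliers that become removable, assuming zeros are
    spread uniformly at random. A multiplier shared by `reuse_factor`
    weights can only be dropped when all of those weights are zero."""
    return sparsity ** reuse_factor

for rf in (1, 2, 4, 8):
    saved = expected_dsp_savings(0.5, rf)
    print(f"reuse factor {rf}: ~{saved:.1%} of multipliers removable at 50% sparsity")
# reuse factor 1: ~50.0%, 2: ~25.0%, 4: ~6.2%, 8: ~0.4%
```

Structured pruning that zeroes whole groups of weights mapped to the same multiplier would fare much better than this random-sparsity estimate, which is one of the points of the paper linked above.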

bo3z · May 11 '25 12:05

Thank you for your response. I have emailed you my model and YAML file. Please take a look.

Smitashree-code · May 13 '25 04:05

I have tried many configurations, but resource utilization is almost the same for the pruned and unpruned models with the sparsity API. After training, my pruned model is almost half the size of the unpruned model.

Smitashree-code · May 14 '25 05:05