
hls4ml Optimization API [Part 2]

Open bo3z opened this issue 2 years ago • 5 comments

Description

  • Second part of hls4ml Optimization API #768
  • Introduces Dense Unrolled layers, which optimise away multiplications by zero in the Resource strategy when the reuse factor (RF) is greater than 1
  • Introduces additional TCL scripts to optimise BRAM blocks that contain only zeros.
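
The idea behind skipping multiplications by zero can be illustrated in plain Python. This is a conceptual sketch only, not the HLS code from this PR: all function names are hypothetical, the weights are made up, and reuse-factor scheduling is left out entirely. It relies on one assumption, namely that the weights are fixed after training, so the positions of the non-zero entries can be recorded once ahead of time.

```python
def dense_matvec(weights, x):
    """Reference version: multiplies every weight, including zeros."""
    n_mults = 0
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            n_mults += 1
        out.append(acc)
    return out, n_mults


def build_nonzero_schedule(weights):
    """Record (output index, input index, weight) for each non-zero weight.

    Hypothetical helper: because the weights are constants, this list can be
    built once, before any input is seen.
    """
    return [
        (i, j, w)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
        if w != 0
    ]


def sparse_matvec(schedule, x, n_out):
    """Multiply-accumulate only over the recorded non-zero weights."""
    out = [0.0] * n_out
    for i, j, w in schedule:
        out[i] += w * x[j]
    return out, len(schedule)


# Made-up example: a 3x4 weight matrix with 4 non-zero entries out of 12.
weights = [
    [0.5, 0.0, 0.0, -1.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 1.5, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

dense_out, dense_mults = dense_matvec(weights, x)
schedule = build_nonzero_schedule(weights)
sparse_out, sparse_mults = sparse_matvec(schedule, x, len(weights))

print(dense_out, dense_mults)
print(sparse_out, sparse_mults)
```

Both versions produce the same output vector, but the second performs one multiplication per non-zero weight rather than one per matrix entry.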

Type of change

  • [x] New feature (non-breaking change which adds functionality)
  • [x] A new research paper code implementation
  • [x] Fix issue #798

Tests

  • Added a new test, test_dense_unrolled, which verifies that Dense Resource layers implemented to avoid multiplications by zero produce correct results
  • A comparison with the "standard" Dense Resource will be available shortly in the (updated) PR #768.

Checklist

  • [x] I have read the guidelines for contributing.
  • [x] I have commented my code, particularly in hard-to-understand areas.
  • [x] I have made corresponding changes to the documentation.
  • [x] My changes generate no new warnings.
  • [x] I have installed and run pre-commit on the files I edited or added.
  • [x] I have added tests that prove my fix is effective or that my feature works.

bo3z avatar Jun 13 '23 20:06 bo3z

I will add the pre-commit changes separately. The last time I ran it, some tests broke, so I will add it in a subsequent commit.

bo3z avatar Jun 13 '23 20:06 bo3z

This is ready for review. It seems that pre-commit can rearrange the order of includes in C++ header files, which can cause compilation errors.

bo3z avatar Jun 16 '23 10:06 bo3z

We merged part 1. Should we merge part 2?

jmitrevs avatar Feb 07 '24 18:02 jmitrevs

I'm reviewing it. Slowly :smiley: . But it's next in line, then HGQ.

vloncar avatar Feb 07 '24 18:02 vloncar

The pytest error is unrelated to the PR so from my side this can be merged. I'll let Vladimir give the last OK.

jmitrevs avatar May 03 '24 23:05 jmitrevs

This PR was refactored to introduce the new unrolled implementation as a "strategy", an alternative to the existing latency and resource strategies. This allowed the matrix-vector multiplication kernel to be used as a function, simplifying integration with the rest of the code. The PR also changes the top-level pipeline style pragma: the config now includes a new "auto" option (the default), which lets the optimizer choose the best style. All pipeline style decisions are now made in the new optimizer instead of being scattered around the HLSConfig class and the backend.
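
To illustrate what centralising the pipeline style decision could look like, here is a minimal standalone sketch. It does not use or reproduce the hls4ml API: the function name, its parameters, the concrete style names "pipeline" and "dataflow", and above all the selection rule are assumptions made for illustration. Only the "auto" default comes from the description above.

```python
# Hypothetical set of accepted values; only "auto" is taken from the PR text.
VALID_STYLES = ("auto", "pipeline", "dataflow")


def resolve_pipeline_style(requested="auto", io_type="io_parallel", strategy="latency"):
    """Return a concrete pipeline style from a single decision point.

    An explicit request is honoured as-is. With "auto", a placeholder rule
    picks the style; this rule is an assumption, not the logic in the PR.
    """
    requested = requested.lower()
    if requested not in VALID_STYLES:
        raise ValueError(f"Unknown pipeline style: {requested!r}")

    # The user made an explicit choice, so nothing needs to be decided.
    if requested != "auto":
        return requested

    # Placeholder rule (assumption): streaming I/O or any non-latency
    # strategy gets "dataflow", everything else gets "pipeline".
    if io_type == "io_stream" or strategy.lower() != "latency":
        return "dataflow"
    return "pipeline"


print(resolve_pipeline_style())
print(resolve_pipeline_style("dataflow"))
print(resolve_pipeline_style("auto", io_type="io_stream"))
```

Keeping the decision in one function means callers never need their own special cases: they pass the requested style and the relevant model properties, and receive a concrete style back.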

One more minor change may come. Since we will have multiple new strategies and optimization options, it was suggested to give this optimization technique a name and move it to a submodule of that name. Discussion on this is welcome.

vloncar avatar Aug 25 '24 23:08 vloncar