hls4ml Optimization API [Part 2]
Description
- Second part of hls4ml Optimization API #768
- Introduces Dense Unrolled layers, which avoid multiplications by zero in the Resource strategy with a reuse factor (RF) > 1
- Introduces additional TCL scripts to optimise zero BRAM blocks.
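
To make the idea behind the unrolled layers concrete, here is a minimal, conceptual sketch in plain Python. It is not the HLS code generated by this PR; the function names `dense_skip_zeros` and `multipliers_per_cycle` are hypothetical and exist only for illustration. It assumes the zero pattern of the weights is known ahead of time, so multiplications by zero can simply be left out, and that the remaining multiplications are spread evenly over RF cycles.

```python
def dense_skip_zeros(x, weights, bias):
    """Matrix-vector product that only multiplies by non-zero weights.

    weights[i][j] connects input i to output j.
    Returns (outputs, number of multiplications actually performed).
    """
    n_in, n_out = len(weights), len(weights[0])
    out = list(bias)
    mults = 0
    for i in range(n_in):
        for j in range(n_out):
            w = weights[i][j]
            if w == 0:
                continue  # zero weight: no multiplication is performed
            out[j] += x[i] * w
            mults += 1
    return out, mults


def multipliers_per_cycle(n_mults, reuse_factor):
    """Multipliers needed if n_mults products are spread over reuse_factor cycles."""
    return -(-n_mults // reuse_factor)  # ceiling division


# Illustrative 3x2 layer in which four of the six weights are zero.
weights = [[1, 0], [0, 2], [0, 0]]
outputs, used = dense_skip_zeros([1, 2, 3], weights, bias=[0, 1])
dense_mults = len(weights) * len(weights[0])
```

With RF = 2, the fully dense layer would need `multipliers_per_cycle(dense_mults, 2)` multipliers, while the pruned one needs only `multipliers_per_cycle(used, 2)`.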
Type of change
- [x] New feature (non-breaking change which adds functionality)
- [x] A new research paper code implementation
- [x] Fix issue #798
Tests
- Added a new test, `test_dense_unrolled`, that verifies that dense resource layers implemented to avoid zero multiplications are correct.
- Comparison with "standard" Dense Resource will shortly be available in the (updated) PR #768.
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run `pre-commit` on the files I edited or added.
- [x] I have added tests that prove my fix is effective or that my feature works.
I will add pre-commit separately. Last time I ran it, some tests were broken, so I will add it in a subsequent commit.
This is ready for review. It seems that pre-commit can rearrange the order of includes in C++ header files, which could cause compilation errors.
We merged part 1. Should we merge part 2?
I'm reviewing it. Slowly :smiley: . But it's next in line, then HGQ.
The pytest error is unrelated to the PR so from my side this can be merged. I'll let Vladimir give the last OK.
This PR was refactored to introduce the new unrolled implementation as a "strategy", as an alternative to the existing latency and resource strategies. This allowed the matrix-vector multiplication kernel to be used as a function, simplifying the integration with the rest of the code. The PR also changes the top-level pipeline style pragma, so the config now includes a new "auto" option (the default), which allows the optimizer to choose the best one. All pipeline style decisions are now made in the new optimizer, instead of being scattered around the HLSConfig class and the backend.
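
As a rough illustration of how the options described above might appear in a model configuration, here is a sketch written as a plain Python dict. The `Strategy` and `ReuseFactor` keys follow the existing hls4ml config layout; the `"Unrolled"` value and the `PipelineStyle` key are assumed spellings, and `explicit_pipeline_style` is a hypothetical helper, not part of hls4ml. Please check the merged code for the exact names.

```python
# Hypothetical hls4ml-style model configuration, written as a plain dict.
# The "Unrolled" value and the "PipelineStyle" key are assumed spellings.
config = {
    "Model": {
        "Precision": "ap_fixed<16,6>",
        "ReuseFactor": 4,
        "Strategy": "Unrolled",   # assumed name of the new strategy
        "PipelineStyle": "auto",  # default: the optimizer picks the style
    }
}


def explicit_pipeline_style(cfg):
    """Return the user-chosen pipeline style, or None when left to the optimizer."""
    style = cfg["Model"].get("PipelineStyle", "auto")
    return None if style == "auto" else style
```

Here `explicit_pipeline_style(config)` returns `None`, meaning the choice is deferred to the optimizer; setting `PipelineStyle` to a concrete value such as `"dataflow"` would instead pin the style.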
One more minor change may come. Since we will have multiple new strategies and optimization options, it was suggested to give this optimization technique a name and move it to a submodule of that name. Discussion on this is welcome.