Description

Second part of the hls4ml Optimization API (#768).

Introduces Dense Unrolled layers, which skip multiplications by zero weights in the Resource strategy when the reuse factor (RF) is greater than 1.
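The following Python sketch roughly illustrates why skipping zero weights matters when RF > 1. It models a dense layer whose n_in * n_out multiplications are spread over `reuse_factor` simulated cycles. Several parts are simplifying assumptions made for illustration and do not come from the actual hls4ml implementation: the function name `dense_zero_skipping`, the strided weight-to-cycle assignment, and the rule that the busiest cycle determines the multiplier count.

```python
import numpy as np


def dense_zero_skipping(x, W, b, reuse_factor):
    """Illustrative sketch (hypothetical, not hls4ml code) of a dense layer
    evaluated over `reuse_factor` simulated cycles, skipping zero weights.

    W has shape (n_out, n_in); flat index k = j * n_in + i maps to W[j, i].
    Assumption: cycle c handles flat indices c, c + RF, c + 2*RF, ...
    Returns (y, multipliers), where y == W @ x + b.
    """
    n_out, n_in = W.shape
    n_mult = n_out * n_in
    if n_mult % reuse_factor != 0:
        raise ValueError("sketch assumes reuse_factor divides n_in * n_out")
    flat_w = W.reshape(-1)
    acc = np.asarray(b, dtype=float).copy()
    mults_per_cycle = []
    for cycle in range(reuse_factor):
        used = 0
        for k in range(cycle, n_mult, reuse_factor):
            if flat_w[k] == 0:
                continue  # zero weight: no multiplication is generated
            j, i = divmod(k, n_in)
            acc[j] += x[i] * flat_w[k]
            used += 1
        mults_per_cycle.append(used)
    # Simplified model: multipliers are shared across cycles, so the
    # busiest cycle sets how many are needed.
    return acc, max(mults_per_cycle)


W = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 3.0, 0.0, 0.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])
b = np.zeros(2)
y, n_mult = dense_zero_skipping(x, W, b, reuse_factor=2)
# y == [7.0, 6.0]; the busiest cycle uses 2 multipliers
```

In this toy example, a schedule that ignores zeros would reserve n_in * n_out / RF = 4 multipliers per cycle. Under the same simplified model, the zero-skipping version needs only 2.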

Introduces additional TCL scripts to optimise BRAM blocks that contain only zeros.

Type of change

[x] New feature (non-breaking change which adds functionality)
[x] A new research paper code implementation
[x] Fix issue #798

Tests

Added a new test, test_dense_unrolled, which verifies that Dense Resource layers implemented to avoid multiplications by zero produce correct results.

A comparison with the "standard" Dense Resource implementation will be available shortly in the updated PR #768.

Checklist

[x] I have read the guidelines for contributing.
[x] I have commented my code, particularly in hard-to-understand areas.
[x] I have made corresponding changes to the documentation.
[x] My changes generate no new warnings.
[x] I have installed and run pre-commit on the files I edited or added.
[x] I have added tests that prove my fix is effective or that my feature works.

Jun 13 '23 20:06 bo3z

I will run pre-commit separately. The last time I ran it, some tests broke, so I will add its changes in a subsequent commit.

Jun 13 '23 20:06 bo3z

This is ready for review. It seems that pre-commit can rearrange the order of includes in C++ header files, which can cause compilation errors.

Jun 16 '23 10:06 bo3z

We merged part 1. Should we merge part 2?

Feb 07 '24 18:02 jmitrevs

I'm reviewing it. Slowly :smiley:. But it's next in line, then HGQ.

Feb 07 '24 18:02 vloncar

The pytest error is unrelated to the PR so from my side this can be merged. I'll let Vladimir give the last OK.

May 03 '24 23:05 jmitrevs

This PR was refactored to introduce the new unrolled implementation as a "strategy", an alternative to the existing Latency and Resource strategies. This allows the matrix-vector multiplication kernel to be used as a function, which simplifies integration with the rest of the code. The PR also changes the top-level pipeline style pragma. The config now includes a new "auto" option (the default), which lets the optimizer choose the best pipeline style. All pipeline style decisions are now made in the new optimizer instead of being scattered across the HLSConfig class and the backend.
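The small Python sketch below illustrates the refactoring idea described above, in which each strategy is a matrix-vector kernel that the layer calls as a function. Everything in it is hypothetical and models only the numerics: the `STRATEGIES` registry, the kernel names, and the shared `(x, W, b, reuse_factor)` signature. It is not the hls4ml backend, which generates HLS C++.

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical: every strategy shares one signature, (x, W, b, reuse_factor) -> y,
# so a layer only has to select a kernel by name.
Kernel = Callable[[np.ndarray, np.ndarray, np.ndarray, int], np.ndarray]


def latency_kernel(x, W, b, reuse_factor):
    # All multiplications at once; reuse_factor is ignored.
    return W @ x + b


def resource_kernel(x, W, b, reuse_factor):
    # Multiplications split into reuse_factor strided groups ("cycles").
    n_in = W.shape[1]
    flat_w = W.reshape(-1)
    acc = np.asarray(b, dtype=float).copy()
    for cycle in range(reuse_factor):
        for k in range(cycle, flat_w.size, reuse_factor):
            j, i = divmod(k, n_in)
            acc[j] += x[i] * flat_w[k]
    return acc


def unrolled_kernel(x, W, b, reuse_factor):
    # Same schedule, but only non-zero weights contribute a multiply-accumulate,
    # mimicking code emitted per weight with zero weights left out.
    n_in = W.shape[1]
    flat_w = W.reshape(-1)
    acc = np.asarray(b, dtype=float).copy()
    for cycle in range(reuse_factor):
        for k in range(cycle, flat_w.size, reuse_factor):
            if flat_w[k] != 0:
                j, i = divmod(k, n_in)
                acc[j] += x[i] * flat_w[k]
    return acc


STRATEGIES: Dict[str, Kernel] = {
    "latency": latency_kernel,
    "resource": resource_kernel,
    "unrolled": unrolled_kernel,
}


def dense(x, W, b, strategy="latency", reuse_factor=1):
    # The layer never branches on strategy internals; it just calls the kernel.
    return STRATEGIES[strategy.lower()](x, W, b, reuse_factor)
```

Because the kernels are interchangeable, a new strategy can be added by registering one more function. The calling layer does not change.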

One more minor change may come. Since we will have multiple new strategies and optimization options, it was suggested to give this optimization technique a name and move it to a submodule of that name. Discussion on this is welcome.

Aug 25 '24 23:08 vloncar

hls4ml Optimization API [Part 2]
