Add support for parsing simple brevitas layers as part of pytorch models
This PR adds support for parsing simple brevitas layers (QuantLinear, QuantActivation, QuantConv1D, QuantConv2D) in the pytorch parser. More complex models will still have to go through QONNX, but simple cases can be handled directly within the pytorch parser. To this end, this PR adds a new quantizer that only propagates the desired precision to the hls4ml model, since brevitas already provides the quantized tensors, which we pick up directly.
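To illustrate the idea, here is a minimal sketch of such a pass-through quantizer built on hls4ml's `Quantizer` base class and `FixedPrecisionType`; the class name and constructor arguments are illustrative, not necessarily what the PR adds verbatim:

```python
# Sketch of a pass-through quantizer (name and signature are illustrative).
# Brevitas already hands us quantized tensors, so the quantizer only records
# the target precision and returns the data unchanged.
from hls4ml.model.types import FixedPrecisionType, Quantizer


class BrevitasQuantizer(Quantizer):
    def __init__(self, width, integer, signed=True):
        super().__init__(width, FixedPrecisionType(width=width, integer=integer, signed=signed))

    def __call__(self, data):
        # Tensors coming out of brevitas are already quantized; just propagate them.
        return data
```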
Type of change
For a new feature or function, please create an issue first to discuss it with us before submitting a pull request.
Note: Please delete options that are not relevant.
- [x] New feature (non-breaking change which adds functionality)
Tests
Tested locally with a simple model and added pytests.
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run `pre-commit` on the files I edited or added.
- [x] I have added tests that prove my fix is effective or that my feature works.
pre-commit.ci autofix
Do we need to add brevitas to our test environment docker image? Or can we just add brevitas as a testing dependency? I think that's why the test failed.
pre-commit.ci autofix
Turns out that in the latest version of brevitas there are significant changes to the interface for accessing information about the quantized layers and tensors. I'm not sure when I will have time to rework this PR, so I'm converting it to a draft for the moment.
Hi,
Thanks for this initial work on Brevitas conversion; I think there is an issue.
I've tested the QuantLinear layer with 8-bit weights and 8-bit input_quant, expecting an (8-bit, 8-bit) multiplication in the linear layer; however, the converted model still uses the default precision for input_quant. The following code is modified from the provided pytest unit case.
```python
import brevitas.nn as qnn
import torch
import hls4ml
from brevitas.quant import Int8WeightPerTensorFixedPoint, Int8ActPerTensorFixedPoint
from torch import nn
from torch.nn import Module
from hls4ml.converters import convert_from_pytorch_model
from hls4ml.utils.config import config_from_pytorch_model


class QuantModelLinear(Module):
    def __init__(self):
        super().__init__()
        self.conv1 = qnn.QuantLinear(
            16, 16, bias=True,
            weight_quant=Int8WeightPerTensorFixedPoint, input_quant=Int8ActPerTensorFixedPoint
        )
        self.relu1 = qnn.QuantReLU()

    def forward(self, x):
        out = self.relu1(self.conv1(x))
        return out


def test_quantlinear(backend, io_type):
    model = QuantModelLinear()
    x = torch.rand([1, 16])
    pytorch_prediction = model(x).detach().numpy()

    config = config_from_pytorch_model(model, input_shape=(None, 16))
    output_dir = str(f'hls4mlprj_brevitas_linear_{backend}_{io_type}')

    hls_model = convert_from_pytorch_model(
        model,
        hls_config=config,
        output_dir=output_dir,
        backend=backend,
        io_type=io_type,
        part='xcu50-fsvh2104-2-e',
    )
    hls_model.compile()
    # hls_prediction = np.reshape(hls_model.predict(x.detach().numpy()), pytorch_prediction.shape)
    # np.testing.assert_allclose(hls_prediction, pytorch_prediction, rtol=0.0, atol=0.05)
    return hls_model


hls_model = test_quantlinear("Vitis", "io_stream")
hls4ml.utils.plot_model(hls_model, show_shapes=True, show_precision=True, to_file=None)
```
The following is the computational graph of the converted model. We can see that the multiplication in the linear layer would be 8-bit (weight) times 16-bit (activation), which differs from the Brevitas setting.
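For reference, the parsed precisions can also be checked programmatically rather than through plot_model; this is a minimal sketch against the hls4ml model graph, and the attribute access is an assumption about the internal API:

```python
# Sketch: list each parsed layer with its output precision to spot where the
# 16-bit default is applied instead of the brevitas 8-bit input quantization.
for layer in hls_model.get_layers():
    print(layer.name, layer.get_output_variable().type.precision)
```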
Hi @JiaMingLin, thanks for testing out this feature. I have been looking at it and I'm beginning to realize that I overlooked some complications in parsing from brevitas to hls4ml. I will convert this to a draft and try to find a more comprehensive solution.
We now support the quantization of input and output tensors of QuantLinear, QuantConv1D, QuantConv2D, and QuantActivation layers. Only quantization with power-of-2 scales is supported at the moment.
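As an illustration of what this covers, a layer along the lines below should now be parsed directly; the specific quantizer classes are an assumption (any brevitas quantizer with a power-of-2 scale, e.g. the *FixedPoint variants, is the intended target):

```python
# Illustrative sketch of a layer the parser should handle directly: weights, inputs and
# outputs quantized with fixed-point (power-of-2 scale) brevitas quantizers.
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFixedPoint, Int8WeightPerTensorFixedPoint, Uint8ActPerTensorFixedPoint
from torch.nn import Module


class QuantCNN(Module):
    def __init__(self):
        super().__init__()
        self.conv = qnn.QuantConv2d(
            3, 8, kernel_size=3,
            weight_quant=Int8WeightPerTensorFixedPoint,
            input_quant=Int8ActPerTensorFixedPoint,
            output_quant=Int8ActPerTensorFixedPoint,
        )
        # Activation layers (QuantReLU etc.) are parsed as well; a fixed-point act_quant
        # keeps the scale a power of 2.
        self.relu = qnn.QuantReLU(act_quant=Uint8ActPerTensorFixedPoint)

    def forward(self, x):
        return self.relu(self.conv(x))
```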
Hey @jmitrevs, I can't reproduce the oneAPI compilation error in the test_brevitas_parsing pytest locally for some reason. I suspect it's genuine, but it's hard to tell because it compiles fine locally. I have oneAPI version 2025.0 installed. Could you see if you can reproduce it?
To summarize the current state of this PR: I think it contains what we can reasonably support at the moment, and most things work reasonably well. Everything is limited to power-of-2 scales, but I think that's fine for now. The big remaining issue is QuantLSTM. Brevitas completely reimplements RNNs and re-quantizes frequently internally. I tried to match that by setting the precision of the internally used variables at a finer granularity than we currently do, but the resulting precision is poor. I am tempted to remove it for the moment. Currently you can't even export QuantLSTM to QONNX, so this is something brevitas itself is still figuring out.
Removed QuantLSTM for now. QuantRNN is also not great, but at least usually gets results in the same ballpark. Otherwise I think this is ready for review.
pre-commit.ci autofix