Add support for parsing simple brevitas layers as part of pytorch models
This PR adds support for parsing simple brevitas layers (QuantLinear, QuantActivation, QuantConv1D, QuantConv2D) in the pytorch parser. More complex models will still have to go through QONNX, but simple cases can be handled directly within the pytorch parser. To this end, this PR adds a new quantizer that only propagates the desired precision to the hls4ml model, since brevitas already provides the quantized tensors, which we pick up directly.
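To illustrate the idea, here is a minimal sketch of such a pass-through quantizer built on hls4ml's `Quantizer` base class and `FixedPrecisionType`; the class name and constructor arguments are illustrative, not necessarily what the PR adds verbatim:

```python
# Sketch of a pass-through quantizer (name and signature are illustrative).
# Brevitas already hands us quantized tensors, so the quantizer only records
# the target precision and returns the data unchanged.
from hls4ml.model.types import FixedPrecisionType, Quantizer


class BrevitasQuantizer(Quantizer):
    def __init__(self, width, integer, signed=True):
        super().__init__(width, FixedPrecisionType(width=width, integer=integer, signed=signed))

    def __call__(self, data):
        # Tensors coming out of brevitas are already quantized; just propagate them.
        return data
```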
Type of change
For a new feature or function, please create an issue first to discuss it with us before submitting a pull request.
Note: Please delete options that are not relevant.
- [x] New feature (non-breaking change which adds functionality)
Tests
Tested locally with a simple model and added pytests.
Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have installed and run `pre-commit` on the files I edited or added.
- [x] I have added tests that prove my fix is effective or that my feature works.
pre-commit.ci autofix
Do we need to add brevitas to our test environment docker image? Or can we just add brevitas as a testing dependency? I think that's why the test failed.
pre-commit.ci autofix
Turns out that in the latest version of brevitas there are significant changes to the interface for accessing information about the quantized layers and tensors. I'm not sure when I will have time to rework this PR, so I'm converting it to a draft for the moment.
Hi,
Thanks for this initial work on Brevitas conversion; I think there is an issue.
I've tested the QuantLinear layer with 8-bit weights and 8-bit input_quant, expecting an (8-bit, 8-bit) multiplication in the linear layer; however, the converted model still uses the default precision for input_quant. The following code is modified from the provided pytest unit case.
```python
import brevitas.nn as qnn
import torch
import hls4ml
from brevitas.quant import Int8WeightPerTensorFixedPoint, Int8ActPerTensorFixedPoint
from torch import nn
from torch.nn import Module
from hls4ml.converters import convert_from_pytorch_model
from hls4ml.utils.config import config_from_pytorch_model


class QuantModelLinear(Module):
    def __init__(self):
        super().__init__()
        self.conv1 = qnn.QuantLinear(
            16, 16, bias=True,
            weight_quant=Int8WeightPerTensorFixedPoint, input_quant=Int8ActPerTensorFixedPoint
        )
        self.relu1 = qnn.QuantReLU()

    def forward(self, x):
        out = self.relu1(self.conv1(x))
        return out


def test_quantlinear(backend, io_type):
    model = QuantModelLinear()
    x = torch.rand([1, 16])
    pytorch_prediction = model(x).detach().numpy()

    config = config_from_pytorch_model(model, input_shape=(None, 16))
    output_dir = str(f'hls4mlprj_brevitas_linear_{backend}_{io_type}')

    hls_model = convert_from_pytorch_model(
        model,
        hls_config=config,
        output_dir=output_dir,
        backend=backend,
        io_type=io_type,
        part='xcu50-fsvh2104-2-e',
    )
    hls_model.compile()
    # hls_prediction = np.reshape(hls_model.predict(x.detach().numpy()), pytorch_prediction.shape)
    # np.testing.assert_allclose(hls_prediction, pytorch_prediction, rtol=0.0, atol=0.05)
    return hls_model


hls_model = test_quantlinear("Vitis", "io_stream")
hls4ml.utils.plot_model(hls_model, show_shapes=True, show_precision=True, to_file=None)
```
The following is the computational graph of the converted model. We can see that the multiplication in the linear layer would be 8-bit (weight) times 16-bit (activation), which differs from the Brevitas setting.
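For reference, the parsed precisions can also be checked programmatically rather than through plot_model; this is a minimal sketch against the hls4ml model graph, and the attribute access is an assumption about the internal API:

```python
# Sketch: list each parsed layer with its output precision to spot where the
# 16-bit default is applied instead of the brevitas 8-bit input quantization.
for layer in hls_model.get_layers():
    print(layer.name, layer.get_output_variable().type.precision)
```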
Hi @JiaMingLin, thanks for testing out this feature. I have been looking at it and I'm beginning to realize that I overlooked some complications in parsing from brevitas to hls4ml. I will convert this to a draft and try to find a more comprehensive solution.
We now support the quantization of input and output tensors of QuantLinear, QuantConv1D, QuantConv2D, and QuantActivation layers. Only quantization with power-of-2 scales is supported at the moment.
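As an illustration of what this covers, a layer along the lines below should now be parsed directly; the specific quantizer classes are an assumption (any brevitas quantizer with a power-of-2 scale, e.g. the *FixedPoint variants, is the intended target):

```python
# Illustrative sketch of a layer the parser should handle directly: weights, inputs and
# outputs quantized with fixed-point (power-of-2 scale) brevitas quantizers.
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFixedPoint, Int8WeightPerTensorFixedPoint, Uint8ActPerTensorFixedPoint
from torch.nn import Module


class QuantCNN(Module):
    def __init__(self):
        super().__init__()
        self.conv = qnn.QuantConv2d(
            3, 8, kernel_size=3,
            weight_quant=Int8WeightPerTensorFixedPoint,
            input_quant=Int8ActPerTensorFixedPoint,
            output_quant=Int8ActPerTensorFixedPoint,
        )
        # Activation layers (QuantReLU etc.) are parsed as well; a fixed-point act_quant
        # keeps the scale a power of 2.
        self.relu = qnn.QuantReLU(act_quant=Uint8ActPerTensorFixedPoint)

    def forward(self, x):
        return self.relu(self.conv(x))
```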
Hey @jmitrevs, I can't reproduce the oneAPI compilation error in the test_brevitas_parsing pytest locally for some reason. I suspect it's genuine, but it's hard to tell because it compiles fine locally. I have oneAPI version 2025.0 installed. Could you see if you can reproduce it?
To summarize the current state of this PR: I think it contains what we can reasonably support at the moment, and most things work reasonably well. Everything is limited to power-of-2 scales, but I think that's fine for now. The big remaining issue is QuantLSTM. Brevitas completely reimplements RNNs and re-quantizes frequently internally. I tried to match that by setting the precision of the internally used variables at a finer granularity than we currently do, but the resulting precision is poor. I am tempted to remove it for the moment. Currently you can't even export QuantLSTM to QONNX, so this is something brevitas itself is still figuring out.
Removed QuantLSTM for now. QuantRNN is also not great, but at least usually gets results in the same ballpark. Otherwise I think this is ready for review.
pre-commit.ci autofix