Quartus Streaming Conv, Pooling & Image layers
Description
- Adds support for image-related layers (Conv 1D & 2D, Avg & Max Pooling, Global Pooling, Zero Padding, Upsampling) in io_stream, in a similar manner to Vivado.
- Conv 1D & 2D are implemented using a line buffer, similar to Vivado. The main difference is in the implementation of padding for Conv layers: Vivado inserts a separate padding layer, whereas Quartus performs the padding inside the Conv layer. This approach stays in line with the Keras model graph and the total number of layers (a sketch of the padding arithmetic is given after this list).
- Same padding is not supported for Pooling layers.
- Wrote a custom struct to act as a shift register in hardware (Intel HLS does not offer an out-of-the-box shift register). However, any struct with a similar implementation (and meeting certain timing/loop requirements) will be synthesised as a shift register. This can be verified by viewing the synthesis report in report.html > Area Analysis of System. A sketch of such a struct is given after this list.
- Upsampling and Zero Padding layers written in a largely similar way to Vivado
- Resource usage and latency results coming soon.
- Transpose layer to be added soon.
- Fixes a bug introduced by PR #561 affecting parallel transpose layers
- It is recommended to review this PR commit by commit, as each commit adds a single, self-contained piece of functionality and the project can be compiled at every commit
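
As a rough illustration of the in-layer same padding mentioned above, the padding amounts for a 1D convolution follow the standard Keras/TensorFlow convention and can be derived directly from the layer parameters. This is only a sketch: the names (`in_width`, `filt_width`, `stride`) are illustrative and do not necessarily match the `CONFIG_T` fields used in this PR.

```cpp
// Sketch only: standard "same" padding arithmetic for a 1D convolution.
inline int same_pad_total(int in_width, int filt_width, int stride) {
    int out_width = (in_width + stride - 1) / stride;            // ceil(in_width / stride)
    int pad = (out_width - 1) * stride + filt_width - in_width;  // total padding needed
    return pad > 0 ? pad : 0;
}

// Keras places the extra element (if the total is odd) on the right:
//   pad_left  = same_pad_total(...) / 2
//   pad_right = same_pad_total(...) - pad_left
```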
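
Similarly, a minimal sketch of the kind of shift-register struct described above (the struct name, member names and pragma usage here are illustrative and may differ from the actual implementation in this PR):

```cpp
// Sketch only: a struct that Intel HLS will typically synthesise as a shift
// register, because the whole array is shifted by one position in a fully
// unrolled loop every time shift() is called.
template<typename T, int DEPTH>
struct shift_reg {
    T data[DEPTH];

    // Insert a new sample at the back and return the sample falling out of the front
    T shift(T in) {
        T out = data[0];
        #pragma unroll
        for (int i = 0; i < DEPTH - 1; i++) {
            data[i] = data[i + 1];
        }
        data[DEPTH - 1] = in;
        return out;
    }
};
```

Whether the compiler actually maps such a struct to a shift register can be checked in report.html > Area Analysis of System, as noted above.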
Type of change
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
Tests
All of the existing tests were expanded to include tests for Quartus in io_stream. No new tests were written. A summary of the tests is given below.
- `test_keras_api.py` - Ensures correct parsing of the layers in `io_stream` and correct syntax (no compilation errors) of Conv 1D & Conv 2D layers.
- `test_cnn_mnist.py`, `test_cnn_mnist_qkeras.py`, `test_conv1d.py` - Verify the numerical accuracy and compilation of Conv 1D, Conv 2D, Max & Avg Pooling layers.
- `test_upsampling.py` and `test_zeropadding.py` - Ensure numerical accuracy and successful compilation of Zero Padding and Upsampling layers.
- `test_globalpooling.py` - Ensures numerical accuracy and successful compilation of Global Pooling layers.
Synthesis results
Below are results obtained through full Quartus synthesis of Conv2D layers for a fixed input (32x32x3) when varying the number of filters and the reuse factors. Other layers were tested for correct synthesis.

Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added tests that prove my fix is effective or that my feature works.
pytest.activations is failing:
```
E AssertionError:
E Not equal to tolerance rtol=0.02, atol=0.02
E
E Mismatched elements: 8000 / 8000 (100%)
E Max absolute difference: 1.12238881
E Max relative difference: 8914.97600568
E x: array([[0.793945, 0.791992, 0.798828, ..., 0.804688, 0.791016, 0.799805],
E [0.791016, 0.802734, 0.804688, ..., 0.799805, 0.799805, 0.794922],
E [0.795898, 0.808594, 0.803711, ..., 0.793945, 0.796875, 0.801758],...
E y: array([[-0.227973, -0.279667, -0.045713, ..., 0.226889, -0.28958 ,
E 0.031885],
E [-0.292061, 0.154492, 0.214236, ..., 0.041079, -0.003215,...

test_activations.py:55: AssertionError
```
Can you see why?
This was addressed in PR #655, which has already been merged. The failure comes from the fact that the parallel Softsign was optimised in #585 by removing unnecessary values from the LUT, which required corresponding changes in the logic.
It generally looks good to me so I approved it. I sort of wanted to trigger the pytests again, but couldn't figure out how.
I can merge it later today unless someone wants to check more.
I need some more time to go through this.
@jmitrevs All the issues have been resolved. Do you want to take another pass at this, or shall we merge it?
Using a slightly older branch, I noticed that in a project I created, the `using stream` definition is in both `defines.h` and `nnet_helpers.h`. Is that still the case, and is it needed? (I was hacking the definition in one of them and got an error that the two definitions didn't match.)
I removed the duplicate definitions from `nnet_helpers.h`. All tests (Python compile, make and Quartus compile) pass.
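
For reference, a sketch of the kind of single-location alias being discussed, kept only in `defines.h`. The exact alias and template parameters generated by hls4ml may differ; this is only to illustrate why keeping a second copy in `nnet_helpers.h` risks mismatched-definition errors.

```cpp
// defines.h (sketch only; the generated defines.h contains much more than this,
// and the exact stream alias used by hls4ml may differ).
#ifndef DEFINES_H_
#define DEFINES_H_

#include "HLS/hls.h"

// Keep the stream alias in a single header so that every translation unit sees
// the same definition.
template<typename T>
using stream = ihc::stream<T>;

#endif
```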
The only issue remaining with this PR is that the padding routines occasionally fail with a cryptic error from the compiler: `Compiler Error: Multiple reflexive accesses from stream 'layer2_out' is not allowed.` This happens for ZeroPadding1D/2D and Conv1D/2D (with same padding) under certain scenarios. It still needs some investigation, potentially with help from Intel, so I wouldn't block the merge just because of that. @jmitrevs?
Just for completeness, this alternative, unoptimised 1D padding implementation does not suffer from the error:
```cpp
template<class data_T, class res_T, typename CONFIG_T>
void zeropad1d_cl(stream<data_T> &data, stream<res_T> &res) {
    res_T res_array[CONFIG_T::out_width];

    // Zero-initialise the full output buffer, including the padded regions
    ZeroOutputArray:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i][j] = 0;
        }
    }

    // Copy the input stream into the buffer, offset by the left padding
    CopyMain:
    for (int i = 0; i < CONFIG_T::in_width; i++) {
        auto dataval = data.read();
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i + CONFIG_T::pad_left][j] = dataval[j];
        }
    }

    // Write the padded buffer to the output stream
    StreamOut:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        res.write(res_array[i]);
    }
}
```
Nevertheless, it is still not clear to me why the existing implementation fails. I'll leave some time for comments, but if no one objects, we can merge this weekend.