Quartus Streaming Conv, Pooling & Image layers
Description
- Adds support for image-related layers (Conv 1D & 2D, Avg & Max Pooling, Global Pooling, Zero Padding, Upsampling) in io_stream, in a similar manner to Vivado.
- Conv 1D & 2D are implemented using a line buffer, similar to Vivado. The main difference is in the implementation of padding for Conv layers: Vivado inserts a separate padding layer, whereas Quartus performs the padding inside the Conv layer. This approach stays in line with the Keras model graph and the total number of layers (a sketch of the padding arithmetic is given after this list).
- Same padding is not supported for Pooling layers.
- Wrote a custom struct to act as a shift register in hardware (Intel HLS does not offer an out-of-the-box shift register). However, any struct with a similar implementation (and meeting certain timing/loop requirements) will be synthesised as a shift register. This can be verified by viewing the synthesis report in report.html > Area Analysis of System. A sketch of such a struct is given after this list.
- Upsampling and Zero Padding layers written in a largely similar way to Vivado
- Resource usage and latency results coming soon.
- Transpose layer to be added soon.
- Fixes a bug introduced by PR #561 affecting parallel transpose layers
- It is recommended to review this PR commit by commit, as each commit adds a single, self-contained piece of functionality and the project can be compiled at every commit
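
As a rough illustration of the in-layer same padding mentioned above, the padding amounts for a 1D convolution follow the standard Keras/TensorFlow convention and can be derived directly from the layer parameters. This is only a sketch: the names (`in_width`, `filt_width`, `stride`) are illustrative and do not necessarily match the `CONFIG_T` fields used in this PR.

```cpp
// Sketch only: standard "same" padding arithmetic for a 1D convolution.
inline int same_pad_total(int in_width, int filt_width, int stride) {
    int out_width = (in_width + stride - 1) / stride;            // ceil(in_width / stride)
    int pad = (out_width - 1) * stride + filt_width - in_width;  // total padding needed
    return pad > 0 ? pad : 0;
}

// Keras places the extra element (if the total is odd) on the right:
//   pad_left  = same_pad_total(...) / 2
//   pad_right = same_pad_total(...) - pad_left
```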
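
Similarly, a minimal sketch of the kind of shift-register struct described above (the struct name, member names and pragma usage here are illustrative and may differ from the actual implementation in this PR):

```cpp
// Sketch only: a struct that Intel HLS will typically synthesise as a shift
// register, because the whole array is shifted by one position in a fully
// unrolled loop every time shift() is called.
template<typename T, int DEPTH>
struct shift_reg {
    T data[DEPTH];

    // Insert a new sample at the back and return the sample falling out of the front
    T shift(T in) {
        T out = data[0];
        #pragma unroll
        for (int i = 0; i < DEPTH - 1; i++) {
            data[i] = data[i + 1];
        }
        data[DEPTH - 1] = in;
        return out;
    }
};
```

Whether the compiler actually maps such a struct to a shift register can be checked in report.html > Area Analysis of System, as noted above.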
Type of change
- [x] Bug fix (non-breaking change that fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
Tests
All of the existing tests were expanded to include tests for Quartus in io_stream. No new tests were written. A summary of the tests is given below.
- `test_keras_api.py` - Ensures correct parsing of the layers in `io_stream` and correct syntax (no compilation errors) of Conv 1D & Conv 2D layers.
- `test_cnn_mnist.py`, `test_cnn_mnist_qkeras.py`, `test_conv1d.py` - Verify the numerical accuracy and compilation of Conv 1D, Conv 2D, Max & Avg Pooling layers.
- `test_upsampling.py` and `test_zeropadding.py` - Ensure numerical accuracy and successful compilation of Zero Padding and Upsampling layers.
- `test_globalpooling.py` - Ensures numerical accuracy and successful compilation of Global Pooling layers.
Synthesis results
Below are results obtained through full Quartus synthesis of Conv2D layers for a fixed input (32x32x3) when varying the number of filters and the reuse factors. Other layers were tested for correct synthesis.

Checklist
- [x] I have read the guidelines for contributing.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [x] I have added tests that prove my fix is effective or that my feature works.
pytest.activations is failing:
```
E AssertionError:
E Not equal to tolerance rtol=0.02, atol=0.02
E
E Mismatched elements: 8000 / 8000 (100%)
E Max absolute difference: 1.12238881
E Max relative difference: 8914.97600568
E x: array([[0.793945, 0.791992, 0.798828, ..., 0.804688, 0.791016, 0.799805],
E [0.791016, 0.802734, 0.804688, ..., 0.799805, 0.799805, 0.794922],
E [0.795898, 0.808594, 0.803711, ..., 0.793945, 0.796875, 0.801758],...
E y: array([[-0.227973, -0.279667, -0.045713, ..., 0.226889, -0.28958 ,
E 0.031885],
E [-0.292061, 0.154492, 0.214236, ..., 0.041079, -0.003215,...

test_activations.py:55: AssertionError
```
Can you see why?
This was addressed in PR #655, which has already been merged. The failure comes from the fact that the parallel Softsign was optimised in #585 by removing unnecessary values from the LUT, which required corresponding changes in the logic.
It generally looks good to me so I approved it. I sort of wanted to trigger the pytests again, but couldn't figure out how.
I can merge it later today unless someone wants to check more.
I need some more time to go through this.
@jmitrevs All the issues have been resolved. Do you want to take another pass at this, or shall we merge it?
Using a slightly older branch, I noticed that in a project I created, the `using stream` definition is in both `defines.h` and `nnet_helpers.h`. Is that still the case, and is it needed? (I was hacking the definition in one of them and got an error that the two definitions didn't match.)
I removed the duplicate definitions from `nnet_helpers.h`. All tests (Python compile, make and Quartus compile) pass.
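
For reference, a sketch of the kind of single-location alias being discussed, kept only in `defines.h`. The exact alias and template parameters generated by hls4ml may differ; this is only to illustrate why keeping a second copy in `nnet_helpers.h` risks mismatched-definition errors.

```cpp
// defines.h (sketch only; the generated defines.h contains much more than this,
// and the exact stream alias used by hls4ml may differ).
#ifndef DEFINES_H_
#define DEFINES_H_

#include "HLS/hls.h"

// Keep the stream alias in a single header so that every translation unit sees
// the same definition.
template<typename T>
using stream = ihc::stream<T>;

#endif
```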
The only issue remaining with this PR is that the padding routines occasionally fail with a cryptic error from the compiler: `Compiler Error: Multiple reflexive accesses from stream 'layer2_out' is not allowed.` This happens for ZeroPadding1D/2D and Conv1D/2D (with same padding) under certain scenarios. It still needs some investigation, potentially with help from Intel, so I wouldn't block the merge just because of that. @jmitrevs?
Just for completeness, this alternative, unoptimised 1D padding implementation does not suffer from the error:
```cpp
template<class data_T, class res_T, typename CONFIG_T>
void zeropad1d_cl(stream<data_T> &data, stream<res_T> &res) {
    res_T res_array[CONFIG_T::out_width];

    // Zero-initialise the full output buffer, including the padded regions
    ZeroOutputArray:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i][j] = 0;
        }
    }

    // Copy the input stream into the buffer, offset by the left padding
    CopyMain:
    for (int i = 0; i < CONFIG_T::in_width; i++) {
        auto dataval = data.read();
        for (int j = 0; j < CONFIG_T::n_chan; j++) {
            res_array[i + CONFIG_T::pad_left][j] = dataval[j];
        }
    }

    // Write the padded buffer to the output stream
    StreamOut:
    for (int i = 0; i < CONFIG_T::out_width; i++) {
        res.write(res_array[i]);
    }
}
```
Nevertheless, it is still not clear to me why the existing implementation fails. I'll leave some time for comments, but if no one objects, we can merge this weekend.