hls4ml
VGG-16 implementation based on hls4ml
Hi, I used your hls4ml library to implement VGG16 on the CIFAR-10 dataset. The project passes synthesis, but the synthesis report shows many II violations. Another problem is that the weights cannot be loaded correctly. Could you help me with that? The relevant part of myproject_csim.log:
```
+ Performance & Resource Estimates:
PS: '+' for module; 'o' for loop; '*' for dataflow
+--------------------------------------------------------------------+------+-------+---------+-----------+----------+---------+------+----------+----------+----------+---------------+---------------+-----+
| Modules | Issue| | Latency | Latency | Iteration| | Trip | | | | | | |
| & Loops | Type | Slack | (cycles)| (ns) | Latency | Interval| Count| Pipelined| BRAM | DSP | FF | LUT | URAM|
+--------------------------------------------------------------------+------+-------+---------+-----------+----------+---------+------+----------+----------+----------+---------------+---------------+-----+
|+ myproject* | -| -0.29| 749234| 3.746e+06| -| 747522| -| dataflow| 95 (21%)| 76 (21%)| 161885 (114%)| 206619 (292%)| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig10_s_fu_19242 | -| -0.29| 8241| 4.120e+04| -| 8241| -| no| -| -| 74629 (52%)| 85947 (121%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig10_s_fu_10364 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 41138 (29%)| 29721 (42%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 8240| 4.120e+04| 515| -| 16| no| -| -| -| -| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig7_s_fu_24376 | -| -0.29| 8451| 4.226e+04| -| 8451| -| no| -| -| 41328 (29%)| 42561 (60%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig7_s_fu_5244 | II| -2.03| 128| 640.000| -| 128| -| yes| -| -| 20530 (14%)| 14854 (21%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8449| 4.224e+04| 260| 130| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config1_s_fu_26950 | -| -0.33| 747521| 3.738e+06| -| 747521| -| no| 63 (14%)| 64 (17%)| 14395 (10%)| 8040 (11%)| -|
| + grp_shift_line_buffer_array_ap_fixed_1u_config1_s_fu_30639 | II| -1.21| 2| 10.000| -| 2| -| yes| -| -| 404 (~0%)| 294 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 747520| 3.738e+06| 730| -| 1024| no| -| -| -| -| -|
| o Product1 | -| -3.65| 12| 60.000| 5| 1| 9| yes| -| -| -| -| -|
| o ResetAccum | -| -3.65| 64| 320.000| 1| 1| 64| yes| -| -| -| -| -|
| o Accum1_Accum2 | -| -3.65| 577| 2.885e+03| 2| 1| 576| yes| -| -| -| -| -|
| o Result | -| -3.65| 64| 320.000| 1| 1| 64| yes| -| -| -| -| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig4_s_fu_26997 | -| -0.29| 16963| 8.482e+04| -| 16963| -| no| -| -| 20724 (14%)| 21133 (29%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig4_s_fu_2684 | II| -2.03| 64| 320.000| -| 64| -| yes| -| -| 10226 (7%)| 7272 (10%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 16961| 8.480e+04| 132| 66| 256| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config13_s_fu_28291 | -| -0.52| 1053| 5.265e+03| -| 1053| -| no| -| 1 (~0%)| 788 (~0%)| 6065 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config13_s_fu_1159 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 1052| 5.260e+03| 263| -| 4| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config9_s_fu_29334 | -| -0.52| 4193| 2.096e+04| -| 4193| -| no| -| 1 (~0%)| 792 (~0%)| 6072 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config9_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 4192| 2.096e+04| 262| -| 16| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config10_s_fu_30376 | -| -0.52| 4193| 2.096e+04| -| 4193| -| no| -| 1 (~0%)| 792 (~0%)| 6072 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config10_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 4192| 2.096e+04| 262| -| 16| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config11_s_fu_31418 | -| -0.52| 1049| 5.245e+03| -| 1049| -| no| -| 1 (~0%)| 788 (~0%)| 6069 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config11_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 1048| 5.240e+03| 262| -| 4| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config12_s_fu_32460 | -| -0.52| 1049| 5.245e+03| -| 1049| -| no| -| 1 (~0%)| 788 (~0%)| 6069 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config12_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 1048| 5.240e+03| 262| -| 4| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config6_s_fu_33502 | -| -0.52| 8326| 4.163e+04| -| 8326| -| no| -| 1 (~0%)| 624 (~0%)| 3362 (4%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config6_s_fu_1163 | II| -1.21| 128| 640.000| -| 128| -| yes| -| -| 290 (~0%)| 2118 (3%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8324| 4.162e+04| 135| 130| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config7_s_fu_34546 | -| -0.52| 8326| 4.163e+04| -| 8326| -| no| -| 1 (~0%)| 624 (~0%)| 3362 (4%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config7_s_fu_1163 | II| -1.21| 128| 640.000| -| 128| -| yes| -| -| 290 (~0%)| 2118 (3%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8324| 4.162e+04| 135| 130| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config8_s_fu_35590 | -| -0.52| 2086| 1.043e+04| -| 2086| -| no| -| 1 (~0%)| 620 (~0%)| 3359 (4%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config8_s_fu_1163 | II| -1.21| 128| 640.000| -| 128| -| yes| -| -| 290 (~0%)| 2118 (3%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 2084| 1.042e+04| 135| 130| 16| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config5_s_fu_36634 | -| -0.52| 4231| 2.116e+04| -| 4231| -| no| -| 1 (~0%)| 512 (~0%)| 1835 (2%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config5_s_fu_653 | II| -1.21| 64| 320.000| -| 64| -| yes| -| -| 226 (~0%)| 936 (1%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 4229| 2.114e+04| 72| 66| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config4_s_fu_37166 | -| -0.52| 16902| 8.451e+04| -| 16902| -| no| -| 1 (~0%)| 500 (~0%)| 1814 (2%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config4_s_fu_649 | II| -1.21| 64| 320.000| -| 64| -| yes| -| -| 226 (~0%)| 936 (1%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 16900| 8.450e+04| 71| 66| 256| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config2_s_fu_37698 | -| -0.52| 34822| 1.741e+05| -| 34822| -| no| -| 1 (~0%)| 440 (~0%)| 1256 (1%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config2_s_fu_395 | II| -1.21| 32| 160.000| -| 32| -| yes| -| -| 194 (~0%)| 540 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 34820| 1.741e+05| 39| 34| 1024| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config3_s_fu_37974 | -| -0.52| 8710| 4.355e+04| -| 8710| -| no| -| 1 (~0%)| 436 (~0%)| 1253 (1%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config3_s_fu_393 | II| -1.21| 32| 160.000| -| 32| -| yes| -| -| 194 (~0%)| 540 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8708| 4.354e+04| 39| 34| 256| yes| -| -| -| -| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig2_s_fu_38250 | -| -0.52| 34820| 1.741e+05| -| 34820| -| no| -| -| 568 (~0%)| 1420 (2%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig2_s_fu_261 | II| -2.03| 32| 160.000| -| 32| -| yes| -| -| 178 (~0%)| 576 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 34818| 1.741e+05| 37| 34| 1024| yes| -| -| -| -| -|
| + call_ln0_Block_split1_proc_fu_38408 | -| -3.65| 0| 0.000| -| 0| -| no| -| -| 5 (~0%)| 29 (~0%)| -|
+--------------------------------------------------------------------+------+-------+---------+-----------+----------+---------+------+----------+----------+----------+---------------+---------------+-----+
```
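For context, II violations like the ones reported on the `shift_line_buffer` modules are commonly tackled in hls4ml by raising the reuse factor or switching to the Resource strategy. Below is a minimal sketch of a model-level configuration dictionary; the key names (`Model`, `Precision`, `ReuseFactor`, `Strategy`) follow hls4ml's convention, but the exact values here are placeholders and should be checked against the hls4ml version you are using:

```python
# Hedged sketch: a hand-built hls4ml-style model-level configuration.
# Larger ReuseFactor values share multipliers across cycles (lower resources,
# higher latency); Strategy "Resource" avoids fully unrolling the layers.
config = {
    "Model": {
        "Precision": "ap_fixed<16,6>",
        "ReuseFactor": 64,
        "Strategy": "Resource",
    }
}

def reuse_factor(cfg):
    """Return the model-level reuse factor from an hls4ml-style config dict."""
    return cfg["Model"]["ReuseFactor"]

print(reuse_factor(config))  # 64
```

In practice you would generate this dictionary with hls4ml's config helpers from your Keras model and then override the model-level (or per-layer) `ReuseFactor` and `Strategy` before building the project.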
Here is my main code:

```cpp
#include <iostream>
#include "myproject.h"
#include "parameters.h"
void myproject(
hls::stream<input_t> &input1,
hls::stream<layer2_t> &maxlayer13_out,
unsigned short &const_size_in_1,
unsigned short &const_size_out_1
) {
//hls-fpga-machine-learning insert IO
#pragma HLS INTERFACE axis port=input1,maxlayer13_out
#pragma HLS DATAFLOW
const_size_in_1 = N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1;
const_size_out_1 = N_LAYER_5;
#ifndef __SYNTHESIS__
static bool loaded_weights = false;
if (!loaded_weights) {
//hls-fpga-machine-learning insert load weights
nnet::load_weights_from_txt<model_default_t, 18>(w2, "w2.txt");
nnet::load_weights_from_txt<model_default_t, 4>(w3, "w3.txt");
nnet::load_weights_from_txt<model_default_t, 1280>(w5, "w5.txt");
nnet::load_weights_from_txt<model_default_t, 1728>(vgg16w1, "vgg16w1.txt");
nnet::load_weights_from_txt<model_default_t, 64>(b1, "b1.txt");
nnet::load_weights_from_txt<model_default_t, 36864>(vgg16w2, "vgg16w2.txt");
nnet::load_weights_from_txt<model_default_t, 64>(b2, "b2.txt");
nnet::load_weights_from_txt<model_default_t, 73728>(vgg16w3, "vgg16w3.txt");
nnet::load_weights_from_txt<model_default_t, 128>(b3, "b3.txt");
nnet::load_weights_from_txt<model_default_t, 147584>(vgg16w4, "vgg16w4.txt");
nnet::load_weights_from_txt<model_default_t, 128>(b4, "b4.txt");
nnet::load_weights_from_txt<model_default_t, 294912>(vgg16w5, "vgg16w5.txt");
nnet::load_weights_from_txt<model_default_t, 256>(b5, "b5.txt");
nnet::load_weights_from_txt<model_default_t, 589824>(vgg16w6, "vgg16w6.txt");
nnet::load_weights_from_txt<model_default_t, 256>(b6, "b6.txt");
nnet::load_weights_from_txt<model_default_t, 589824>(vgg16w7, "vgg16w7.txt");
nnet::load_weights_from_txt<model_default_t, 256>(b7, "b7.txt");
nnet::load_weights_from_txt<model_default_t, 1179648>(vgg16w8, "vgg16w8.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b8, "b8.txt");
nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w9, "vgg16w9.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b9, "b9.txt");
nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w10, "vgg16w10.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b10, "b10.txt");
nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w11, "vgg16w11.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b11, "b11.txt");
// nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w12, "vgg16w12.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b12, "b12.txt");
// nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w13, "vgg16w13.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b13, "b13.txt");
loaded_weights = true;
}
#endif
// ****************************************
// NETWORK INSTANTIATION
// ****************************************
//hls-fpga-machine-learning insert layers
//conv1
hls::stream<layer2_t> layer1_out("layer1_out");
#pragma HLS STREAM variable=layer1_out depth=64
nnet::conv_2d_cl<input_t, layer2_t, config1>(input1, layer1_out, vgg16w1, b1); // conv2d_1
//conv2
hls::stream<layer2_t> layer2_out("layer2_out");
#pragma HLS STREAM variable=layer2_out depth=64
nnet::conv_2d_cl<layer2_t, layer2_t, config2>(layer1_out, layer2_out, vgg16w2, b2); // conv2d_1
hls::stream<layer2_t> maxlayer2_out("maxlayer2_out");
#pragma HLS STREAM variable=maxlayer2_out depth=64
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig2>(layer2_out, maxlayer2_out); // max_pooling2d_1
//conv3
hls::stream<layer2_t> layer3_out("layer3_out");
#pragma HLS STREAM variable=layer3_out depth=128
nnet::conv_2d_cl<layer2_t, layer2_t, config3>(maxlayer2_out, layer3_out, vgg16w3, b3); // conv2d_1
//conv4
hls::stream<layer2_t> layer4_out("layer4_out");
#pragma HLS STREAM variable=layer4_out depth=128
nnet::conv_2d_cl<layer2_t, layer2_t, config4>(layer3_out, layer4_out, vgg16w4, b4); // conv2d_1
hls::stream<layer2_t> maxlayer4_out("maxlayer4_out");
#pragma HLS STREAM variable=maxlayer4_out depth=128
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig4>(layer4_out, maxlayer4_out); // max_pooling2d_1
//conv5
hls::stream<layer2_t> layer5_out("layer5_out");
#pragma HLS STREAM variable=layer5_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config5>(maxlayer4_out, layer5_out, vgg16w5, b5); // conv2d_1
//conv6
hls::stream<layer2_t> layer6_out("layer6_out");
#pragma HLS STREAM variable=layer6_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config6>(layer5_out, layer6_out, vgg16w6, b6); // conv2d_1
//conv7
hls::stream<layer2_t> layer7_out("layer7_out");
#pragma HLS STREAM variable=layer7_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config7>(layer6_out, layer7_out, vgg16w7, b7); // conv2d_1
hls::stream<layer2_t> maxlayer7_out("maxlayer7_out");
#pragma HLS STREAM variable=maxlayer7_out depth=256
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig7>(layer7_out, maxlayer7_out); // max_pooling2d_1
//conv8
hls::stream<layer2_t> layer8_out("layer8_out");
#pragma HLS STREAM variable=layer8_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config8>(maxlayer7_out, layer8_out, vgg16w8, b8); // conv2d_1
//conv9
hls::stream<layer2_t> layer9_out("layer9_out");
#pragma HLS STREAM variable=layer9_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config9>(layer8_out, layer9_out, vgg16w9, b9); // conv2d_1
//conv10
hls::stream<layer2_t> layer10_out("layer10_out");
#pragma HLS STREAM variable=layer10_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config10>(layer9_out, layer10_out, vgg16w10, b10); // conv2d_1
hls::stream<layer2_t> maxlayer10_out("maxlayer10_out");
#pragma HLS STREAM variable=maxlayer10_out depth=256
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig10>(layer10_out, maxlayer10_out); // max_pooling2d_1
//conv11
hls::stream<layer2_t> layer11_out("layer11_out");
#pragma HLS STREAM variable=layer11_out depth=512
nnet::conv_2d_cl<layer2_t, layer2_t, config11>(maxlayer10_out, layer11_out, vgg16w11, b11); // conv2d_1
//conv12
hls::stream<layer2_t> layer12_out("layer12_out");
#pragma HLS STREAM variable=layer12_out depth=512
nnet::conv_2d_cl<layer2_t, layer2_t, config12>(layer11_out, layer12_out, vgg16w11, b12); // conv2d_1
//conv13
// hls::stream<layer2_t> layer13_out("layer13_out");
// #pragma HLS STREAM variable=layer13_out depth=512
nnet::conv_2d_cl<layer2_t, layer2_t, config13>(layer12_out, maxlayer13_out, vgg16w11, b13); // conv2d_1
// nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig13>(layer13_out, maxlayer13_out); // max_pooling2d_1
}
```
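As a side note on the weight-loading problem, the array sizes passed to `load_weights_from_txt` can be cross-checked against the expected element counts for 3x3 VGG16 kernels (`k*k*c_in*c_out`, with the channel counts as in the code above). A quick sanity check in Python:

```python
# Cross-check the sizes used in load_weights_from_txt against the expected
# 3x3 conv kernel element counts (k*k*c_in*c_out) for VGG16 on CIFAR-10.
expected = {
    "vgg16w1": 3 * 3 * 3 * 64,      # 1728
    "vgg16w2": 3 * 3 * 64 * 64,     # 36864
    "vgg16w3": 3 * 3 * 64 * 128,    # 73728
    "vgg16w4": 3 * 3 * 128 * 128,   # 147456
    "vgg16w5": 3 * 3 * 128 * 256,   # 294912
    "vgg16w8": 3 * 3 * 256 * 512,   # 1179648
    "vgg16w9": 3 * 3 * 512 * 512,   # 2359296
}
loaded = {  # sizes as they appear in the code above
    "vgg16w1": 1728, "vgg16w2": 36864, "vgg16w3": 73728,
    "vgg16w4": 147584, "vgg16w5": 294912, "vgg16w8": 1179648,
    "vgg16w9": 2359296,
}
mismatches = {k: (loaded[k], expected[k]) for k in expected if loaded[k] != expected[k]}
print(mismatches)  # vgg16w4 is loaded with 147584 elements, but 3*3*128*128 = 147456
```

Two things stand out when debugging the incorrect weights: `vgg16w4` is loaded with 147584 elements rather than the 147456 a 3x3 128-to-128 conv would need, and conv12/conv13 are called with `vgg16w11` while their own weight loads (`vgg16w12`, `vgg16w13`) are commented out.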
Hi @lloo099, I have been trying to synthesize VGG16 lately, but I keep getting a "partitioned elements number has exceeded the threshold" error despite using a very large reuse factor and the Resource strategy. Could you please share the hls4ml config you used to synthesize VGG16? Thanks!