hls4ml
VGG-16 implementation based on hls4ml
Hi, I used your hls4ml library to implement VGG16 on the CIFAR-10 dataset. The project passes synthesis, but the synthesis report shows many II violations. Another problem is that the weights cannot be loaded correctly. Could you help me with that? The relevant part of myproject_csim.log:
```
+ Performance & Resource Estimates:
PS: '+' for module; 'o' for loop; '*' for dataflow
+--------------------------------------------------------------------+------+-------+---------+-----------+----------+---------+------+----------+----------+----------+---------------+---------------+-----+
| Modules | Issue| | Latency | Latency | Iteration| | Trip | | | | | | |
| & Loops | Type | Slack | (cycles)| (ns) | Latency | Interval| Count| Pipelined| BRAM | DSP | FF | LUT | URAM|
+--------------------------------------------------------------------+------+-------+---------+-----------+----------+---------+------+----------+----------+----------+---------------+---------------+-----+
|+ myproject* | -| -0.29| 749234| 3.746e+06| -| 747522| -| dataflow| 95 (21%)| 76 (21%)| 161885 (114%)| 206619 (292%)| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig10_s_fu_19242 | -| -0.29| 8241| 4.120e+04| -| 8241| -| no| -| -| 74629 (52%)| 85947 (121%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig10_s_fu_10364 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 41138 (29%)| 29721 (42%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 8240| 4.120e+04| 515| -| 16| no| -| -| -| -| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig7_s_fu_24376 | -| -0.29| 8451| 4.226e+04| -| 8451| -| no| -| -| 41328 (29%)| 42561 (60%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig7_s_fu_5244 | II| -2.03| 128| 640.000| -| 128| -| yes| -| -| 20530 (14%)| 14854 (21%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8449| 4.224e+04| 260| 130| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config1_s_fu_26950 | -| -0.33| 747521| 3.738e+06| -| 747521| -| no| 63 (14%)| 64 (17%)| 14395 (10%)| 8040 (11%)| -|
| + grp_shift_line_buffer_array_ap_fixed_1u_config1_s_fu_30639 | II| -1.21| 2| 10.000| -| 2| -| yes| -| -| 404 (~0%)| 294 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 747520| 3.738e+06| 730| -| 1024| no| -| -| -| -| -|
| o Product1 | -| -3.65| 12| 60.000| 5| 1| 9| yes| -| -| -| -| -|
| o ResetAccum | -| -3.65| 64| 320.000| 1| 1| 64| yes| -| -| -| -| -|
| o Accum1_Accum2 | -| -3.65| 577| 2.885e+03| 2| 1| 576| yes| -| -| -| -| -|
| o Result | -| -3.65| 64| 320.000| 1| 1| 64| yes| -| -| -| -| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig4_s_fu_26997 | -| -0.29| 16963| 8.482e+04| -| 16963| -| no| -| -| 20724 (14%)| 21133 (29%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig4_s_fu_2684 | II| -2.03| 64| 320.000| -| 64| -| yes| -| -| 10226 (7%)| 7272 (10%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 16961| 8.480e+04| 132| 66| 256| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config13_s_fu_28291 | -| -0.52| 1053| 5.265e+03| -| 1053| -| no| -| 1 (~0%)| 788 (~0%)| 6065 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config13_s_fu_1159 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 1052| 5.260e+03| 263| -| 4| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config9_s_fu_29334 | -| -0.52| 4193| 2.096e+04| -| 4193| -| no| -| 1 (~0%)| 792 (~0%)| 6072 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config9_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 4192| 2.096e+04| 262| -| 16| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config10_s_fu_30376 | -| -0.52| 4193| 2.096e+04| -| 4193| -| no| -| 1 (~0%)| 792 (~0%)| 6072 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config10_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 4192| 2.096e+04| 262| -| 16| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config11_s_fu_31418 | -| -0.52| 1049| 5.245e+03| -| 1049| -| no| -| 1 (~0%)| 788 (~0%)| 6069 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config11_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 1048| 5.240e+03| 262| -| 4| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config12_s_fu_32460 | -| -0.52| 1049| 5.245e+03| -| 1049| -| no| -| 1 (~0%)| 788 (~0%)| 6069 (8%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config12_s_fu_1155 | II| -2.03| 256| 1.280e+03| -| 256| -| yes| -| -| 338 (~0%)| 4153 (5%)| -|
| o ReadInputHeight_ReadInputWidth | -| -3.65| 1048| 5.240e+03| 262| -| 4| no| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config6_s_fu_33502 | -| -0.52| 8326| 4.163e+04| -| 8326| -| no| -| 1 (~0%)| 624 (~0%)| 3362 (4%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config6_s_fu_1163 | II| -1.21| 128| 640.000| -| 128| -| yes| -| -| 290 (~0%)| 2118 (3%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8324| 4.162e+04| 135| 130| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config7_s_fu_34546 | -| -0.52| 8326| 4.163e+04| -| 8326| -| no| -| 1 (~0%)| 624 (~0%)| 3362 (4%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config7_s_fu_1163 | II| -1.21| 128| 640.000| -| 128| -| yes| -| -| 290 (~0%)| 2118 (3%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8324| 4.162e+04| 135| 130| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config8_s_fu_35590 | -| -0.52| 2086| 1.043e+04| -| 2086| -| no| -| 1 (~0%)| 620 (~0%)| 3359 (4%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config8_s_fu_1163 | II| -1.21| 128| 640.000| -| 128| -| yes| -| -| 290 (~0%)| 2118 (3%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 2084| 1.042e+04| 135| 130| 16| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config5_s_fu_36634 | -| -0.52| 4231| 2.116e+04| -| 4231| -| no| -| 1 (~0%)| 512 (~0%)| 1835 (2%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config5_s_fu_653 | II| -1.21| 64| 320.000| -| 64| -| yes| -| -| 226 (~0%)| 936 (1%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 4229| 2.114e+04| 72| 66| 64| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config4_s_fu_37166 | -| -0.52| 16902| 8.451e+04| -| 16902| -| no| -| 1 (~0%)| 500 (~0%)| 1814 (2%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config4_s_fu_649 | II| -1.21| 64| 320.000| -| 64| -| yes| -| -| 226 (~0%)| 936 (1%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 16900| 8.450e+04| 71| 66| 256| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config2_s_fu_37698 | -| -0.52| 34822| 1.741e+05| -| 34822| -| no| -| 1 (~0%)| 440 (~0%)| 1256 (1%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config2_s_fu_395 | II| -1.21| 32| 160.000| -| 32| -| yes| -| -| 194 (~0%)| 540 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 34820| 1.741e+05| 39| 34| 1024| yes| -| -| -| -| -|
| + grp_conv_2d_cl_array_array_ap_fixed_2u_config3_s_fu_37974 | -| -0.52| 8710| 4.355e+04| -| 8710| -| no| -| 1 (~0%)| 436 (~0%)| 1253 (1%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_config3_s_fu_393 | II| -1.21| 32| 160.000| -| 32| -| yes| -| -| 194 (~0%)| 540 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 8708| 4.354e+04| 39| 34| 256| yes| -| -| -| -| -|
| + grp_pooling2d_cl_array_array_ap_fixed_2u_maxconfig2_s_fu_38250 | -| -0.52| 34820| 1.741e+05| -| 34820| -| no| -| -| 568 (~0%)| 1420 (2%)| -|
| + grp_shift_line_buffer_array_ap_fixed_2u_maxconfig2_s_fu_261 | II| -2.03| 32| 160.000| -| 32| -| yes| -| -| 178 (~0%)| 576 (~0%)| -|
| o ReadInputHeight_ReadInputWidth | II| -3.65| 34818| 1.741e+05| 37| 34| 1024| yes| -| -| -| -| -|
| + call_ln0_Block_split1_proc_fu_38408 | -| -3.65| 0| 0.000| -| 0| -| no| -| -| 5 (~0%)| 29 (~0%)| -|
+--------------------------------------------------------------------+------+-------+---------+-----------+----------+---------+------+----------+----------+----------+---------------+---------------+-----+
```
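For context, II violations like the ones reported on the `shift_line_buffer` modules are commonly tackled in hls4ml by raising the reuse factor or switching to the Resource strategy. Below is a minimal sketch of a model-level configuration dictionary; the key names (`Model`, `Precision`, `ReuseFactor`, `Strategy`) follow hls4ml's convention, but the exact values here are placeholders and should be checked against the hls4ml version you are using:

```python
# Hedged sketch: a hand-built hls4ml-style model-level configuration.
# Larger ReuseFactor values share multipliers across cycles (lower resources,
# higher latency); Strategy "Resource" avoids fully unrolling the layers.
config = {
    "Model": {
        "Precision": "ap_fixed<16,6>",
        "ReuseFactor": 64,
        "Strategy": "Resource",
    }
}

def reuse_factor(cfg):
    """Return the model-level reuse factor from an hls4ml-style config dict."""
    return cfg["Model"]["ReuseFactor"]

print(reuse_factor(config))  # 64
```

In practice you would generate this dictionary with hls4ml's config helpers from your Keras model and then override the model-level (or per-layer) `ReuseFactor` and `Strategy` before building the project.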
Here is my main code:

```cpp
#include <iostream>
#include "myproject.h"
#include "parameters.h"
void myproject(
hls::stream<input_t> &input1,
hls::stream<layer2_t> &maxlayer13_out,
unsigned short &const_size_in_1,
unsigned short &const_size_out_1
) {
//hls-fpga-machine-learning insert IO
#pragma HLS INTERFACE axis port=input1,maxlayer13_out
#pragma HLS DATAFLOW
const_size_in_1 = N_INPUT_1_1*N_INPUT_2_1*N_INPUT_3_1;
const_size_out_1 = N_LAYER_5;
#ifndef __SYNTHESIS__
static bool loaded_weights = false;
if (!loaded_weights) {
//hls-fpga-machine-learning insert load weights
nnet::load_weights_from_txt<model_default_t, 18>(w2, "w2.txt");
nnet::load_weights_from_txt<model_default_t, 4>(w3, "w3.txt");
nnet::load_weights_from_txt<model_default_t, 1280>(w5, "w5.txt");
nnet::load_weights_from_txt<model_default_t, 1728>(vgg16w1, "vgg16w1.txt");
nnet::load_weights_from_txt<model_default_t, 64>(b1, "b1.txt");
nnet::load_weights_from_txt<model_default_t, 36864>(vgg16w2, "vgg16w2.txt");
nnet::load_weights_from_txt<model_default_t, 64>(b2, "b2.txt");
nnet::load_weights_from_txt<model_default_t, 73728>(vgg16w3, "vgg16w3.txt");
nnet::load_weights_from_txt<model_default_t, 128>(b3, "b3.txt");
nnet::load_weights_from_txt<model_default_t, 147584>(vgg16w4, "vgg16w4.txt");
nnet::load_weights_from_txt<model_default_t, 128>(b4, "b4.txt");
nnet::load_weights_from_txt<model_default_t, 294912>(vgg16w5, "vgg16w5.txt");
nnet::load_weights_from_txt<model_default_t, 256>(b5, "b5.txt");
nnet::load_weights_from_txt<model_default_t, 589824>(vgg16w6, "vgg16w6.txt");
nnet::load_weights_from_txt<model_default_t, 256>(b6, "b6.txt");
nnet::load_weights_from_txt<model_default_t, 589824>(vgg16w7, "vgg16w7.txt");
nnet::load_weights_from_txt<model_default_t, 256>(b7, "b7.txt");
nnet::load_weights_from_txt<model_default_t, 1179648>(vgg16w8, "vgg16w8.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b8, "b8.txt");
nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w9, "vgg16w9.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b9, "b9.txt");
nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w10, "vgg16w10.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b10, "b10.txt");
nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w11, "vgg16w11.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b11, "b11.txt");
// nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w12, "vgg16w12.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b12, "b12.txt");
// nnet::load_weights_from_txt<model_default_t, 2359296>(vgg16w13, "vgg16w13.txt");
nnet::load_weights_from_txt<model_default_t, 512>(b13, "b13.txt");
loaded_weights = true;
}
#endif
// ****************************************
// NETWORK INSTANTIATION
// ****************************************
//hls-fpga-machine-learning insert layers
//conv1
hls::stream<layer2_t> layer1_out("layer1_out");
#pragma HLS STREAM variable=layer1_out depth=64
nnet::conv_2d_cl<input_t, layer2_t, config1>(input1, layer1_out, vgg16w1, b1); // conv2d_1
//conv2
hls::stream<layer2_t> layer2_out("layer2_out");
#pragma HLS STREAM variable=layer2_out depth=64
nnet::conv_2d_cl<layer2_t, layer2_t, config2>(layer1_out, layer2_out, vgg16w2, b2); // conv2d_1
hls::stream<layer2_t> maxlayer2_out("maxlayer2_out");
#pragma HLS STREAM variable=maxlayer2_out depth=64
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig2>(layer2_out, maxlayer2_out); // max_pooling2d_1
//conv3
hls::stream<layer2_t> layer3_out("layer3_out");
#pragma HLS STREAM variable=layer3_out depth=128
nnet::conv_2d_cl<layer2_t, layer2_t, config3>(maxlayer2_out, layer3_out, vgg16w3, b3); // conv2d_1
//conv4
hls::stream<layer2_t> layer4_out("layer4_out");
#pragma HLS STREAM variable=layer4_out depth=128
nnet::conv_2d_cl<layer2_t, layer2_t, config4>(layer3_out, layer4_out, vgg16w4, b4); // conv2d_1
hls::stream<layer2_t> maxlayer4_out("maxlayer4_out");
#pragma HLS STREAM variable=maxlayer4_out depth=128
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig4>(layer4_out, maxlayer4_out); // max_pooling2d_1
//conv5
hls::stream<layer2_t> layer5_out("layer5_out");
#pragma HLS STREAM variable=layer5_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config5>(maxlayer4_out, layer5_out, vgg16w5, b5); // conv2d_1
//conv6
hls::stream<layer2_t> layer6_out("layer6_out");
#pragma HLS STREAM variable=layer6_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config6>(layer5_out, layer6_out, vgg16w6, b6); // conv2d_1
//conv7
hls::stream<layer2_t> layer7_out("layer7_out");
#pragma HLS STREAM variable=layer7_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config7>(layer6_out, layer7_out, vgg16w7, b7); // conv2d_1
hls::stream<layer2_t> maxlayer7_out("maxlayer7_out");
#pragma HLS STREAM variable=maxlayer7_out depth=256
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig7>(layer7_out, maxlayer7_out); // max_pooling2d_1
//conv8
hls::stream<layer2_t> layer8_out("layer8_out");
#pragma HLS STREAM variable=layer8_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config8>(maxlayer7_out, layer8_out, vgg16w8, b8); // conv2d_1
//conv9
hls::stream<layer2_t> layer9_out("layer9_out");
#pragma HLS STREAM variable=layer9_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config9>(layer8_out, layer9_out, vgg16w9, b9); // conv2d_1
//conv10
hls::stream<layer2_t> layer10_out("layer10_out");
#pragma HLS STREAM variable=layer10_out depth=256
nnet::conv_2d_cl<layer2_t, layer2_t, config10>(layer9_out, layer10_out, vgg16w10, b10); // conv2d_1
hls::stream<layer2_t> maxlayer10_out("maxlayer10_out");
#pragma HLS STREAM variable=maxlayer10_out depth=256
nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig10>(layer10_out, maxlayer10_out); // max_pooling2d_1
//conv11
hls::stream<layer2_t> layer11_out("layer11_out");
#pragma HLS STREAM variable=layer11_out depth=512
nnet::conv_2d_cl<layer2_t, layer2_t, config11>(maxlayer10_out, layer11_out, vgg16w11, b11); // conv2d_1
//conv12
hls::stream<layer2_t> layer12_out("layer12_out");
#pragma HLS STREAM variable=layer12_out depth=512
nnet::conv_2d_cl<layer2_t, layer2_t, config12>(layer11_out, layer12_out, vgg16w11, b12); // conv2d_1
//conv13
// hls::stream<layer2_t> layer13_out("layer13_out");
// #pragma HLS STREAM variable=layer13_out depth=512
nnet::conv_2d_cl<layer2_t, layer2_t, config13>(layer12_out, maxlayer13_out, vgg16w11, b13); // conv2d_1
// nnet::pooling2d_cl<layer2_t, layer2_t, maxconfig13>(layer13_out, maxlayer13_out); // max_pooling2d_1
}
```
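As a side note on the weight-loading problem, the array sizes passed to `load_weights_from_txt` can be cross-checked against the expected element counts for 3x3 VGG16 kernels (`k*k*c_in*c_out`, with the channel counts as in the code above). A quick sanity check in Python:

```python
# Cross-check the sizes used in load_weights_from_txt against the expected
# 3x3 conv kernel element counts (k*k*c_in*c_out) for VGG16 on CIFAR-10.
expected = {
    "vgg16w1": 3 * 3 * 3 * 64,      # 1728
    "vgg16w2": 3 * 3 * 64 * 64,     # 36864
    "vgg16w3": 3 * 3 * 64 * 128,    # 73728
    "vgg16w4": 3 * 3 * 128 * 128,   # 147456
    "vgg16w5": 3 * 3 * 128 * 256,   # 294912
    "vgg16w8": 3 * 3 * 256 * 512,   # 1179648
    "vgg16w9": 3 * 3 * 512 * 512,   # 2359296
}
loaded = {  # sizes as they appear in the code above
    "vgg16w1": 1728, "vgg16w2": 36864, "vgg16w3": 73728,
    "vgg16w4": 147584, "vgg16w5": 294912, "vgg16w8": 1179648,
    "vgg16w9": 2359296,
}
mismatches = {k: (loaded[k], expected[k]) for k in expected if loaded[k] != expected[k]}
print(mismatches)  # vgg16w4 is loaded with 147584 elements, but 3*3*128*128 = 147456
```

Two things stand out when debugging the incorrect weights: `vgg16w4` is loaded with 147584 elements rather than the 147456 a 3x3 128-to-128 conv would need, and conv12/conv13 are called with `vgg16w11` while their own weight loads (`vgg16w12`, `vgg16w13`) are commented out.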
Hi @lloo099, I have been trying to synthesize VGG16 lately, but I keep getting a "partitioned elements number has exceeded the threshold" error despite using a very large reuse factor and the Resource strategy. Could you please share the hls4ml config you used to synthesize VGG16? Thanks!