[Enhancement] Allow the BERT encoder to specify the hidden dim for the FC layers
This was the second enhancement in https://github.com/NVIDIA/FasterTransformer/issues/98.
Add support for a different hidden dimension size in BERT's FC layers. Currently, in the BERT encoder's feed-forward part, the hidden dim of the first FC is hardcoded to 4 * head_num * size_per_head (four times the hidden size). This PR adds a field for passing in the size of that hidden dimension.
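A minimal sketch of the idea, assuming a config struct and a helper named `make_config` (both hypothetical, for illustration only, not the identifiers used in this PR): the feed-forward inner dimension becomes a parameter, with the conventional 4x expansion kept as the default so existing callers are unaffected.

```cpp
#include <cassert>

struct EncoderConfig {
  int head_num;       // number of attention heads
  int size_per_head;  // dimension of each head
  int inter_size;     // hidden dim of the first FC in the feed-forward block
};

// Hypothetical helper: fall back to the conventional 4x expansion when the
// caller does not specify an inner size.
inline EncoderConfig make_config(int head_num, int size_per_head,
                                 int inter_size = 0) {
  const int hidden = head_num * size_per_head;
  EncoderConfig cfg{head_num, size_per_head,
                    inter_size > 0 ? inter_size : 4 * hidden};
  assert(cfg.inter_size > 0);
  return cfg;
}

int main() {
  // Default keeps the old behavior: inner dim = 4 * hidden = 3072.
  EncoderConfig a = make_config(12, 64);
  // With the new field, any inner dim can be requested.
  EncoderConfig b = make_config(12, 64, 1536);
  return (a.inter_size == 3072 && b.inter_size == 1536) ? 0 : 1;
}
```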
@byshiue Hey, I added some fixes, could you please take a look again? Thanks!
Have you compiled the code to verify correctness? We found that the pull request as of the last comment could not be compiled successfully.
Really sorry about the inconvenience. I can't run the TensorFlow unit tests due to some constraints. I tried running encoder_sample.cc but hit an error at this line saying:
<jemalloc>: size mismatch detected (true size 32768 vs input size 8), likely caused by application sized deallocation bugs (source address: 0x7ffab28eb000, the current pointer being freed). Suggest building with --enable-debug or address sanitizer for debugging. Abort.
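For context, jemalloc raises this abort when a sized deallocation passes a size that does not match what was originally allocated. A minimal sketch of the bug class (hypothetical, not the actual FasterTransformer code); with a size-checking jemalloc build (e.g. --enable-debug), this aborts with the same "size mismatch detected" message:

```cpp
#include <new>

int main() {
  void* p = ::operator new(32768);  // true allocation size: 32768 bytes
  ::operator delete(p, 8);          // sized delete with the wrong size: 8
  return 0;
}
```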
I think the original code works normally.
Besides, I still cannot compile this code successfully with TensorFlow.
@842974287 I think we should try to compile and screen the TF code to make sure it works for TF, for example to avoid fixes like https://github.com/NVIDIA/FasterTransformer/commit/55c6c6955e1975b8866a3b6f74c1f847d3d9ee9a