Add support for head_dim > 1024 for fp16, no whitespace change
Thanks to @842974287's implementation. This adds support for head_dim > 1024 for fp16 in add_QKV_bias_rebuild_padding and add_bias_input_layernorm.
Regarding the comments in https://github.com/842974287/FasterTransformer/commit/dacb3ceed52d6cdb59f10adc6fa02f615da9084a:
- When word_per_block != 1, dim3 grid(m * half_k / block.x / word_per_block * 3); can leave a remainder from the integer division, so the last elements of a row may not be covered by any block, which could cause problems.
- This diff now also contains an implementation of add_bias_input_layernorm for head_dim > 1024; the general pattern is sketched below.
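For context, the usual approach when hidden_dim exceeds the 1024-thread block limit is to let one block handle one row and have each thread loop over several elements of that row. Here is a minimal sketch of that pattern (illustrative names and fp32 accumulation are my assumptions; this is not the exact kernel in this diff):

```cuda
// One block per row; each thread strides over the row because a block is
// limited to 1024 threads, so hidden_dim > 1024 needs a per-thread loop.
#include <cuda_fp16.h>

template <int BLOCK>   // BLOCK must be a power of two, e.g. 1024
__global__ void add_bias_input_layernorm_large(__half* out, const __half* input,
                                               const __half* bias, const __half* gamma,
                                               const __half* beta, int n /* hidden_dim */)
{
    __shared__ float buf[BLOCK];
    __shared__ float s_mean, s_rstd;
    const __half* row_in  = input + (size_t)blockIdx.x * n;
    __half*       row_out = out   + (size_t)blockIdx.x * n;

    // 1) out = out + input + bias, accumulating the row sum in fp32
    float local_sum = 0.f;
    for (int i = threadIdx.x; i < n; i += BLOCK) {
        float v = __half2float(row_out[i]) + __half2float(row_in[i]) + __half2float(bias[i]);
        row_out[i] = __float2half(v);
        local_sum += v;
    }
    buf[threadIdx.x] = local_sum;
    __syncthreads();
    for (int s = BLOCK / 2; s > 0; s >>= 1) {   // shared-memory tree reduction
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) s_mean = buf[0] / n;
    __syncthreads();

    // 2) variance of the row
    float local_var = 0.f;
    for (int i = threadIdx.x; i < n; i += BLOCK) {
        float d = __half2float(row_out[i]) - s_mean;
        local_var += d * d;
    }
    buf[threadIdx.x] = local_var;
    __syncthreads();
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) s_rstd = rsqrtf(buf[0] / n + 1e-6f);
    __syncthreads();

    // 3) normalize, then scale and shift
    for (int i = threadIdx.x; i < n; i += BLOCK) {
        float v = (__half2float(row_out[i]) - s_mean) * s_rstd;
        row_out[i] = __float2half(v * __half2float(gamma[i]) + __half2float(beta[i]));
    }
}

// Launch with one block per row, e.g. for m rows:
//   add_bias_input_layernorm_large<1024><<<m, 1024, 0, stream>>>(out, input, bias, gamma, beta, n);
```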
Please let me know if this PR is ready to be merged, or whether we need to modify it further. Thanks!
cc @byshiue
I cannot compile the code successfully. Even after I fix the compilation issue, I get wrong results when I run with hidden_dim > 1024. How did you verify the correctness?
Thanks for your reply!
We have some internal unit tests to check its correctness, but I haven't tested this part of the code in the open-source environment. What would you suggest for testing it in open source?
Also, we will work on fixing https://github.com/NVIDIA/FasterTransformer/pull/104 and merging it soon as well :)
Here is a simple unit test. You can add some cases with hidden_dim > 1024 to it.
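For a case with hidden_dim > 1024, one illustrative way to check correctness is to compare the fp16 GPU output against an fp32 CPU reference, with a tolerance loose enough for half precision (this is only a hypothetical helper, not the test in the repository):

```cuda
// Hypothetical correctness check: gpu_out is the kernel result copied back
// with cudaMemcpy, cpu_ref is an fp32 reference of the same operation.
#include <cuda_fp16.h>
#include <cmath>
#include <cstdio>
#include <vector>

bool all_close(const std::vector<__half>& gpu_out,
               const std::vector<float>& cpu_ref,
               float atol = 1e-2f, float rtol = 1e-2f)
{
    for (size_t i = 0; i < cpu_ref.size(); ++i) {
        float g = __half2float(gpu_out[i]);
        if (std::fabs(g - cpu_ref[i]) > atol + rtol * std::fabs(cpu_ref[i])) {
            std::printf("mismatch at %zu: gpu=%f ref=%f\n", i, g, cpu_ref[i]);
            return false;
        }
    }
    return true;
}
```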
The request in #104 is supported in the next beta version.
> next beta version
Great! Thanks! I wonder when this beta version will become a stable official release, or is it already stable enough to be imported as a third-party library?
For your request and the BERT model, it should be stable. We release it as a beta version because:
- We may still break the API in the near future.
- We have not updated all the guides yet, but the BERT guide should be up to date.