[Kernel] Add FP16 MHA and MLP kernels.

Open changqi1 opened this issue 1 year ago • 0 comments

# Weight-only FP16 (input FP32, weight FP16, output FP32)
[INFO] First token time: 148.062 ms
[INFO] Second token time: 48.3581 ms
[INFO] Final output is:
==============================================
Once upon a time, there existed a little girl who liked to have adventures. She lived in a small village surrounded by

# Full-link FP16 (input FP16, weight FP16, output FP16)
[INFO] First token time: 144.831 ms
[INFO] Second token time: 46.0737 ms
[INFO] Final output is:
==============================================
Once upon a time, there existed a little girl who liked to have adventures. She lived in a small village surrounded by
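
For reference, the gap between the two runs can be worked out from the logged timings. This is a minimal sketch in Python: the `relative_reduction` helper and the dictionary layout are hypothetical and not part of xFasterTransformer, and the numbers are copied from the logs above. Each configuration was run only once here, so the differences may be within run-to-run noise.

```python
# Timings in milliseconds, copied from the two log excerpts in this issue.
weight_only_fp16 = {"first_token_ms": 148.062, "second_token_ms": 48.3581}
full_link_fp16 = {"first_token_ms": 144.831, "second_token_ms": 46.0737}


def relative_reduction(baseline_ms: float, candidate_ms: float) -> float:
    """Return the latency reduction of candidate vs. baseline, in percent.

    Hypothetical helper for illustration only.
    """
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# Compare full-link FP16 against weight-only FP16 for each logged metric.
reductions = {
    key: relative_reduction(weight_only_fp16[key], full_link_fp16[key])
    for key in weight_only_fp16
}

for key, pct in reductions.items():
    print(f"{key}: {pct:.2f}% lower with full-link FP16")
```

With these values, full-link FP16 comes out roughly 2.2% lower on first-token time and roughly 4.7% lower on second-token time, while the generated text is identical in both runs.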

changqi1 · May 21 '24 08:05