add bf16_int8 support for invokeLayerLLaMA API

Open · miaojinc opened this issue 1 year ago · 0 comments

Proposed enhancements to the invokeLayerLLaMA API:

  1. Add support for the bf16_int8 dtype
  2. Add a KV cache dtype argument
  3. Add a RoPE type argument

miaojinc, Jul 22 '24 03:07