xFasterTransformer
Add bf16_int8 support to the invokeLayerLLaMA API
invokeLayerLLaMA API enhancements:
- Add bf16_int8 dtype support
- Add a KV cache dtype argument
- Add a RoPE type argument