
Transformer-related optimizations, including BERT and GPT

184 FasterTransformer issues

### Description

```shell
In file included from /data/mt/hbl/FasterTransformer_v5_tf/src/fastertransformer/tf_op/bert/BertOp.cc:18:
/data/mt/hbl/FasterTransformer_v5_tf/src/fastertransformer/tf_op/BaseOp.h: In member function ‘fastertransformer::Tensor BaseOp::convert_tensor(tensorflow::Tensor)’:
/data/mt/hbl/FasterTransformer_v5_tf/src/fastertransformer/tf_op/BaseOp.h:105:36: error: ‘bfloat16’ is not a member of ‘Eigen’
  105 | if (std::is_same::value == true) {...
```

bug

**System and software**
- FasterTransformer version: v4.0
- GPU: T4
- CUDA: 11.0
- PyTorch: 1.8

**Issue description**
I have an FP16 BERT model with 12 FasterTransformer encoders. When I do inference with the...

I see that there is full INT8 support (both weights and activations) for BERT, but it's not clear to me what is supported for GPT models ([here](https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/gpt/utils/parallel_gpt.py#L28)). Ideally if we can...

When calculating the log likelihood of the token at position i, we should consider the logits at step i-1; also, the log likelihood of the starting token is undefined (can be set...
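The shift described in this issue can be illustrated with a short NumPy sketch (not FasterTransformer code; `token_log_likelihoods` and its placeholder value for position 0 are hypothetical):

```python
import numpy as np

def token_log_likelihoods(logits, token_ids):
    """Per-token log likelihoods for a single sequence.

    logits:    (T, V) array, where logits[i] is produced after seeing
               token_ids[:i+1].
    token_ids: (T,) array of token ids.

    The token at position i is scored by the logits at step i-1, so the
    starting token's log likelihood is undefined.
    """
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # logits[i-1] scores token_ids[i]
    scores = log_probs[np.arange(len(token_ids) - 1), token_ids[1:]]
    # prepend a placeholder (here 0.0) for the undefined starting token
    return np.concatenate(([0.0], scores))
```

With uniform logits, every scored position gets log(1/V), and position 0 carries only the placeholder.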

### Description
System and software:
- FasterTransformer version: v5.0
- GPU: T4
- Swin-Transformer: e0486b2cf8c63b6314570a43007569c8aa9b4578
- CUDA: 11.0

### Error Message
1. got `nan` from fp16 inference of swintransformer_op: `FP16_torch_traced_output...

bug

### Description
```shell
Does it support python api? In a similar way to trtexec, transfer the engine model and do inference?
```

### Reproduced Steps
```shell
Does it support python...
```

bug

Hi, if I take the same encoder input and pad it to a different maximum length, I get noticeably different encoder memory key/value tensors from decoder cross attention. And...
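For reference, attention output is invariant to the padded length only when the padded key positions are masked out before the softmax. A minimal NumPy sketch of that expected behaviour (not FasterTransformer's kernel; all names are hypothetical):

```python
import numpy as np

def masked_attention(q, k, v, valid_len):
    """Scaled dot-product attention that ignores padded key positions.

    q: (d,) query; k, v: (T, d); positions >= valid_len are padding.
    """
    scores = k @ q / np.sqrt(q.shape[-1])  # (T,)
    scores[valid_len:] = -np.inf           # mask padded keys
    scores -= scores.max()                 # numerical stability
    weights = np.exp(scores)               # masked positions become 0
    weights /= weights.sum()
    return weights @ v                     # (d,)
```

With this masking, padding k/v to any longer length leaves the output unchanged; without it, the zero-padded rows still receive softmax weight and the result drifts with the padded length.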

Thanks for your great work on INT8 quantization for ViT. I have some questions about the quantization of ViT's SelfAttention. As in transformer attention: 1) attn_score = Q * K^T...
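As background for the quantization question, symmetric per-tensor INT8 quantization of Q and K followed by an int32-accumulated GEMM can be sketched as follows (a generic illustration, not the ViT kernel's actual implementation; `quantize_int8` and `int8_matmul` are hypothetical names):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ~= scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, a_scale, b, b_scale):
    """INT8 GEMM accumulated in int32, then dequantized to float."""
    acc = a.astype(np.int32) @ b.astype(np.int32)
    return acc * (a_scale * b_scale)
```

So a quantized attn_score would be computed as `int8_matmul(qQ, sQ, qK.T, sK)`, approximating the float `Q @ K.T` up to quantization error.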

Hi, we want to be able to pass different values of top-k/top-p for each element in the batch at runtime for sampling, and it would be great if we...
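The requested behaviour could look like the following sketch, where each batch row carries its own top-k value (a plain NumPy illustration, not FasterTransformer's sampling kernel; `batched_topk_sample` is a hypothetical name):

```python
import numpy as np

def batched_topk_sample(logits, top_ks, rng):
    """Sample one token per batch row, with a per-row top-k value.

    logits: (B, V) array; top_ks: length-B sequence of ints.
    """
    out = []
    for row, k in zip(logits, top_ks):
        # keep the k largest logits, mask out the rest
        kept = np.argpartition(row, -k)[-k:]
        masked = np.full_like(row, -np.inf)
        masked[kept] = row[kept]
        # softmax over the surviving logits, then sample
        probs = np.exp(masked - masked.max())
        probs /= probs.sum()
        out.append(rng.choice(len(row), p=probs))
    return np.array(out)
```

With `top_ks = [1, 1]` this degenerates to per-row argmax; per-row top-p would mask by cumulative probability instead of count.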

### Description
Tesla K80. CUDA 11.3. cuDNN 8.2.

```shell
root@abcbe3e329ca:/workspace/FasterTransformer/build# mpirun -n 2 --allow-run-as-root ./bin/gptj_example
Total ranks: 2.
Device NVIDIA Tesla K80 P1 is runing with 1 GPU.
[INFO] Setting...
```

bug