
Transformer-related optimizations, including BERT and GPT

184 FasterTransformer issues

### Description

```shell
In file included from /data/mt/hbl/FasterTransformer_v5_tf/src/fastertransformer/tf_op/bert/BertOp.cc:18:
/data/mt/hbl/FasterTransformer_v5_tf/src/fastertransformer/tf_op/BaseOp.h: In member function ‘fastertransformer::Tensor BaseOp::convert_tensor(tensorflow::Tensor)’:
/data/mt/hbl/FasterTransformer_v5_tf/src/fastertransformer/tf_op/BaseOp.h:105:36: error: ‘bfloat16’ is not a member of ‘Eigen’
  105 | if (std::is_same::value == true) {...
```

bug

**System and software**
- FasterTransformer version: v4.0
- GPU: T4
- CUDA: 11.0
- PyTorch: 1.8

**Issue description**
I have an FP16 BERT model with 12 FasterTransformer encoders. When I do inference with the...

I see that there is full INT8 support (both weights and activations) for BERT, but it's not clear to me what is supported for GPT models ([here](https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/gpt/utils/parallel_gpt.py#L28)). Ideally if we can...

When calculating the log likelihood of the token at position i, we should consider the logits at step i-1; also, the log likelihood of the starting token is undefined (can be set...
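The shift described in this issue can be illustrated with a short NumPy sketch (not FasterTransformer code; `token_log_likelihoods` and its placeholder value for position 0 are hypothetical):

```python
import numpy as np

def token_log_likelihoods(logits, token_ids):
    """Per-token log likelihoods for a single sequence.

    logits:    (T, V) array, where logits[i] is produced after seeing
               token_ids[:i+1].
    token_ids: (T,) array of token ids.

    The token at position i is scored by the logits at step i-1, so the
    starting token's log likelihood is undefined.
    """
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # logits[i-1] scores token_ids[i]
    scores = log_probs[np.arange(len(token_ids) - 1), token_ids[1:]]
    # prepend a placeholder (here 0.0) for the undefined starting token
    return np.concatenate(([0.0], scores))
```

With uniform logits, every scored position gets log(1/V), and position 0 carries only the placeholder.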

### Description
System and software:
- FasterTransformer version: v5.0
- GPU: T4
- Swin-Transformer: e0486b2cf8c63b6314570a43007569c8aa9b4578
- CUDA: 11.0

### Error Message
1. got `nan` from fp16 inference of swintransformer_op: `FP16_torch_traced_output...

bug

### Description
```shell
Does it support python api? In a similar way to trtexec, transfer the engine model and do inference?
```

### Reproduced Steps
```shell
Does it support python...
```

bug

Hi, if I take the same encoder input and pad it to a different maximum length, I get noticeably different encoder memory key/value tensors from decoder cross attention. And...
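For reference, attention output is invariant to the padded length only when the padded key positions are masked out before the softmax. A minimal NumPy sketch of that expected behaviour (not FasterTransformer's kernel; all names are hypothetical):

```python
import numpy as np

def masked_attention(q, k, v, valid_len):
    """Scaled dot-product attention that ignores padded key positions.

    q: (d,) query; k, v: (T, d); positions >= valid_len are padding.
    """
    scores = k @ q / np.sqrt(q.shape[-1])  # (T,)
    scores[valid_len:] = -np.inf           # mask padded keys
    scores -= scores.max()                 # numerical stability
    weights = np.exp(scores)               # masked positions become 0
    weights /= weights.sum()
    return weights @ v                     # (d,)
```

With this masking, padding k/v to any longer length leaves the output unchanged; without it, the zero-padded rows still receive softmax weight and the result drifts with the padded length.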

Thanks for your great work on INT8 quantization for ViT. I have some questions about the quantization of ViT's SelfAttention. As in transformer attention: 1) attn_score = Q * K^T...
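As background for the quantization question, symmetric per-tensor INT8 quantization of Q and K followed by an int32-accumulated GEMM can be sketched as follows (a generic illustration, not the ViT kernel's actual implementation; `quantize_int8` and `int8_matmul` are hypothetical names):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x ~= scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, a_scale, b, b_scale):
    """INT8 GEMM accumulated in int32, then dequantized to float."""
    acc = a.astype(np.int32) @ b.astype(np.int32)
    return acc * (a_scale * b_scale)
```

So a quantized attn_score would be computed as `int8_matmul(qQ, sQ, qK.T, sK)`, approximating the float `Q @ K.T` up to quantization error.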

Hi, we want to be able to pass different values of top-k/top-p for each element in the batch at runtime for sampling, and it would be great if we...
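The requested behaviour could look like the following sketch, where each batch row carries its own top-k value (a plain NumPy illustration, not FasterTransformer's sampling kernel; `batched_topk_sample` is a hypothetical name):

```python
import numpy as np

def batched_topk_sample(logits, top_ks, rng):
    """Sample one token per batch row, with a per-row top-k value.

    logits: (B, V) array; top_ks: length-B sequence of ints.
    """
    out = []
    for row, k in zip(logits, top_ks):
        # keep the k largest logits, mask out the rest
        kept = np.argpartition(row, -k)[-k:]
        masked = np.full_like(row, -np.inf)
        masked[kept] = row[kept]
        # softmax over the surviving logits, then sample
        probs = np.exp(masked - masked.max())
        probs /= probs.sum()
        out.append(rng.choice(len(row), p=probs))
    return np.array(out)
```

With `top_ks = [1, 1]` this degenerates to per-row argmax; per-row top-p would mask by cumulative probability instead of count.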

### Description
Tesla K80. CUDA 11.3. cuDNN 8.2.

```shell
root@abcbe3e329ca:/workspace/FasterTransformer/build# mpirun -n 2 --allow-run-as-root ./bin/gptj_example
Total ranks: 2.
Device NVIDIA Tesla K80 P1 is runing with 1 GPU.
[INFO] Setting...
```

bug