Le Tuan Thanh

Results: 20 comments of Le Tuan Thanh

In my opinion, DeBERTa's MLM is pretrained with the Enhanced Mask Decoder (EMD), but the transformers pipeline does not use the EMD code for MASK token prediction. So you cannot rely on the result produced by transformers fill-mask...
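
To illustrate, here is a minimal sketch of the standard transformers fill-mask path (the checkpoint name and example sentence are only placeholders): it applies the plain MLM head on top of the encoder output and does not run DeBERTa's Enhanced Mask Decoder, which is why its predictions can differ from what the EMD-based pretraining objective would produce.

```python
# Minimal sketch, assuming the stock fill-mask pipeline; checkpoint and text
# are illustrative. This path does NOT use DeBERTa's Enhanced Mask Decoder.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="microsoft/deberta-v3-base")

# DeBERTa uses "[MASK]" as its mask token.
for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], candidate["score"])
```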

If you use the basic Hugging Face MLM pipeline to continue pretraining on deberta-v3-large, the pretrained encoder weights are trained along with the prediction head weights, which are newly...
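
A minimal sketch of that basic recipe, assuming `AutoModelForMaskedLM`, `Trainer`, and an illustrative local text corpus (file name and hyperparameters are placeholders): the encoder weights are loaded from the checkpoint, the MLM head is freshly initialized, and both are updated during continued pretraining.

```python
# Minimal sketch of continued MLM pretraining with the basic HF recipe.
# Dataset path and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "microsoft/deberta-v3-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Encoder weights come from the checkpoint; the MLM head is newly initialized.
model = AutoModelForMaskedLM.from_pretrained(model_name)

raw = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(
    output_dir="deberta-v3-large-continued",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```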

> > * [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support)...

I faced the same issue while running the latest version of the TensorRT-LLM Triton Backend: the output is not consistent and does not match the HF output. When I run a load test...

@symphonylyh Yes! As I remember, it worked normally on TRT-LLM version `0.11.0.dev2024061800`.

@symphonylyh Let me know if there are any new insights. I am very eager to apply TensorRT-LLM for enc-dec in production.

@ogaloglu I know, but I am eager to apply the in-flight batching feature, not the dynamic batching feature. If I use the above parameters, it is essentially the same as FasterTransformer with dynamic...