Le Tuan Thanh
In my opinion, DeBERTa MLM is used with EMD, but the transformers pipeline does not use the EMD code for MASK token prediction. So you cannot use the result produced by the transformers fill-mask...
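For reference, this is the kind of stock fill-mask call I mean (the checkpoint ID and prompt are just illustrative); it runs only the encoder plus a bare prediction head, with no EMD step:

```python
from transformers import pipeline

# Stock fill-mask pipeline on DeBERTa-v3: this loads DebertaV2ForMaskedLM,
# which does not run the enhanced mask decoder (EMD), so the scores it
# returns are not the ones the original MLM pretraining objective produced.
fill = pipeline("fill-mask", model="microsoft/deberta-v3-large")
for pred in fill("The capital of France is [MASK]."):
    print(pred["token_str"], pred["score"])
```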
If you use the Hugging Face basic MLM pipeline to continue pretraining on deberta-v3-large, the pretrained encoder weights are trained along with the prediction head layer weights, which are newly...
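As a rough sketch of what I mean by continuing pretraining with the basic MLM setup (the dataset file and hyperparameters are placeholders, not from this thread):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
# AutoModelForMaskedLM attaches an MLM prediction head; for this checkpoint
# the head weights are newly initialized and get trained jointly with the
# pretrained encoder weights.
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v3-large")

dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="deberta-v3-large-mlm",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```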
> > * [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support)...
I faced the same issue while running the latest version of the TensorRT-LLM Triton Backend: the output is inconsistent and does not match the HF output. When I run a load test...
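Here is roughly how I check consistency (a sketch, not my actual load test: the endpoint, the model name `ensemble`, and the `text_input`/`max_tokens`/`text_output` tensor names are assumptions based on a typical TRT-LLM Triton ensemble, so adjust them to your model repository):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def infer(prompt: str) -> str:
    # Assumed tensor names; check config.pbtxt in your model repository.
    text = httpclient.InferInput("text_input", [1, 1], "BYTES")
    text.set_data_from_numpy(np.array([[prompt.encode()]], dtype=object))
    max_tokens = httpclient.InferInput("max_tokens", [1, 1], "INT32")
    max_tokens.set_data_from_numpy(np.array([[64]], dtype=np.int32))
    result = client.infer("ensemble", [text, max_tokens])
    return result.as_numpy("text_output").flatten()[0].decode()

# Send the same prompt repeatedly and flag any divergence; a real load
# test would fire these requests concurrently instead of in a loop.
baseline = infer("translate English to German: Hello world.")
for i in range(20):
    out = infer("translate English to German: Hello world.")
    if out != baseline:
        print(f"request {i} diverged: {out!r} vs {baseline!r}")
```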
@symphonylyh Yes! As I remember, it worked normally on TRT-LLM version `0.11.0.dev2024061800`.
@symphonylyh Did you find out the cause of the issue here?
@symphonylyh Let me know if there are any new insights. I am very eager to apply TensorRT-LLM for enc-dec models in production.
@ogaloglu I know, but I am eager to apply the in-flight batching feature, not the dynamic batching feature. If I use the above parameters, it is essentially the same as FasterTransformer with dynamic...
@symphonylyh Have you made any progress here?
@symphonylyh Looking forward to your fix next week!