Mehdi Bahri
Hi, thanks for your reply. I think I might be a bit confused here, but could you explain why I don't need to use dynamic batching? The way I thought...
Thanks @dtrawins. So to confirm: with parallel model execution and NUM_STREAMS set, would I just use a batch size of 1 for each model instance?
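For reference, here is a rough sketch of what that setup might look like in a Triton `config.pbtxt`. This is only my guess at the shape of the config, not a confirmed working example; the `NUM_STREAMS` parameter name and the instance count are assumptions:

```
# Hypothetical config.pbtxt sketch: parallel instances, batch size 1 per request
backend: "openvino"
max_batch_size: 1          # assumption: no dynamic batching, one request per execution
instance_group [
  {
    count: 2               # assumption: two parallel model instances
    kind: KIND_CPU
  }
]
parameters: {
  key: "NUM_STREAMS"       # assumption: passed through to the OpenVINO backend
  value: { string_value: "2" }
}
```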
Hi,

Thank you for the reminder. I'll try to clean up and share the code soon.

Best,
Having the tensorrt_llm package installed would also make it much easier to deploy TensorRT-LLM models with the Python backend for Triton (needed for multimodal and encoder-decoder for now)
> Can you specify transformers version as well? They made a lot of changes recently.

Hi,

```python
In [6]: import transformers

In [7]: transformers.__version__
Out[7]: '4.41.2'
```

I also tried...
Sweet! I assumed that PR was already part of the stable release. It's looking much better with `ort-nightly`, thank you!

```
Using PyTorch 2.3.1, Transformers 4.41.2, and ORT 1.19.0
Optimizing...
```