Shannon Phu

Results 6 comments of Shannon Phu

Did anyone figure out how to use other architectures?

@symphonylyh (1) and/or (3). I am not super clear on the difference between the Python and C++ backends. I was using this to build the engine: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/enc_dec/README.md

@mlmonk Oh interesting, I was under the impression that we just couldn't serve T5 models on Triton yet because the TRT-LLM backend wasn't ready for it.

This helped fix the build for me.

I'm running into this issue too. What are the build and make commands you are running?

@champson were you able to get FasterTransformer working on an L4 GPU?