Shannon Phu
Did anyone figure out how to use other architectures?
@symphonylyh (1) and/or (3). I am not entirely clear on the difference between the Python and C++ backends. I was using this to build the engine: https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/enc_dec/README.md
@mlmonk Oh interesting, I was under the impression that we couldn't serve T5 models on Triton yet because the TRT-LLM backend wasn't ready for it.
This helped fix the build for me.
I'm running into this issue too. What are the build and make commands you are running?
@champson were you able to get FasterTransformer working on an L4 GPU?