eycheung

Results: 3 issues by eycheung

## Summary Add weight-only quantization for T5. I've added this to the path loading from binary weights. I do not think the HF weight loading currently works, so I have...
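For readers unfamiliar with the term, here is a minimal, generic sketch of weight-only quantization in plain Python. It is not the code from this pull request: the symmetric per-output-channel int8 scheme and the helper names `quantize_weight_only` and `matvec_dequant` are assumptions chosen for illustration. Only the weights are stored as int8; activations stay in floating point and each weight row is rescaled during the matrix-vector product.

```python
# Illustrative sketch only: symmetric per-output-channel int8 weight-only
# quantization. Helper names are hypothetical, not taken from the pull request.

def quantize_weight_only(weight):
    """Quantize a 2-D weight matrix (one row per output channel) to int8 values."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # One scale per output channel; fall back to 1.0 for an all-zero row.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def matvec_dequant(q_rows, scales, x):
    """Compute y = W x with int8 weights rescaled per row; x stays in float."""
    return [s * sum(q * xi for q, xi in zip(row, x))
            for row, s in zip(q_rows, scales)]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
x = [1.0, 2.0, 3.0]

q_rows, scales = quantize_weight_only(weight)
approx = matvec_dequant(q_rows, scales, x)
exact = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
```

Comparing `approx` with `exact` shows the small rounding error introduced by storing the weights in 8 bits.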

### Branch/Tag/Commit main ### Docker Image Version none ### GPU name T4 ### CUDA Driver 525.60.13 ### Reproduced Steps ```shell ## Steps 1. Download public GPT-NeoX Model https://huggingface.co/EleutherAI/pythia-70m 2. Convert...

bug

Support the `gather_all_token_logits` flag when building Llama models. This is needed to support returning `context_logits`. Relevant issue: https://github.com/NVIDIA/TensorRT-LLM/issues/122

triaged
Community want to contribute
Generic Runtime