eycheung
## Summary

Add weight-only quantization for T5. I've added this to the path loading from binary weights. I do not think the HF weight loading currently works, so I have...
### Branch/Tag/Commit

main

### Docker Image Version

none

### GPU name

T4

### CUDA Driver

525.60.13

### Reproduced Steps

```shell
## Steps
1. Download public GPT-NeoX Model https://huggingface.co/EleutherAI/pythia-70m
2. Convert...
```
Support the `gather_all_token_logits` flag when building Llama models. This is needed to support returning `context_logits`.

Relevant issue: https://github.com/NVIDIA/TensorRT-LLM/issues/122
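To illustrate why the flag matters for `context_logits`, here is a minimal pure-Python sketch of the idea. It is not TensorRT-LLM code: `project_to_logits` and `compute_logits` are hypothetical helpers, the tiny matrices are made-up values, and it assumes that without the flag only the last prompt token is projected through the LM head.

```python
from typing import List

Matrix = List[List[float]]


def project_to_logits(hidden: Matrix, lm_head: Matrix) -> Matrix:
    """Multiply hidden states [tokens, hidden] by lm_head [hidden, vocab]."""
    vocab_size = len(lm_head[0])
    return [
        [sum(h * lm_head[k][v] for k, h in enumerate(row)) for v in range(vocab_size)]
        for row in hidden
    ]


def compute_logits(hidden: Matrix, lm_head: Matrix,
                   gather_all_token_logits: bool) -> Matrix:
    # Hypothetical helper: the parameter name mirrors the build flag, but the
    # behaviour shown here is an assumption made for illustration only.
    if gather_all_token_logits:
        # Project every prompt position, so per-token context logits exist.
        return project_to_logits(hidden, lm_head)
    # Project only the last position, which is all next-token sampling needs.
    return project_to_logits(hidden[-1:], lm_head)


# Toy inputs: 3 prompt tokens, hidden size 2, vocabulary size 3.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lm_head = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

last_only = compute_logits(hidden_states, lm_head, gather_all_token_logits=False)
context_logits = compute_logits(hidden_states, lm_head, gather_all_token_logits=True)

print(len(last_only), len(context_logits))  # prints "1 3"
```

In this sketch the last row of `context_logits` equals the single row of `last_only`, so gathering all token logits only adds the earlier positions. The trade-off is extra compute and memory proportional to prompt length times vocabulary size.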