eycheung issues


Results: 3 issues of eycheung

Add weight-only quantization for T5 models

## Summary
Add weight-only quantization for T5. I've added this to the path loading from binary weights. I do not think the HF weight loading currently works, so I have...
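For readers unfamiliar with the term, weight-only quantization generally means storing the weights in a low-precision integer format while keeping activations in floating point, and rescaling when the weights are used. The sketch below is a generic illustration of per-output-channel symmetric INT8 quantization in plain Python. It is not the code from this pull request, and every function name in it is hypothetical.

```python
def quantize_column_int8(col):
    """Symmetric INT8 quantization of one output channel (hypothetical helper)."""
    max_abs = max(abs(v) for v in col)
    # Guard against an all-zero column so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in col]
    return q, scale


def quantize_weight_only(weight):
    """Quantize a weight given as rows [in_features][out_features].

    Returns the integer weight in the same layout plus one scale per
    output channel.
    """
    q_cols, scales = [], []
    for col in zip(*weight):  # iterate over output channels
        q, s = quantize_column_int8(col)
        q_cols.append(q)
        scales.append(s)
    q_weight = [list(row) for row in zip(*q_cols)]  # back to [in][out]
    return q_weight, scales


def weight_only_matmul(x, q_weight, scales):
    """Multiply float activations by integer weights, then rescale per channel."""
    out = []
    for row in x:
        out.append([
            scales[j] * sum(row[i] * q_weight[i][j] for i in range(len(row)))
            for j in range(len(scales))
        ])
    return out


# Toy data, chosen only for illustration.
weight = [[0.5, -1.0], [0.25, 2.0], [-0.75, 0.5]]
x = [[1.0, 2.0, 3.0]]

q_weight, scales = quantize_weight_only(weight)
y_quant = weight_only_matmul(x, q_weight, scales)
y_ref = [[sum(row[i] * weight[i][j] for i in range(3)) for j in range(2)]
         for row in x]
```

Comparing `y_quant` with `y_ref` shows the rounding error introduced by the integer weights. Production implementations typically fuse the dequantization into the matrix-multiply kernel rather than materializing float weights.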

GPT-NeoX gives poor results using FP16

### Branch/Tag/Commit
main

### Docker Image Version
none

### GPU name
T4

### CUDA Driver
525.60.13

### Reproduced Steps
```shell
## Steps
1. Download public GPT-NeoX Model https://huggingface.co/EleutherAI/pythia-70m
2. Convert...
```

bug

Support the gather_all_token_logits flag for Llama

Support the `gather_all_token_logits` flag for building Llama models. This is needed to support returning `context_logits`. Relevant issue: https://github.com/NVIDIA/TensorRT-LLM/issues/122

triaged

Community want to contribute

Generic Runtime