TensorRT-LLM
Support the gather_all_token_logits flag for Llama
Support the gather_all_token_logits flag when building Llama models. This is needed to support returning context_logits.
Relevant issue: https://github.com/NVIDIA/TensorRT-LLM/issues/122
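
A minimal sketch (not the actual diff) of how a boolean build option like gather_all_token_logits is typically exposed in an example build script; everything here other than the flag name itself (the script name, --model_dir, --output_dir) is illustrative and assumed:

```python
# Sketch only: wiring a --gather_all_token_logits flag into a build script's
# argument parser. Argument names other than gather_all_token_logits are
# illustrative, not taken from the actual TensorRT-LLM sources.
import argparse


def parse_arguments():
    parser = argparse.ArgumentParser(description="Build a Llama TensorRT-LLM engine (sketch)")
    parser.add_argument("--model_dir", type=str, required=True,
                        help="Path to the Llama checkpoint (illustrative)")
    parser.add_argument("--output_dir", type=str, default="llama_engine",
                        help="Where to write the built engine (illustrative)")
    # Keep logits for every input (context) token, not just generated tokens,
    # so the runtime can return context_logits to the caller.
    parser.add_argument("--gather_all_token_logits", action="store_true",
                        help="Gather logits for all tokens, enabling context_logits output")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_arguments()
    print(f"gather_all_token_logits = {args.gather_all_token_logits}")
```

With a script wired up this way, an illustrative invocation would be `python build.py --model_dir ./llama-7b --gather_all_token_logits`, after which the built engine would retain the per-token logits needed to return context_logits at runtime.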