TensorRT-LLM
Support the gather_all_token_logits flag for Llama
Support the gather_all_token_logits flag when building Llama models. This is needed to support returning context_logits.
Relevant issue: https://github.com/NVIDIA/TensorRT-LLM/issues/122
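
A minimal sketch (not the actual diff) of how a boolean build option like gather_all_token_logits is typically exposed in an example build script; everything here other than the flag name itself (the script name, --model_dir, --output_dir) is illustrative and assumed:

```python
# Sketch only: wiring a --gather_all_token_logits flag into a build script's
# argument parser. Argument names other than gather_all_token_logits are
# illustrative, not taken from the actual TensorRT-LLM sources.
import argparse


def parse_arguments():
    parser = argparse.ArgumentParser(description="Build a Llama TensorRT-LLM engine (sketch)")
    parser.add_argument("--model_dir", type=str, required=True,
                        help="Path to the Llama checkpoint (illustrative)")
    parser.add_argument("--output_dir", type=str, default="llama_engine",
                        help="Where to write the built engine (illustrative)")
    # Keep logits for every input (context) token, not just generated tokens,
    # so the runtime can return context_logits to the caller.
    parser.add_argument("--gather_all_token_logits", action="store_true",
                        help="Gather logits for all tokens, enabling context_logits output")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_arguments()
    print(f"gather_all_token_logits = {args.gather_all_token_logits}")
```

With a script wired up this way, an illustrative invocation would be `python build.py --model_dir ./llama-7b --gather_all_token_logits`, after which the built engine would retain the per-token logits needed to return context_logits at runtime.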