
How to get output including context_logits with GPU tensors?

lkm2835 opened this issue 1 year ago • 1 comment

from tensorrt_llm import LLM, SamplingParams

# Tokenizer initialization is skipped, so prompts are passed as raw token IDs.
llm = LLM('/app/models/tensorrt_llm', skip_tokenizer_init=True)

sampling_params = SamplingParams(end_id=2, return_context_logits=True, max_new_tokens=1)

results = llm.generate([[32, 12, 24, 54, 6, 747]], sampling_params=sampling_params)

print(results)
print(results[0].context_logits)


GenerationResult(request_id=1, prompt_token_ids=[32, 12, 24, 54, 6, 747], outputs=[CompletionOutput(index=0, text='', token_ids=[], cumulative_logprob=None, logprobs=[])], finished=False)

tensor([[ -4.7734,  -6.8086,  -2.9629,  ...,  -4.6484,  -5.6211,  -5.0430],
        [  5.9062,   5.9453,   1.4648,  ...,   9.1797,   7.4297,   7.4883],
        [ 10.3906,  13.6094,   9.4766,  ...,  13.9062,  11.4062,  12.7891],
        [  4.1172,   1.7715,  -7.2344,  ...,   2.8203,   5.0391,   2.3750],
        [  1.6025,  -3.4180, -10.7422,  ...,  -2.5332,  -0.2891,  -1.4541],
        [  3.5684,   2.9492,  -0.8101,  ...,   4.0977,   4.3750,   2.9492]])

The context-logits tensors are huge, and llm.generate becomes very slow because it copies them from the GPU down to the CPU.
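A back-of-envelope sketch of why that copy hurts (the vocabulary size and dtype below are assumptions for illustration, not values from TensorRT-LLM): the context logits for one request have shape [prompt_len, vocab_size], so a long prompt produces hundreds of megabytes that must cross the PCIe bus on every call.

```python
def context_logits_bytes(prompt_len: int, vocab_size: int, dtype_bytes: int) -> int:
    """Size in bytes of one request's context-logits tensor,
    assuming shape [prompt_len, vocab_size]."""
    return prompt_len * vocab_size * dtype_bytes

# The tiny 6-token prompt from the repro above, assuming a 32k vocab and fp32:
small = context_logits_bytes(6, 32_000, 4)      # 768,000 bytes (~0.73 MiB)

# A realistic long prompt of 4096 tokens, same assumed vocab and dtype:
large = context_logits_bytes(4096, 32_000, 4)   # 524,288,000 bytes (500 MiB)

print(small, large, large / 2**20)
```

At ~500 MiB per long request, the device-to-host transfer alone can dominate the generate call, which is why keeping the logits on the GPU matters here.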

How to get output including context_logits with GPU tensors?

— lkm2835, Aug 29 '24 08:08

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 15 days.

— github-actions[bot], Sep 29 '24 02:09

This issue was closed because it has been stalled for 15 days with no activity.

— github-actions[bot], Oct 15 '24 02:10