Simeng Liu
Closing this PR as the NIM release will be based off release/1.1. Moving to https://github.com/NVIDIA/TensorRT-LLM/pull/9471.
Hi @khayamgondal, the end-to-end throughput statistics are calculated, not directly reported. For example, `Token Throughput (tokens/sec) = total_output_tokens / total_latency` and `Request Throughput (req/sec) = total_num_requests / total_latency`. For the...
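To make the derivation concrete, here is a minimal sketch of how those two metrics fall out of the aggregate totals. The function and variable names are illustrative, not the actual internals of `trtllm-bench`:

```python
# Hypothetical sketch: derive end-to-end throughput stats from aggregate
# benchmark totals, mirroring the formulas quoted above.

def throughput_stats(total_output_tokens: int,
                     total_num_requests: int,
                     total_latency_sec: float) -> dict:
    """Token and request throughput computed from run-level totals."""
    return {
        "token_throughput_tok_per_sec": total_output_tokens / total_latency_sec,
        "request_throughput_req_per_sec": total_num_requests / total_latency_sec,
    }

# Example: 200,000 output tokens across 100 requests in 50 seconds.
stats = throughput_stats(200_000, 100, 50.0)
print(stats["token_throughput_tok_per_sec"])   # 4000.0 tokens/sec
print(stats["request_throughput_req_per_sec"]) # 2.0 req/sec
```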
@khayamgondal You can try adding the `host_cache_size` option in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/bench/dataclasses/configuration.py#L215-L220.
@khayamgondal You can think of on-GPU kv_cache memory as serving two main purposes: 1. Per-iteration allocation: At the start of each iteration, enough GPU memory must be available to hold...
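A back-of-the-envelope calculation may help make the per-iteration allocation point concrete. The formula below (two tensors, K and V, per layer, per KV head) is the standard KV-cache sizing estimate; the model shape is illustrative, not tied to any specific checkpoint:

```python
# Hypothetical sketch: estimate how many bytes of KV cache a single token
# occupies across all transformer layers (one K and one V tensor per layer).

def kv_cache_bytes_per_token(num_layers: int,
                             num_kv_heads: int,
                             head_dim: int,
                             dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Example: a 7B-class config (32 layers, 32 KV heads, head_dim 128, fp16).
per_token = kv_cache_bytes_per_token(32, 32, 128, dtype_bytes=2)
print(per_token)  # 524288 bytes, i.e. 512 KiB per token
```

Multiplying this per-token figure by the number of tokens admitted in an iteration gives a rough lower bound on the GPU memory that must be free before that iteration can run.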