SageMaker client issue
When I execute token_benchmark_ray.py, I get the error below:
File "token_benchmark_ray.py", line 456, in
Hey @SuchethaChintha, did you fix that?
This is probably due to using an older version of the SageMaker SDK. Updating it should fix the issue.
It seems that this error occurs because INTER_TOKEN_LAT is handled inconsistently across the different LLM clients.
The SageMaker client keeps INTER_TOKEN_LAT as a list: https://github.com/ray-project/llmperf/blob/f1d6bed47e4501b0e371082b41601b59ab55269f/src/llmperf/ray_clients/sagemaker_client.py#L109
The OpenAI client, on the other hand, sums the latencies before returning: https://github.com/ray-project/llmperf/blob/f1d6bed47e4501b0e371082b41601b59ab55269f/src/llmperf/ray_clients/openai_chat_completions_client.py#L112
I think that if you modify sagemaker_client.py as follows, it will work correctly:
metrics[common_metrics.INTER_TOKEN_LAT] = sum(time_to_next_token)
Even with this change, token_benchmark_ray.py still divides INTER_TOKEN_LAT by the number of output tokens, so the metrics should be computed correctly.
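To illustrate why the fix stays consistent with the downstream calculation, here is a minimal sketch. The latency values and the variable names (`time_to_next_token`, `num_output_tokens`, `mean_itl`) are invented for illustration; only `INTER_TOKEN_LAT` and the sum-then-divide flow come from the llmperf code linked above.

```python
# Hypothetical per-token streaming latencies in seconds (values invented).
time_to_next_token = [0.05, 0.04, 0.06]
num_output_tokens = len(time_to_next_token)

# With the suggested fix, the client stores the sum instead of the raw list,
# matching what the OpenAI client already returns:
inter_token_lat = sum(time_to_next_token)  # 0.15

# token_benchmark_ray.py then divides by the output token count,
# yielding the mean inter-token latency:
mean_itl = inter_token_lat / num_output_tokens
print(round(mean_itl, 3))  # 0.05
```

So summing in the client and dividing in the benchmark script together produce the mean inter-token latency, which is why the OpenAI path already works and the SageMaker path fails on the list.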