Question about performance

Open • SwibonX opened this issue 10 months ago • 1 comment

Could you please share the code you used for performance testing? In my own tests, performance appears to be worse than vLLM's.

Mar 25 '25 09:03 SwibonX

When benchmarking vLLM, I disabled CUDA Graph (--enforce-eager) and multi-step scheduling. Could you please test with the same configuration?

Mar 31 '25 03:03 interestingLSY
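
For anyone trying to reproduce the vLLM baseline described in the reply above, here is a minimal sketch of how the launch command might be assembled. It is not the maintainer's benchmark script, which is not included in this thread. The helper name `build_vllm_serve_command` and the model name `my-org/my-model` are hypothetical. Only `--enforce-eager` comes from the reply; `--num-scheduler-steps` is an assumption that applies only to vLLM releases that expose multi-step scheduling, so check `vllm serve --help` for your installed version before using it.

```python
import shlex


def build_vllm_serve_command(model, enforce_eager=True, num_scheduler_steps=None):
    """Build the argument list for launching a vLLM server for a baseline run.

    Hypothetical helper for illustration; it only assembles arguments and
    does not start any process.
    """
    cmd = ["vllm", "serve", model]
    if enforce_eager:
        # --enforce-eager disables CUDA Graph capture (flag named in the reply).
        cmd.append("--enforce-eager")
    if num_scheduler_steps is not None:
        # Assumption: this flag exists only in vLLM versions that support
        # multi-step scheduling; a value of 1 means single-step scheduling.
        cmd += ["--num-scheduler-steps", str(num_scheduler_steps)]
    return cmd


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder, not a model used in this thread.
    print(shlex.join(build_vllm_serve_command("my-org/my-model")))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` and to log the exact configuration next to the benchmark numbers, so that both systems are compared under the same settings.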