Kuldeepsinh Jadeja

6 comments by Kuldeepsinh Jadeja

Hi @scopsy, I would also like to work on this issue, if possible.

> Could you provide a test script for the speedup?

+1

Can we have this supported for local serving using AsyncLLMEngine? My scenario/use case is as follows: I am using Mixtral 8x7B on a g5.12xlarge EC2 instance and serving it locally using Python....

Do we have plans to support https://github.com/vllm-project/vllm/issues/5540? We have a production-level use case and would really appreciate it if someone could look into it for Q4 onwards.

> > Do we have plans to support #5540? We have a production-level use case and would really appreciate it if someone could look into it for Q4 onwards....

> > > > Do we have plans to support #5540? We have a production-level use case and would really appreciate it if someone could look into it for...