Kuldeepsinh Jadeja
Hi @scopsy, I would also like to work on this issue, if possible.
> Could you provide a test script for the speedup? +1
Can we have this support for local serving using AsyncLLMEngine? My scenario/use case is as follows: I am using Mixtral 8x7B on a g5.12xlarge EC2 instance and serving it locally with Python....
Do we have plans to support https://github.com/vllm-project/vllm/issues/5540? We have a production-level use case and would really appreciate it if someone could look into it from Q4 onwards.