
Support for vLLM?

Open KexinFeng opened this issue 1 year ago • 2 comments

Hi,

I remember that vLLM support was on your TODO list. Have you achieved it now? Was the main challenge in this direction that tree verification with batch size > 1 is hard to make efficient? Thanks!

KexinFeng avatar Apr 17 '24 22:04 KexinFeng

Currently we have not added vLLM support; we are working on building a tensor parallelism system. With batch size > 1, we need to solve some additional problems, such as the number of accepted tokens differing across requests in the same batch. Also, communication time is not accounted for in the current implementation. After we build the tensor parallelism system, we will make it compatible with vLLM or other inference engines. Thank you!
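To make the batching issue concrete, here is a minimal illustrative sketch (not Sequoia's actual code, and simplified to flat draft sequences rather than trees): after verification, each request in the batch has accepted a different number of speculated tokens, so the batch becomes ragged and must be repacked into a rectangular tensor before the next step.

```python
# Hypothetical sketch of the ragged-acceptance problem in batched
# speculative decoding. PAD and repack_batch are illustrative names,
# not part of Sequoia or vLLM.

PAD = -1  # padding token id (assumption for this sketch)

def repack_batch(draft_tokens, accepted_counts):
    """Keep only the accepted prefix of each request's draft tokens,
    then right-pad every row so the batch is rectangular again."""
    kept = [toks[:n] for toks, n in zip(draft_tokens, accepted_counts)]
    width = max(len(row) for row in kept)
    return [row + [PAD] * (width - len(row)) for row in kept]

if __name__ == "__main__":
    batch = [
        [11, 12, 13, 14],  # request 0: 4 speculated tokens
        [21, 22, 23, 24],  # request 1: 4 speculated tokens
    ]
    accepted = [3, 1]  # the verifier accepted 3 tokens vs. only 1
    print(repack_batch(batch, accepted))
    # -> [[11, 12, 13], [21, -1, -1]]
```

The padding wastes compute on the shorter requests, which is one reason efficient batch-size > 1 tree verification is nontrivial.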

dreaming-panda avatar Apr 21 '24 01:04 dreaming-panda

Have you achieved tensor parallelism now?

lethean1 avatar Oct 13 '24 03:10 lethean1