
Support for vLLM?

Open KexinFeng opened this issue 1 year ago • 2 comments

Hi,

I remember that vLLM support was on your TODO list. Have you achieved it now? Was the main challenge in this direction that tree verification with batch size > 1 is hard to make efficient? Thanks!

KexinFeng avatar Apr 17 '24 22:04 KexinFeng

Currently we have not added vLLM support; we are working on building a tensor parallelism system. With batch size > 1, we need to solve some additional problems, such as the number of accepted tokens differing across requests in the same batch. Also, communication time is not accounted for in the current implementation. After we build the tensor parallelism system, we will make it compatible with vLLM or other inference engines. Thank you!
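To make the batching issue concrete, here is a minimal illustrative sketch (not Sequoia's actual code, and simplified to flat draft sequences rather than trees): after verification, each request in the batch has accepted a different number of speculated tokens, so the batch becomes ragged and must be repacked into a rectangular tensor before the next step.

```python
# Hypothetical sketch of the ragged-acceptance problem in batched
# speculative decoding. PAD and repack_batch are illustrative names,
# not part of Sequoia or vLLM.

PAD = -1  # padding token id (assumption for this sketch)

def repack_batch(draft_tokens, accepted_counts):
    """Keep only the accepted prefix of each request's draft tokens,
    then right-pad every row so the batch is rectangular again."""
    kept = [toks[:n] for toks, n in zip(draft_tokens, accepted_counts)]
    width = max(len(row) for row in kept)
    return [row + [PAD] * (width - len(row)) for row in kept]

if __name__ == "__main__":
    batch = [
        [11, 12, 13, 14],  # request 0: 4 speculated tokens
        [21, 22, 23, 24],  # request 1: 4 speculated tokens
    ]
    accepted = [3, 1]  # the verifier accepted 3 tokens vs. only 1
    print(repack_batch(batch, accepted))
    # -> [[11, 12, 13], [21, -1, -1]]
```

The padding wastes compute on the shorter requests, which is one reason efficient batch-size > 1 tree verification is nontrivial.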

dreaming-panda avatar Apr 21 '24 01:04 dreaming-panda

Have you achieved tensor parallelism now?

lethean1 avatar Oct 13 '24 03:10 lethean1