LLMLingua
How can I speed up LLMLingua-2?
Describe the issue
I have a context length of about 100k tokens. Are there any methods I can use to speed up LLMLingua-2 so it compresses the context in a short time, say under 2 seconds? Thanks.
Hi @yyjabiding, thanks for your interest in LLMLingua.
Although we haven't tested it, it seems possible. LLMLingua-2 forwards the prompt through a BERT-level model chunk by chunk, so increasing the batch size could potentially reduce latency. You can check the implementation here: LLMLingua Prompt Compressor.
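To illustrate the idea (this is a minimal sketch, not the actual LLMLingua implementation — the chunk size and batch size below are assumed values for illustration): a 100k-token prompt split into fixed-size chunks can be grouped into batches, so the compression model runs one forward pass per batch instead of one per chunk.

```python
def make_batches(tokens, chunk_size=512, batch_size=8):
    """Split a token sequence into fixed-size chunks, then group the
    chunks into batches for batched model forward passes."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens), chunk_size)]
    return [chunks[i:i + batch_size]
            for i in range(0, len(chunks), batch_size)]

# Stand-in for a ~100k-token prompt.
tokens = list(range(100_000))
batches = make_batches(tokens)

# 100_000 tokens -> 196 chunks of up to 512 tokens each;
# grouped 8 at a time -> 25 batches, i.e. 25 forward passes
# instead of 196 when processing one chunk per pass.
print(len(batches))                        # 25
print(sum(len(b) for b in batches))        # 196
```

With a batch size of 8, the number of forward passes drops roughly 8x; the actual speedup depends on how well the hardware utilizes the larger batches.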