LLMLingua
How can I speed up LLMLingua-2?
Describe the issue
I have a context length of about 100k tokens. Are there any methods I can use to speed up LLMLingua-2 so it compresses the context in a short time, say under 2 seconds? Thanks.
Hi @yyjabiding, thanks for your interest in LLMLingua.
Although we haven't tested it, it seems possible. LLMLingua-2 forwards the prompt through a BERT-level model chunk by chunk, so increasing the batch size could potentially reduce latency. You can check the implementation here: LLMLingua Prompt Compressor.
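To illustrate the idea (this is a minimal sketch, not the actual LLMLingua implementation — the chunk size and batch size below are assumed values for illustration): a 100k-token prompt split into fixed-size chunks can be grouped into batches, so the compression model runs one forward pass per batch instead of one per chunk.

```python
def make_batches(tokens, chunk_size=512, batch_size=8):
    """Split a token sequence into fixed-size chunks, then group the
    chunks into batches for batched model forward passes."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens), chunk_size)]
    return [chunks[i:i + batch_size]
            for i in range(0, len(chunks), batch_size)]

# Stand-in for a ~100k-token prompt.
tokens = list(range(100_000))
batches = make_batches(tokens)

# 100_000 tokens -> 196 chunks of up to 512 tokens each;
# grouped 8 at a time -> 25 batches, i.e. 25 forward passes
# instead of 196 when processing one chunk per pass.
print(len(batches))                        # 25
print(sum(len(b) for b in batches))        # 196
```

With a batch size of 8, the number of forward passes drops roughly 8x; the actual speedup depends on how well the hardware utilizes the larger batches.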