
To speed up LLM inference and enhance the LLM's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
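
A minimal usage sketch of the documented entry point (the context string and token budget below are illustrative, not project defaults):

```python
from llmlingua import PromptCompressor

# Loads the default small causal LM used to score token importance.
llm_lingua = PromptCompressor()

compressed = llm_lingua.compress_prompt(
    ["<long retrieved context goes here>"],                # list of context strings
    instruction="Answer the question using the context.",  # kept largely intact
    question="What does the context say about X?",         # kept largely intact
    target_token=200,  # rough token budget for the compressed prompt
)
print(compressed["compressed_prompt"])
```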

117 LLMLingua issues

Hello! I'm getting the "Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx" error when trying to execute the...

question
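
That error usually means PyTorch tried to initialize CUDA on a machine without an NVIDIA GPU. If I read the constructor right, forcing CPU execution sidesteps it; a sketch (the model name is the library default, and `device_map` follows the Hugging Face convention):

```python
from llmlingua import PromptCompressor

# Run the compressor on CPU so no CUDA driver is required.
# Expect this to be slow with a 7B model; see the smaller-model sketch below.
llm_lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-hf",  # the library's default model
    device_map="cpu",                         # avoid the CUDA initialization path
)
```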

Code like this: `prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])`, `kc = RetrievalQA.from_llm(llm=qwllm, retriever=compression_retriever, prompt=prompt)`

question
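
The snippet above appears to wire an LLMLingua-backed retriever into a RetrievalQA chain. A sketch of that wiring; `qwllm`, `base_retriever`, and `prompt_template` are placeholders taken from the issue's own code, and `LLMLinguaCompressor` ships with recent `langchain_community` releases:

```python
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import LLMLinguaCompressor

# Compress retrieved documents with LLMLingua before they reach the LLM.
compressor = LLMLinguaCompressor(model_name="gpt2", device_map="cpu")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever,  # placeholder: any LangChain retriever
)

prompt = PromptTemplate(
    template=prompt_template,  # placeholder: must expose {context} and {question}
    input_variables=["context", "question"],
)
kc = RetrievalQA.from_llm(
    llm=qwllm,  # placeholder: the issue's LLM wrapper
    retriever=compression_retriever,
    prompt=prompt,
)
```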

Hello! I put LLMLingua into AutoGen as part of a compressible agent: https://github.com/microsoft/autogen/pull/1005. Basically functional, but too slow on my MacBook with Llama 2 to really test. I figured...

question
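
On Apple hardware the default 7B compressor model is the usual bottleneck. A sketch of swapping in a much smaller causal LM (the choice of `gpt2` is illustrative, not a project recommendation, and compression quality drops with smaller models):

```python
from llmlingua import PromptCompressor

# Trade compression quality for speed on CPU-only / Apple-silicon machines.
llm_lingua = PromptCompressor(model_name="gpt2", device_map="cpu")
```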

model name: Llama2-Chinese-7b-Chat; instruction = "Please summarize the following text"; question = ""; input (translated from Chinese): Edaravone dexborneol is a novel neuroprotective agent comprising edaravone and dexborneol, a food additive with anti-inflammatory effects in animal models of ischemic stroke. This study aimed to evaluate the safety and efficacy of edaravone dexborneol compared with edaravone in patients with acute ischemic stroke (AIS). Methods: In this multicenter, randomized, double-blind, multi-dose, active-controlled phase II clinical trial, AIS patients within 48 hours of stroke onset were randomly assigned (1:1:1:1) to low-dose (12.5 mg), medium-dose (37.5 mg), or high-dose (62.5 mg) edaravone dexborneol groups, or to an active control group receiving 30 mg of edaravone by intravenous infusion every 12 hours for 14 consecutive days. The primary efficacy outcomes were the proportion of patients with a modified Rankin Scale (mRS) score ≤1 at 90 days and the change in National Institutes of Health Stroke Scale (NIHSS) score from baseline to 14 days after randomization. Safety outcomes included any adverse events within 90 days after treatment. Results: Of the 385 patients included in the efficacy analysis, 94 were randomized to the low-dose group, 97 to the medium-dose group, 98 to the high-dose group, and 96 to the control group. The 90-day mRS score...

bug
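
A sketch reproducing the reported setup, for anyone trying to confirm the bug; the model name is copied from the issue, `abstract` stands for the clinical-trial text above, and the token budget is an assumption:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(model_name="Llama2-Chinese-7b-Chat")

compressed = llm_lingua.compress_prompt(
    [abstract],  # placeholder: the Chinese clinical-trial abstract above
    instruction="Please summarize the following text",
    question="",
    target_token=200,  # illustrative budget
)
```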

Very nice work! I am trying to replicate the LongLLMLingua results on the Natural Questions dataset, but there may be some discrepancies between my results and those in the...

question
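
For context, the LongLLMLingua example notebook drives the question-aware mode with parameters along these lines; `documents`, `instruction`, and `question` are placeholders, and the exact values behind the paper's Natural Questions numbers may differ:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()

compressed = llm_lingua.compress_prompt(
    documents,                    # placeholder: list of retrieved passages
    instruction=instruction,      # placeholder: task instruction
    question=question,            # placeholder: the NQ question
    target_token=500,
    rank_method="longllmlingua",  # question-aware coarse-grained ranking
    condition_in_question="after",
    condition_compare=True,
    reorder_context="sort",       # reorder documents by estimated relevance
    dynamic_context_compression_ratio=0.4,
    context_budget="+100",
)
```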

I installed version 0.27.4 to run the code in `examples/CoT.ipynb`. An error was raised when running the following line:

```python
request_data = {
    "prompt": prompt,
    "max_tokens": 400,
    "temperature": 0,
    "top_p": 1,
    ...
```

question
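
Since 0.27.4 refers to the pre-1.0 `openai` package, the notebook's `request_data` is presumably fed to the old completion endpoint. A sketch of that call; the engine name is an assumption, and the fields truncated after `"top_p"` in the issue are left out rather than guessed:

```python
import openai  # openai==0.27.x, pre-1.0 API

openai.api_key = "sk-..."  # your key

prompt = "<compressed prompt from LLMLingua>"
request_data = {
    "prompt": prompt,
    "max_tokens": 400,
    "temperature": 0,
    "top_p": 1,
}
response = openai.Completion.create(engine="text-davinci-003", **request_data)
print(response["choices"][0]["text"])
```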

After compression, will it still be able to summarize a book, or answer questions about important events in it?

question

I noticed there is an unnecessary duplicate declaration of `loss_fct` [here](https://github.com/microsoft/LLMLingua/blob/bf6723c3eca3569d23c4ec367c588660dc2e65e7/llmlingua/prompt_compressor.py#L113-L120). **Relevant code:**

```python
loss_fct = torch.nn.CrossEntropyLoss(reduction="none")
shift_logits = response.logits[..., :-1, :].contiguous()
shift_labels = input_ids[..., past_length + 1 : end].contiguous()
...
```

Changed string concatenation to f-strings to improve readability and to unify with the rest of the code

style
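
An illustrative before/after of the change this PR describes (the call sites here are hypothetical, not copied from the diff):

```python
name, count = "LLMLingua", 117

# before: plain string concatenation
message = "Repo " + name + " has " + str(count) + " issues"

# after: f-string, matching the rest of the codebase
message = f"Repo {name} has {count} issues"
```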