LLMLingua
To speed up LLM inference and sharpen the model's perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
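The core idea behind this description can be sketched as a toy, self-contained example: score each token with a stand-in "informativeness" function and keep only the highest-scoring tokens up to a budget. This is illustrative only — the real library scores tokens with a small language model's perplexity, and none of the names below are LLMLingua's actual API.

```python
# Toy sketch of score-based prompt compression (illustrative only;
# LLMLingua itself ranks tokens by a small LM's perplexity signal).
def compress_prompt(tokens, score_fn, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order."""
    budget = max(1, int(len(tokens) * keep_ratio))
    # Rank positions by score, take the top `budget`, then restore order.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: score_fn(tokens[i]), reverse=True)
    kept = sorted(ranked[:budget])
    return [tokens[i] for i in kept]

tokens = "please summarize the key findings of this long clinical report".split()
# Stand-in scorer: treat longer words as more informative.
compressed = compress_prompt(tokens, score_fn=len, keep_ratio=0.5)
print(" ".join(compressed))  # → "please summarize findings clinical report"
```

The real system also restores a budget per demonstration/sentence before token-level filtering, but the keep-the-informative-tokens loop above is the gist.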
Hello! I'm getting the "Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx" error when trying to execute the...
code like this: ```prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"]) kc = RetrievalQA.from_llm(llm=qwllm, retriever=compression_retriever, prompt=prompt)```
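That error means PyTorch tried to initialize CUDA on a machine with no NVIDIA driver. A common workaround (an assumption about this setup, not something confirmed in the thread) is to load the model on CPU instead, e.g. via a `device_map="cpu"`-style argument at construction time. The fallback decision itself can be sketched without torch:

```python
def pick_device(cuda_available: bool) -> str:
    # Use the GPU only when a working NVIDIA driver / CUDA stack is
    # present; otherwise fall back to CPU so model loading does not raise.
    return "cuda" if cuda_available else "cpu"

# With torch installed this would be: pick_device(torch.cuda.is_available())
print(pick_device(False))  # a machine without an NVIDIA driver gets "cpu"
```

CPU inference will be much slower, but it avoids the driver error entirely.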
Hello! I put LLMLingua into AutoGen as part of a compressible agent: https://github.com/microsoft/autogen/pull/1005 Basically functional, but too slow on my MacBook with Llama 2 to really test. I figured...
model name: Llama2-Chinese-7b-Chat instruction = "Please summarize the following text" question = "" input: Edaravone dexborneol is a novel neuroprotective agent comprising edaravone and dexborneol, a food additive with anti-inflammatory effects in animal models of ischemic stroke. This study aimed to evaluate the safety and efficacy of edaravone dexborneol, compared with edaravone, in treating patients with acute ischemic stroke (AIS).\nMethods: In this multicenter, randomized, double-blind, multiple-dose, active-controlled phase II clinical trial, AIS patients within 48 hours of stroke onset were randomly assigned (1:1:1:1) to a low-dose (12.5 mg), medium-dose (37.5 mg), or high-dose (62.5 mg) edaravone dexborneol group, or to an active control group receiving 30 mg edaravone by intravenous infusion every 12 hours for 14 consecutive days. The primary efficacy outcomes were the proportion of patients with a modified Rankin Scale (mRS) score ≤1 at 90 days, and the change in National Institutes of Health Stroke Scale (NIHSS) score from baseline to 14 days after randomization. Safety outcomes included any adverse events within 90 days after treatment.\nResults: Of the 385 patients included in the efficacy analysis, 94 were randomized to the low-dose group, 97 to the medium-dose group, 98 to the high-dose group, and 96 to the control group. The 90-day mRS score...
Very nice work! I am trying to replicate the LongLLMLingua results on the Natural Questions dataset, but there may be some discrepancies between my results and those in the...
I installed version 0.27.4 for running the code in ```examples/CoT.ipynb```; an error was raised when running the following line: ``` request_data = { "prompt": prompt, "max_tokens": 400, "temperature": 0, "top_p": 1,...
Will it still be able to summarize a book, or answer questions about important events in it?
I noticed there is an unnecessary duplicate declaration of `loss_fct` [here](https://github.com/microsoft/LLMLingua/blob/bf6723c3eca3569d23c4ec367c588660dc2e65e7/llmlingua/prompt_compressor.py#L113-L120). **Relevant code:** ```python loss_fct = torch.nn.CrossEntropyLoss(reduction="none") shift_logits = response.logits[..., :-1, :].contiguous() shift_labels = input_ids[..., past_length + 1 : end].contiguous()...
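For context, the snippet above computes a per-token cross-entropy with `reduction="none"` over shifted logits and labels (position t predicts the token at t+1). A dependency-free sketch of that computation — illustrative, not the repository's code — looks like this:

```python
import math

def per_token_cross_entropy(logits, labels):
    """Per-position CE loss (reduction='none') with the usual shift.

    logits: one [vocab_size] row of raw scores per position 0..n-1
    labels: token ids for positions 0..n-1
    Row t's logits predict labels[t + 1], mirroring the
    shift_logits / shift_labels pattern in the snippet above.
    """
    losses = []
    for t in range(len(labels) - 1):
        row = logits[t]
        # Numerically stable log-sum-exp, then negative log-softmax
        # of the target entry.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[labels[t + 1]])
    return losses

logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
labels = [0, 0, 1]  # row 0 predicts labels[1]=0; row 1 predicts labels[2]=1
losses = per_token_cross_entropy(logits, labels)
```

Because the loss only depends on `loss_fct`'s configuration, declaring the same `CrossEntropyLoss` twice is indeed redundant; one declaration reused across both call sites behaves identically.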
Changed string concatenation to f-strings to improve readability and unify with the rest of the code.
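As a minimal illustration of that kind of change (the values here are made up, not taken from the PR):

```python
name = "LLMLingua"
ratio = 20
# Before: string concatenation, with explicit str() casts
old = "Model " + name + " achieves " + str(ratio) + "x compression"
# After: an f-string, which reads much closer to the final output
new = f"Model {name} achieves {ratio}x compression"
assert old == new  # behavior is unchanged; only readability improves
```

f-strings also format non-string values automatically, which removes the `str()` noise that concatenation requires.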