Zhenyu (Allen) Zhang

12 comments by Zhenyu (Allen) Zhang

Hi @hasanibnarif, Hugging Face updated their cache implementation in transformers version 4.36. Previously, past_key_value was a list of tensors containing the key and value embeddings, while now they use a cache...
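To make the difference concrete, here is a minimal dependency-free sketch of the two formats. The legacy layout is a tuple of per-layer (key, value) pairs; the newer API routes updates through a cache object. ToyCache below is a hypothetical stand-in for transformers' real DynamicCache (which stores torch tensors and concatenates with torch.cat), not the actual class.

```python
# Legacy format (transformers < 4.36): past_key_values is a tuple with
# one (key_states, value_states) pair per decoder layer.
legacy_past = tuple(
    ([[0.1] * 4], [[0.2] * 4])  # toy "tensors": one row of head_dim=4
    for _ in range(2)           # 2 decoder layers
)

# Newer format: a cache object with a per-layer update() method.
# ToyCache is a hypothetical stand-in for transformers' DynamicCache.
class ToyCache:
    def __init__(self):
        self.key_cache, self.value_cache = [], []

    def update(self, key_states, value_states, layer_idx):
        if layer_idx == len(self.key_cache):
            # First update for this layer: initialize its cache entry.
            self.key_cache.append(key_states)
            self.value_cache.append(value_states)
        else:
            # Later steps: concatenate new entries (a toy stand-in for
            # torch.cat along the sequence dimension).
            self.key_cache[layer_idx] += key_states
            self.value_cache[layer_idx] += value_states
        return self.key_cache[layer_idx], self.value_cache[layer_idx]

cache = ToyCache()
for layer_idx in range(2):
    cache.update([[0.1] * 4], [[0.2] * 4], layer_idx)
```

Code written against the legacy tuple format (like indexing past_key_value[layer][0]) breaks once it receives a cache object, which is why the eviction code needs porting.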

Hi, that might be caused by your transformers version. The current code is based on version 4.31.0. We will modify the code to support the latest transformers version, will release the code...

Hi, the HH scores should be sequence-independent. In this implementation, we use one sequence per batch for testing. Will update the implementation to support multiple sequences shortly, by modifying (https://github.com/FMInference/H2O/blob/main/h2o_hf/utils_real_drop/modify_llama.py#L269)
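For readers unfamiliar with HH (heavy-hitter) scores: each KV position accumulates the attention it receives across decoding steps, and the top-scoring positions are kept in the cache. The pure-Python toy below illustrates that idea only; it is not the repo's modify_llama.py implementation, which operates on attention tensors, and the helper names are my own.

```python
# Toy sketch of H2O-style heavy-hitter scoring: accumulate the attention
# each token receives over decoding steps, then keep the top-k positions.

def accumulate_scores(attn_rows):
    """attn_rows: one causal attention row per step (each sums to 1)."""
    n = len(attn_rows[-1])          # number of KV positions so far
    scores = [0.0] * n
    for row in attn_rows:
        for j, a in enumerate(row):
            scores[j] += a
    return scores

def heavy_hitters(scores, k):
    """Indices of the k positions with the highest accumulated attention."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

attn_rows = [
    [0.7, 0.3],        # step 1: attention over 2 positions
    [0.5, 0.2, 0.3],   # step 2: attention over 3 positions
]
scores = accumulate_scores(attn_rows)   # roughly [1.2, 0.5, 0.3]
print(heavy_hitters(scores, 2))
```

Because the score is just a running sum over attention rows, it does not depend on which other sequences share the batch, which is why the scores should be sequence-independent.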

Hi, Thanks for your question. Did you use Llama-2-7b? The model used in the paper is "huggyllama/llama-7b".

Hi, could you provide the detailed command and transformers version you used? I couldn't reproduce the issue on my side when using huggyllama/llama-7b.

Hi, I tested samples 795 to 800 but didn't encounter the same error. Based on your error message, could you try specifying "pad_token_id=tokenizer.eos_token_id" in the model.generate() call?

Hi, I followed the original HELM settings for these parameters. Generally, a larger temperature brings more diversity and less deterministic outputs.
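The temperature effect can be seen directly in a softmax over the model's logits: dividing the logits by a larger temperature flattens the distribution, so sampling becomes more diverse; a smaller temperature sharpens it toward the argmax. A minimal stdlib-only sketch:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; higher temperature -> flatter."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # sharper, near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # flatter, more diverse
print(max(cold), max(hot))
```

The top token's probability under the cold distribution exceeds its probability under the hot one, which is exactly the diversity/determinism trade-off described above.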

Thanks for the question. We simplified the HELM evaluation code. Please check the summarization benchmarking script "scripts/summarization/eval.sh"

Hi, that's because the model name registered in https://github.com/FMInference/H2O/blob/main/h2o_hf/data/xsum.jsonl is GPT-NeoX-20B; it won't affect the final results. When extracting the data from HELM for local evaluation, we use the...

Hi, the results in Table 6 are obtained from OPT-30B (as described in Section 5.3, Q3). For practical use, you can use the accumulated attention scores obtained from the whole prefilling...
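As a rough illustration of what "accumulated attention scores from the whole prefilling" could mean: column-sum the causal prefill attention matrix so that every KV position is scored before decoding starts. This is a toy stdlib sketch under that assumption, not the repo's implementation.

```python
# Toy causal prefill attention matrix: row i is token i's attention
# over positions 0..i, and each row sums to 1.
prefill_attn = [
    [1.0],
    [0.6, 0.4],
    [0.2, 0.5, 0.3],
]

# Accumulated score per KV position = column sum over all prefill rows.
n = len(prefill_attn)
acc = [0.0] * n
for row in prefill_attn:
    for j, a in enumerate(row):
        acc[j] += a
print(acc)  # earlier positions accumulate attention from more rows
```

These prefill-time scores can then seed the eviction policy, instead of starting the accumulation from scratch at the first decoding step.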