Using EAGLE slows down inference
Thank you very much for your work on EAGLE; it has been extremely helpful to me.
I have a question: why does downloading yuhuili/EAGLE-Vicuna-7B-v1.3 from Hugging Face and using it directly to accelerate lmsys/vicuna-7b-v1.3 actually slow down inference? Using my own trained EAGLE head, on the other hand, does produce a speedup. Could you please tell me where I went wrong?
Below is a screenshot of my operation.
I would greatly appreciate any assistance you can provide in resolving this issue. Thank you very much.
Maybe try temperature=0.
Thank you very much for your valuable advice. However, I obtained the same result regardless of the temperature.
@zkqq Correct drafts are displayed in yellow, and I notice there are almost no yellow words in your image. Either the draft model does not match the base model, or you did not set the --model-type parameter: its default value is llama-2-chat, and it must be changed to vicuna.
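For reference, a launch command along these lines should pair the two models and override the default template. This is a sketch based on my reading of the EAGLE repository; the exact flag names may differ between versions, so check the project's README:

```shell
# Hedged sketch: pair the Vicuna draft head with the matching Vicuna base
# model and set the chat template explicitly (the default is llama-2-chat).
python webui.py \
    --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
    --base-model-path lmsys/vicuna-7b-v1.3 \
    --model-type vicuna
```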
Thank you very much for your reply. You are right that the issue likely stems from a mismatch between the EAGLE head and the base model; however, I believe I have configured all the necessary parameters, including the model type.
I trained an EAGLE head, ran webui.py and the evaluation, and observed a good acceleration effect. However, when I switch to the EAGLE head from yuhuili/EAGLE-Vicuna-7B-v1.3, inference slows down. The two config.json files are identical; the only difference is the pytorch_model.bin file.
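Since the two config.json files match but the checkpoints differ, one quick sanity check is to compare the two state dicts directly. A hypothetical diagnostic sketch (the helper name and the checkpoint paths are my own; it assumes torch is installed):

```python
# Compare two EAGLE head checkpoints: identical config.json files do not
# guarantee that the saved weights share the same parameter names and shapes.
def diff_state_dicts(a, b):
    """Return (keys only in a, keys only in b, shared keys whose shapes differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shape_mismatch = sorted(
        k for k in set(a) & set(b) if tuple(a[k].shape) != tuple(b[k].shape)
    )
    return only_a, only_b, shape_mismatch

# Usage (hypothetical paths, assumes torch):
# import torch
# mine = torch.load("my_eagle_head/pytorch_model.bin", map_location="cpu")
# hf = torch.load("EAGLE-Vicuna-7B-v1.3/pytorch_model.bin", map_location="cpu")
# print(diff_state_dicts(mine, hf))
```

If all names and shapes agree, the architecture matches and the problem is more likely the weights themselves or the chat template used at inference time.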
No issues were encountered when using yuhuili/EAGLE-Vicuna-7B-v1.3, but there are issues with the weights you trained yourself?
On the contrary: there is no issue with the weights I trained myself. It is the yuhuili/EAGLE-Vicuna-7B-v1.3 weights that cause the slowdown.
The likely reason is that the chat template or the weights of your base model differ from those we used when training the draft model.
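To see why such a mismatch shows up as a net slowdown rather than merely a smaller speedup, here is a deliberately simplified cost model (my own illustration, not EAGLE's exact accounting): each verification cycle costs one base-model forward pass plus a relative drafting overhead, and yields some average number of accepted tokens.

```python
# Simplified speculative-decoding cost model (illustrative assumption):
# each cycle costs 1 base-model forward pass plus drafting overhead `c`
# (relative to a base forward), and yields `tau` accepted tokens on average.
def speedup(tau: float, c: float) -> float:
    """Expected speedup over plain autoregressive decoding under this model."""
    return tau / (1.0 + c)

# Well-matched draft head: several tokens accepted per cycle.
print(f"matched head:    {speedup(3.5, 0.3):.2f}x")  # > 1: faster
# Mismatched head/template: drafts almost never accepted, ~1 token per cycle.
print(f"mismatched head: {speedup(1.0, 0.3):.2f}x")  # < 1: slower than baseline
```

When the draft head and the base model (or its template) disagree, almost no drafts are accepted, so the drafting overhead is paid on every cycle for nothing, which matches the almost-no-yellow-words symptom above.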