Using EAGLE slows down inference
Thank you very much for your work on EAGLE; it has been extremely helpful to me.
I have a question: why does downloading yuhuili/EAGLE-Vicuna-7B-v1.3 from Hugging Face and using it directly to accelerate lmsys/vicuna-7b-v1.3 actually slow down inference? Using my own trained EAGLE head, on the other hand, does produce a speedup. Could you please tell me where I went wrong?
Below is a screenshot of my operation.
I would greatly appreciate any assistance you can provide in resolving this issue. Thank you very much.
Maybe try temperature=0.
Thank you very much for your valuable advice. However, I obtained the same result regardless of the temperature.
@zkqq Correct drafts are displayed in yellow, and I notice there are almost no yellow words in your image. Either the draft model does not match the base model, or you did not set the --model-type parameter: its default value is llama-2-chat, and it must be changed to vicuna.
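For reference, a launch command along these lines should pair the two models and override the default template. This is a sketch based on my reading of the EAGLE repository; the exact flag names may differ between versions, so check the project's README:

```shell
# Hedged sketch: pair the Vicuna draft head with the matching Vicuna base
# model and set the chat template explicitly (the default is llama-2-chat).
python webui.py \
    --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
    --base-model-path lmsys/vicuna-7b-v1.3 \
    --model-type vicuna
```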
Thank you very much for your reply. You are right that the issue likely stems from a mismatch between the EAGLE head and the base model; however, I believe I have configured all the necessary parameters, including the model type.
I trained an EAGLE head, ran webui.py and the evaluation, and observed a good acceleration effect. However, when I switch to the EAGLE head from yuhuili/EAGLE-Vicuna-7B-v1.3, inference slows down. The two config.json files are identical; the only difference is the pytorch_model.bin file.
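Since the two config.json files match but the checkpoints differ, one quick sanity check is to compare the two state dicts directly. A hypothetical diagnostic sketch (the helper name and the checkpoint paths are my own; it assumes torch is installed):

```python
# Compare two EAGLE head checkpoints: identical config.json files do not
# guarantee that the saved weights share the same parameter names and shapes.
def diff_state_dicts(a, b):
    """Return (keys only in a, keys only in b, shared keys whose shapes differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shape_mismatch = sorted(
        k for k in set(a) & set(b) if tuple(a[k].shape) != tuple(b[k].shape)
    )
    return only_a, only_b, shape_mismatch

# Usage (hypothetical paths, assumes torch):
# import torch
# mine = torch.load("my_eagle_head/pytorch_model.bin", map_location="cpu")
# hf = torch.load("EAGLE-Vicuna-7B-v1.3/pytorch_model.bin", map_location="cpu")
# print(diff_state_dicts(mine, hf))
```

If all names and shapes agree, the architecture matches and the problem is more likely the weights themselves or the chat template used at inference time.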
No issues were encountered when using yuhuili/EAGLE-Vicuna-7B-v1.3, but there are issues with the weights you trained yourself?
On the contrary: there is no issue with the weights I trained myself. It is the yuhuili/EAGLE-Vicuna-7B-v1.3 weights that cause the slowdown.
The likely reason is that the chat template or the weights of your base model differ from those we used when training the draft model.
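To see why such a mismatch shows up as a net slowdown rather than merely a smaller speedup, here is a deliberately simplified cost model (my own illustration, not EAGLE's exact accounting): each verification cycle costs one base-model forward pass plus a relative drafting overhead, and yields some average number of accepted tokens.

```python
# Simplified speculative-decoding cost model (illustrative assumption):
# each cycle costs 1 base-model forward pass plus drafting overhead `c`
# (relative to a base forward), and yields `tau` accepted tokens on average.
def speedup(tau: float, c: float) -> float:
    """Expected speedup over plain autoregressive decoding under this model."""
    return tau / (1.0 + c)

# Well-matched draft head: several tokens accepted per cycle.
print(f"matched head:    {speedup(3.5, 0.3):.2f}x")  # > 1: faster
# Mismatched head/template: drafts almost never accepted, ~1 token per cycle.
print(f"mismatched head: {speedup(1.0, 0.3):.2f}x")  # < 1: slower than baseline
```

When the draft head and the base model (or its template) disagree, almost no drafts are accepted, so the drafting overhead is paid on every cycle for nothing, which matches the almost-no-yellow-words symptom above.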