smartliuhw

12 comments by smartliuhw

I'm having the same problem. Do we have to write a script to generate such a file, or can we download it from the web?

And one more question: how can I obtain the medusa_choices during inference, like the choices in this [file](https://github.com/FasterDecoding/Medusa/blob/main/medusa/model/medusa_choices.py)? Are they generated automatically or set manually? Looking forward to your reply!...

I took a look at the processes: the games all seem to be started via Docker, but src.server.task_worker doesn't appear to have started. After running this command I checked port 5001 and it is not occupied. Could you take a look when you have time? Thanks! @zhc7

@SunMarc Hi Marc, I have tried installing this version of accelerate together with PyTorch 2.6.0 to use Trainer on an MPS device, but got the following error message; could you please help...

@alvarobartt Hi! Thanks for your response! I have tried setting the PAD token to the EOS token and ran another task on Mistral-7B, but the loss was still abnormal: it started...

Hi @alvarobartt ! I have run two more experiments on Llama3-8B and Llama2-7B, and both work fine with the same code. I think there may be some special setting needed for Mistral-7B....

Hi @zwhe99 ! I have run some more experiments and found the following: - The ``pad_token_id`` should be set to ``eos_token_id`` - The Mistral model is quite sensitive to the hyperparameter ``warmup_steps``,...
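For context on the ``pad_token_id`` / ``eos_token_id`` finding above: when the pad token is reused as EOS, naively masking every pad position from the loss would also mask the real end-of-sequence token, so the model never learns to stop. A minimal pure-Python sketch of the idea (the helper and token ids are hypothetical, not the real tokenizer or Mistral vocab):

```python
# Hypothetical ids for illustration only.
EOS_ID = 2
PAD_ID = EOS_ID  # pad_token_id set to eos_token_id, as in the finding above

def mask_padding(labels, pad_id=PAD_ID, ignore_index=-100):
    """Replace padded positions with ignore_index so the loss skips them.

    Keeps the first token of the trailing pad run (the real EOS) so the
    model still learns to emit end-of-sequence.
    """
    masked = list(labels)
    i = len(masked) - 1
    # Walk backwards over the trailing pad run, stopping before the EOS.
    while i > 0 and masked[i] == pad_id and masked[i - 1] == pad_id:
        masked[i] = ignore_index
        i -= 1
    return masked

print(mask_padding([5, 9, 4, 2, 2, 2]))  # → [5, 9, 4, 2, -100, -100]
```

The `-100` value matches the default `ignore_index` of PyTorch's cross-entropy loss, which is why Hugging Face label tensors conventionally use it for masked positions.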

> How reproducible is this? I am wondering if it has anything to do with [huggingface/transformers#29285](https://github.com/huggingface/transformers/pull/29285). Maybe try upgrading your transformers version? I tried to run the script again just...

> Fair point. Maybe cc @danielhanchen since he is an expert on Gemma-7B fixes 😂 Hi, I have done some tests and got some findings. It's important for Gemma model...

> @vwxyzjn Thanks for tagging me :) Hi :) > > @smartliuhw Oh yes we noticed the same issue with `packing=True` causing high losses in our blog: https://unsloth.ai/blog/gemma-bugs > >...