Srinivasan Nandakumar
Thanks. More info on this from my other run: the same script runs perfectly fine on an RTX 4090 where I set bf16 training to true. So my guess is...
(An update here for more info) I tried another model (Qwen 1.5B) with fp16 training and it works fine. The problem seems specific to TinyLlama, I think.
Hi, I figured out a workaround. The underlying search code increases the beam width by a factor of 2. So when initializing the LLM, set max_logprobs to twice the...
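Based on that workaround, a minimal sketch of the initialization, assuming the vLLM `LLM` constructor and its `max_logprobs` parameter; the beam width value and model name here are illustrative, not from the original thread:

```python
from vllm import LLM

# Hypothetical beam width; adjust to match your search configuration.
beam_width = 4

# Workaround sketch: since the underlying search doubles the beam width,
# request at least 2 * beam_width logprobs at initialization so the
# doubled beam does not exceed the max_logprobs limit.
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model
    max_logprobs=2 * beam_width,
)
```

This is a configuration sketch, not a verified fix; the exact multiplier to use depends on what the truncated comment above specifies.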