Srinivasan Nandakumar

Results 3 comments of Srinivasan Nandakumar

Thanks. More info on this from my other run: the same script runs perfectly fine on an RTX 4090 where I set bf16 training to be true. So my guess is...

(An update here for more info) I tried another model (Qwen 1.5B) with fp16 training and it works fine. The problem seems specific to TinyLlama, I think.

Hi, I figured out a workaround. The underlying search code increases the beam width by a factor of 2. So when initializing the LLM, set max_logprobs to twice the...
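A minimal sketch of the workaround described above, assuming a vLLM-style `LLM` constructor that accepts a `max_logprobs` argument (the exact model name and beam width here are placeholder values, not from the original comment):

```python
# Sketch: the internal search doubles the beam width, so max_logprobs
# must cover at least 2x the beams requested, or logprob lookups fail.
beam_width = 4                      # placeholder value
max_logprobs = 2 * beam_width       # the workaround: twice the beam width

# Hypothetical initialization (vLLM-style API, commented out so this
# sketch stays self-contained without the library installed):
# from vllm import LLM
# llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
#           max_logprobs=max_logprobs)

print(max_logprobs)
```

The key point is only the relationship between the two numbers: whatever beam width the search is configured with, `max_logprobs` should be at least double it.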