
NaN when the input length is large

bilalghanem opened this issue 1 year ago · 6 comments

Hi

Thanks for your efforts, folks! While testing the code on my own dataset, I found that when the input is long (~4000 tokens), the loss becomes NaN from the first step: `Epoch 0, Loss nan, LR 1.00e-05: 12%|█████`

For the same dataset, when I truncate the input to something shorter, the loss appears normally. What is the problem?

bilalghanem avatar Apr 06 '24 00:04 bilalghanem

I think there is an issue in the code, if I am not mistaken. The padding should be on the left side: `[:, -args["context_length"]:]` in the `collate_fn` function.

After I did this, the loss started to appear. Could you please confirm?

bilalghanem avatar Apr 06 '24 01:04 bilalghanem
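(A minimal sketch of what that change might look like, assuming a typical collate function that pads variable-length examples to a common batch length; `args["context_length"]` comes from the thread, while the padding logic and field names here are illustrative, not the repo's actual code.)

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_fn(batch, args, pad_token_id=0):
    # Pad variable-length examples up to the longest sequence in the batch.
    input_ids = pad_sequence(
        [torch.tensor(ex["input_ids"]) for ex in batch],
        batch_first=True,
        padding_value=pad_token_id,
    )
    # Right truncation keeps the FIRST context_length tokens:
    #   input_ids = input_ids[:, :args["context_length"]]
    # The suggested change keeps the LAST context_length tokens instead:
    input_ids = input_ids[:, -args["context_length"]:]
    return input_ids
```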

> I think there is an issue in the code, if I am not mistaken. The padding should be on the left side: `[:, -args["context_length"]:]` in the `collate_fn` function.
>
> After I did this, the loss started to appear. Could you please confirm?

Wouldn't doing this truncate it from the left side?

Xynonners avatar Apr 06 '24 10:04 Xynonners

> > I think there is an issue in the code, if I am not mistaken. The padding should be on the left side: `[:, -args["context_length"]:]` in the `collate_fn` function. After I did this, the loss started to appear. Could you please confirm?
>
> Wouldn't doing this truncate it from the left side?

Sorry, I didn't get you. You mean my update will not truncate it from the left?

bilalghanem avatar Apr 06 '24 11:04 bilalghanem

> > > I think there is an issue in the code, if I am not mistaken. The padding should be on the left side: `[:, -args["context_length"]:]` in the `collate_fn` function. After I did this, the loss started to appear. Could you please confirm?
> >
> > Wouldn't doing this truncate it from the left side?
>
> Sorry, I didn't get you. You mean my update will not truncate it from the left?

I mean, if you have a tensor like `[1, 2, 3, 4]`, doing so would truncate it from the left side to make `[2, 3, 4]`. This is equivalent to a string such as `ABCD` being truncated to `BCD`, if I understand correctly.

Xynonners avatar Apr 07 '24 01:04 Xynonners

> `[:, -args["context_length"]:]`

Not sure, but I don't think so. This only truncates the second dimension (sequence length) to a specific length.

bilalghanem avatar Apr 08 '24 16:04 bilalghanem
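(For what it's worth, both readings are consistent: the slice only touches the second dimension, and within that dimension it keeps the last `context_length` tokens, i.e. it drops tokens from the left. A quick check:)

```python
import torch

x = torch.tensor([[1, 2, 3, 4]])  # shape (batch=1, seq_len=4)
print(x[:, -3:])                  # tensor([[2, 3, 4]])
# Only the second (sequence) dimension is sliced, and the slice keeps
# the last 3 elements: the earliest tokens are dropped from the left.
```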

This has been a persistent issue for me while trying to fine-tune a Llama model on analyst reports using `bnb_dora`. The suggestion above about changing the padding has not helped. I have tried reducing the `--context_length` arg to as low as 256 and the input length of my training data to as low as 1024 tokens, but I still see `Loss nan`. Truncating the input any further is pointless, since only a very small number of reports are that short.

If anyone has found a workaround for this, I would greatly appreciate the knowledge.

mrgohlke avatar Nov 06 '24 19:11 mrgohlke
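(One way to narrow the problem down, sketched here under the assumption of a standard PyTorch training loop with stand-in `model` and `dataloader` objects: check each loss for finiteness and let autograd report the op that first produces a NaN. This is a generic diagnostic, not a confirmed fix for this issue.)

```python
import torch

# Ask autograd to raise an error at the op that first produces a NaN/inf
# gradient, instead of silently propagating it through the backward pass.
torch.autograd.set_detect_anomaly(True)

for step, batch in enumerate(dataloader):  # `dataloader` and `model` are stand-ins
    outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
    loss = outputs.loss
    if not torch.isfinite(loss):
        print(f"Non-finite loss at step {step}; inspect this batch's inputs/labels.")
        break
    loss.backward()
```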