Qwen model issue: embedding outputs and loss become NaN
After loss.backward() and an optimizer step, the next forward pass produces inf hidden states from the embedding layer and the loss becomes NaN.
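For anyone debugging this, here is a minimal sketch (the names `policy` and `optimizer` are placeholders for whatever your training loop uses, not identifiers from this repo) that registers a forward hook on the embedding layer to catch the first step where the output goes non-finite, and clips gradients before the optimizer step, which is a common mitigation for this kind of blow-up:

```python
import torch

def check_finite_hook(module, inputs, output):
    # Raise as soon as the embedding output contains inf/NaN,
    # so the offending training step can be identified.
    if not torch.isfinite(output).all():
        raise RuntimeError(f"Non-finite values in {module.__class__.__name__} output")

# `policy` is assumed to be the HF model being trained.
embed = policy.get_input_embeddings()
embed.register_forward_hook(check_finite_hook)

# In the training step, clip gradients before optimizer.step()
# to bound the size of each parameter update.
loss.backward()
torch.nn.utils.clip_grad_norm_(policy.parameters(), max_norm=1.0)
optimizer.step()
```

This only localizes the problem; if the hook fires, lowering the learning rate or training in bf16/fp32 instead of fp16 are the usual next things to try.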
+1
Is this in the SFT stage or the DPO stage? Under the author's framework, I also get NaN loss when fine-tuning chatglm3 with SFT.
DPO
Hi, any update on this? Were you able to fix this issue?
I ran into this problem too with Qwen2.5-7B.
Python Output:
Computing eval metrics: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:20<00:00, 1.26s/it]
Generating samples...: 0%| | 0/1 [00:00<?, ?it/s]Both max_new_tokens (=2048) and max_length(=512) seem to have been set. max_new_tokens will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Generating samples...: 0%| | 0/1 [00:00<?, ?it/s]
Error executing job with overrides: []
Traceback (most recent call last):
File "~/direct-preference-optimization/train.py", line 114, in main
worker_main(0, 1, config, policy, reference_model)
File "~/direct-preference-optimization/train.py", line 44, in worker_main
trainer.train()
File "~/direct-preference-optimization/trainers.py", line 320, in train
policy_samples, reference_samples = self.get_batch_samples(local_eval_batch)
File "~/direct-preference-optimization/trainers.py", line 188, in get_batch_samples
policy_output = self.policy.generate(
File "~/anaconda3/envs/dpo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "~/anaconda3/envs/dpo/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
result = self._sample(
File "~/anaconda3/envs/dpo/lib/python3.10/site-packages/transformers/generation/utils.py", line 3020, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
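If anyone needs a stopgap while the root cause (the non-finite hidden states) is tracked down, here is a sketch of two common workarounds for the `torch.multinomial` crash during sample generation; `policy` and `batch` below are placeholders for whatever `get_batch_samples` passes in, not code from this repo:

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class SanitizeLogitsProcessor(LogitsProcessor):
    """Replace inf/NaN logits so the sampled distribution is always valid."""
    def __call__(self, input_ids, scores):
        return torch.where(
            torch.isfinite(scores),
            scores,
            torch.full_like(scores, torch.finfo(scores.dtype).min),
        )

# Option 1: greedy decoding sidesteps torch.multinomial entirely.
policy_output = policy.generate(**batch, max_new_tokens=2048, do_sample=False)

# Option 2: keep sampling but sanitize the logits first.
policy_output = policy.generate(
    **batch,
    max_new_tokens=2048,
    do_sample=True,
    logits_processor=LogitsProcessorList([SanitizeLogitsProcessor()]),
)
```

Either option only masks the symptom so evaluation doesn't crash; the underlying inf/NaN activations still need to be fixed (lower learning rate, gradient clipping, or avoiding fp16).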