You need 36 GB of GPU memory to run the fp16 models; 24 GB is only enough for the 4-bit model.
```
Traceback (most recent call last):
  File "/home/spider/slj/project/wangjingli/CogVLM/basic_demo/cli_demo_hf.py", line 79, in <module>
    outputs = model.generate(**inputs, **gen_kwargs)
  File "/home/spider/anaconda3/envs/video/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/spider/anaconda3/envs/video/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/home/spider/anaconda3/envs/video/lib/python3.10/site-packages/transformers/generation/utils.py", line 2897, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
```
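For context, here is a minimal standalone sketch (my own example, not from the CogVLM code) showing that `torch.multinomial` raises exactly this `RuntimeError` when the probability tensor contains `nan` — which is what happens when the model's logits overflow or become invalid, e.g. under aggressive fp16/4-bit quantization:

```python
import torch

# A probability row containing NaN, standing in for what the sampler
# receives when the model's fp16 logits have overflowed to inf/NaN.
probs = torch.tensor([[float("nan"), 0.5, 0.5]])

try:
    # This is the same call that fails inside transformers' sample():
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)  # reports that the probability tensor contains inf/nan or element < 0
```

So the traceback does not point at a bug in the sampling code itself; it indicates the probabilities fed into sampling were already invalid.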
When I run `python cli_demo_hf.py --quant 4 --fp16`, I get this error.
Why does this happen?