
After deploying google/gemma-7b-it, there is always an error response.

Open ydh10002023 opened this issue 2 years ago • 8 comments

After deploying google/gemma-7b-it, there is always an error response when sending any message.

Response: Of course! Here are some creative ideas for a 10-year-old's birthday party:

ydh10002023 avatar Feb 26 '24 06:02 ydh10002023

Thanks! Could you share some more details? What is the error response you are receiving?

michaelmoynihan avatar Feb 26 '24 16:02 michaelmoynihan

docker run -t --rm --gpus all -v "F:\gemma_pytorch-main\7b\gemma-7b-it.ckpt":/tmp/ckpt 51cd9699e157dfd46257dfc19263593015ffcb8d0f0a0c5a14e11adc89daacda python scripts/run.py --device=cuda --ckpt=/tmp/ckpt --variant=7b --output_len=10 --prompt="Introduce your model version and description information"

Traceback (most recent call last):
  File "/workspace/gemma/scripts/run.py", line 79, in <module>
    main(args)
  File "/workspace/gemma/scripts/run.py", line 53, in main
    result = model.generate(args.prompt, device)
  File "/workspace/gemma/gemma/model.py", line 518, in generate
    next_token_ids = self(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/gemma/gemma/model.py", line 445, in forward
    next_tokens = self.sampler(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/gemma/gemma/model.py", line 78, in forward
    next_token_ids = torch.multinomial(probs,
RuntimeError: probability tensor contains either inf, nan or element < 0
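The failing call is the last frame, torch.multinomial. A minimal repro (illustrative only, not taken from the repository) shows that torch.multinomial raises exactly this error whenever the probability tensor contains inf, nan, or a negative entry:

```python
import torch

# Any inf/nan/negative entry in the probability tensor triggers the error
# seen in the traceback above.
probs = torch.tensor([0.5, float("nan"), 0.5])
try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)  # the "probability tensor contains either inf, nan or element < 0" message
```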

freefer avatar Feb 27 '24 09:02 freefer

I got the same error when trying to run 7b-it. My GPU has only 12 GB of RAM, so I assumed it simply ran out of memory and went back to playing with the 2b-it model.

Ittiz avatar Feb 28 '24 14:02 Ittiz

@freefer You can try replacing model_config.dtype = "float32" if args.device == "cpu" else "float16" in run.py with model_config.dtype = "float32".
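Concretely, the suggested edit in scripts/run.py is a one-line change (a sketch of the snippet being discussed; the exact location may differ across versions of the script):

```python
# Original line in scripts/run.py: picks float16 whenever a GPU is used.
model_config.dtype = "float32" if args.device == "cpu" else "float16"

# Suggested replacement: always use float32, trading memory for numerical range.
model_config.dtype = "float32"
```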

ghost avatar Feb 29 '24 08:02 ghost

Hi @SedrickWang, I've tried the solution, but it doesn't seem to work. Post #10 also mentioned a 'RuntimeError: probability tensor contains either inf, nan, or an element < 0' error. Are these two issues the same?

ShadovvSinger avatar Mar 10 '24 09:03 ShadovvSinger

Hi, @ShadovvSinger. While playing with gemma-2b-it, I encountered the error RuntimeError: probability tensor contains either inf, nan or element < 0. To resolve it, I replaced model_config.dtype = "float32" if args.device == "cpu" else "float16" in run.py with model_config.dtype = "float32". I therefore suspect the error is caused by floating-point precision. You could try an even higher-precision type (I tried float64, but my GPU memory was insufficient; if you have a more powerful GPU, you could give it a shot).

Furthermore, I encountered the same error while using gemma-7b-it: python scripts/run.py --device=cuda --ckpt=/tmp/ckpt --variant="7b" --output_len=10 --prompt="Hi, gemma. Introduce your model version and description information". However, when a shorter prompt is used, the error disappears, for example: python scripts/run.py --device=cuda --ckpt=/tmp/ckpt --variant="7b" --output_len=10 --prompt="Hi, gemma.".
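The prompt-length behavior is consistent with a float16 overflow: float16 tops out at 65504, so a sufficiently large logit or activation becomes inf, and the softmax then produces nan probabilities. A small NumPy sketch (illustrative only, not code from the repository) shows the failure mode:

```python
import numpy as np

def softmax(x):
    # Naive softmax with max-subtraction; still breaks once the input holds inf.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([70000.0, 1.0])
print(softmax(logits.astype(np.float32)))  # well-behaved in float32
fp16 = logits.astype(np.float16)           # 70000 overflows to inf in float16
print(softmax(fp16))                       # contains nan -> multinomial would reject it
```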

ghost avatar Mar 11 '24 07:03 ghost

Try loading the model with dtype=torch.bfloat16 instead of float16.

keltin13 avatar Jul 05 '24 21:07 keltin13

Did the solution work?

gustheman avatar Jul 15 '24 13:07 gustheman