example/speculative: drafting fails completely when params.sparams.temp is set to 0
In the current speculative.cpp implementation, params.sparams.temp is forced to -1.0f.
However, if I change this value to 0:
Draft sampling seems to fail completely:
(speculative.log)
Is this intended behavior? I'm working on #5625, which removes the temperature limit, so I'd like to get this fixed.
I guess this is because when the temperature is 0, the sampling logic does not output probabilities for each token?
If so, this seems like a viable solution.
Yes, this should work