example/speculative: drafting fails completely when params.sparams.temp is set to 0
In the current speculative.cpp implementation, params.sparams.temp is forced to -1.0f.
However, if I change this value to 0:
Draft sampling seems to fail completely:
(speculative.log)
Is this intended behavior? I'm working on #5625, which removes the temperature limit, so I'd like to get this fixed.
I guess this is because when the temperature is 0, the sampling logic does not output probabilities for each token?
If so, this seems like a viable solution.
Yes, this should work