LLamaSharp: wrong results after switching to another model

It works well with LLama2-7b-Chat, but after I switched to a newer model, mixtral-8x7b-v0.1Q2_K, and asked the same question, the bot gave a wrong answer and even changed my original question.

Should I change some options or parameters somewhere when I switch to another model? Can anyone help? Thanks.

[Images: wrong, correct]

Feb 02 '24 07:02 icemaple1251

Q2 is a pretty small quantisation. Have you tested your Q2 model in llama.cpp directly to check this isn't just a bad response caused by the quantisation?

Feb 02 '24 14:02 martindevans

I have not tested the Q2 model in llama.cpp directly. But I have tried other models such as "mixtral-8x7b-v0.1.Q8_0.gguf", and I still get wrong answers; some answers are repeated several times. Are some models made specifically for chat while others are not?

Feb 06 '24 08:02 icemaple1251

The mixtral model you mentioned is Q8, which is much more forgiving than Q2. The smaller the number, the more the model has been compressed, and the more likely it is to give bad answers.

Feb 06 '24 16:02 martindevans
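
To illustrate the point above about smaller quantisation numbers losing more information, here is a minimal Python sketch. It is not how llama.cpp quantises weights: the real Q2_K and Q8_0 formats use more elaborate block-wise schemes, while this toy version assumes a single shared scale and plain rounding. The function names and the random "weights" are hypothetical and exist only for illustration.

```python
import random


def quantize_roundtrip(weights, bits):
    """Round weights to signed `bits`-bit integers with one shared scale,
    then convert them back to floats (toy scheme, not llama.cpp's)."""
    levels = 2 ** (bits - 1) - 1  # largest integer magnitude, e.g. 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [round(w / scale) * scale for w in weights]


def rms_error(original, restored):
    """Root-mean-square difference between two equally long lists."""
    n = len(original)
    return (sum((a - b) ** 2 for a, b in zip(original, restored)) / n) ** 0.5


# Hypothetical stand-in for a block of model weights.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]

# Compare how much information survives at each bit width.
errors = {
    bits: rms_error(weights, quantize_roundtrip(weights, bits))
    for bits in (2, 4, 8)
}
for bits, err in errors.items():
    print(f"{bits}-bit round-trip RMS error: {err:.4f}")
```

In this toy setup the round-trip error grows as the bit width shrinks, which matches the general intuition that a Q2 file is more likely to produce degraded answers than a Q8 file of the same model.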