MixtralKit
Why is the performance on GSM8K and MATH lower than in the original Mixtral blog?
I noticed that the math reasoning performance is lower than what the official blog reports. Is this because of the zero-shot setting used here, compared with the official 5-shot setting?
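To make the two settings concrete, here is a minimal sketch of how a zero-shot prompt differs from a k-shot prompt. The `build_prompt` helper, the `Question:`/`Answer:` template, and the exemplars are all hypothetical and chosen only for illustration; they are not the prompt format used by MixtralKit or by the official evaluation.

```python
def build_prompt(question, exemplars=(), k=0):
    """Build a k-shot prompt: k worked examples followed by the target question.

    With k=0 this is the zero-shot case: the model sees only the question.
    The template here is an assumption, not any benchmark's official format.
    """
    parts = []
    # Prepend the first k worked (question, answer) pairs as demonstrations.
    for q, a in list(exemplars)[:k]:
        parts.append(f"Question: {q}\nAnswer: {a}")
    # The target question is left open for the model to complete.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# Hypothetical exemplars, made up for this sketch.
exemplars = [
    ("What is 2 + 3?", "2 + 3 = 5. The answer is 5."),
    ("What is 4 * 6?", "4 * 6 = 24. The answer is 24."),
]

zero_shot = build_prompt("What is 7 - 2?")
two_shot = build_prompt("What is 7 - 2?", exemplars, k=2)
```

In the few-shot case the model sees worked examples that demonstrate the expected answer format before it reaches the target question, which is one plausible reason scores could differ between the two settings.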