implicit_chain_of_thought
Why is MIXTURE_SIZE set to 1?
Hello,
Thanks for your nice repo. I noticed that MIXTURE_SIZE is set to 1 in the example command you provided.
self.mixture_components = nn.Embedding(config.mixture_size, hidden_size)
I'm curious why mixture_size isn't set to the vocabulary size?
Sorry, I just noticed this issue... The mixture approach is only used on GSM8K, not on multiplication. Multiplication's CoT is deterministic given the input by design, so the mixture approach isn't necessary there (setting the mixture size to 1 is how we disable it).
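To illustrate the point (this is a minimal sketch, not the repo's actual code; the variable names and sizes below are made up), with mixture_size > 1 the nn.Embedding table holds multiple learned component vectors that can be selected per sample, whereas with mixture_size = 1 only index 0 exists, so every sample gets the same vector and no mixing happens:

```python
import torch
import torch.nn as nn

hidden_size = 16  # hypothetical value for the sketch

# mixture_size > 1: each sampled component index picks a different
# learned vector, giving a mixture over latent reasoning variants.
mixture = nn.Embedding(num_embeddings=4, embedding_dim=hidden_size)
component_ids = torch.randint(0, 4, (8,))   # one component id per sample
mixed = mixture(component_ids)              # (8, hidden_size), varies across samples

# mixture_size == 1: only index 0 exists, so the "mixture" collapses
# to a single shared vector -- effectively the mechanism is disabled.
single = nn.Embedding(num_embeddings=1, embedding_dim=hidden_size)
collapsed = single(torch.zeros(8, dtype=torch.long))
print(torch.allclose(collapsed[0], collapsed[1]))  # True: all rows identical
```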