
why is MIXTURE_SIZE set to 1

Open Mlydi opened this issue 1 year ago • 1 comment

Hello,

Thanks for your nice repo. I noticed that MIXTURE_SIZE is set to 1 in the example command you provide:

self.mixture_components = nn.Embedding(config.mixture_size, hidden_size)

I'm curious: why isn't mixture_size set to the vocabulary size?

Mlydi avatar Apr 09 '24 11:04 Mlydi

Sorry, I just noticed this issue... The mixture approach is only used on GSM8K, not on multiplication. Multiplication's CoT is deterministic given the input by design, so the mixture approach isn't needed there; setting the mixture size to 1 is how we disable it.
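
For intuition, here is a minimal sketch of why mixture_size == 1 disables the mixture. This is not the repo's actual training code: the `MixtureSketch` class, its `forward` signature, and the component-sampling logic are hypothetical; only the `nn.Embedding(mixture_size, hidden_size)` line mirrors the snippet quoted above.

```python
import torch
import torch.nn as nn

class MixtureSketch(nn.Module):
    """Hypothetical sketch: each mixture component is a learned vector.
    With mixture_size == 1 there is only one component, so the 'mixture'
    collapses to a single deterministic embedding."""

    def __init__(self, mixture_size: int, hidden_size: int):
        super().__init__()
        self.mixture_components = nn.Embedding(mixture_size, hidden_size)
        self.mixture_size = mixture_size

    def forward(self, batch_size: int) -> torch.Tensor:
        if self.mixture_size == 1:
            # Mixture disabled: every example gets the same single component.
            ids = torch.zeros(batch_size, dtype=torch.long)
        else:
            # Mixture enabled (e.g., on GSM8K): pick a component per example,
            # modeling the multiple plausible CoT paths for the same input.
            ids = torch.randint(self.mixture_size, (batch_size,))
        return self.mixture_components(ids)

# With mixture_size=1, all rows are identical; with mixture_size>1 they vary.
emb = MixtureSketch(mixture_size=1, hidden_size=8)
print(emb(batch_size=4))
```

The vocabulary size would be the natural choice only if each component had to correspond to a distinct token; here the components just need enough capacity to cover the distinct reasoning paths, which is 1 for a deterministic task like multiplication.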

da03 avatar Jul 10 '24 07:07 da03