Colm Evans
Results: 2 issues by Colm Evans
As I understand the current MoeLayer, a gate calculates the weight to be applied to each expert's output; the top k experts are then selected and run on the data,...
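To make the gating step described above concrete, here is a minimal, framework-free sketch of generic top-k mixture-of-experts routing. It is not the MoeLayer code under discussion. The names `moe_forward` and `gate_weights`, the linear-then-softmax gate, the renormalization of the selected weights, and the weighted-sum combination are all illustrative assumptions, since the issue text above is truncated before it says how expert outputs are combined.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    gate_weights: Sequence[Vector],  # hypothetical: one gate row per expert
    experts: Sequence[Callable[[Vector], Vector]],
    k: int,
) -> Vector:
    # Gate: one score per expert (dot product of x with that expert's row),
    # turned into a probability distribution over experts.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep only the k experts with the highest gate probability.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]

    # Assumption: renormalize the selected weights so they sum to 1.
    norm = sum(probs[i] for i in top)

    # Assumption: experts map vectors to vectors of the same length.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)  # only the selected experts are evaluated
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```

With `k=1` this reduces to routing each input to a single expert; larger `k` blends several experts' outputs by their gate weights while still skipping the unselected ones.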
Added a basic example (examples/pufferl_lstm_wrapper.py) showing the use of LSTMWrapper with the Default model from models.