Colm Evans

2 issues by Colm Evans

As I understand the current MoeLayer, a gate calculates the weight to be applied to each expert's output, then the top k experts are selected and run on the data,...
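
To make the routing described above concrete, here is a minimal pure-Python sketch of top-k gating. It is not the MoeLayer code itself: the function name `top_k_moe`, the toy experts, and the choice to renormalise the gate weights with a softmax over only the selected experts are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x: Vector, gate_scores: Sequence[float],
              experts: Sequence[Expert], k: int) -> Vector:
    """Run only the k highest-scoring experts and mix their outputs.

    Assumption: gate weights are renormalised with a softmax over the
    selected experts only; a real layer may normalise differently.
    Each expert is assumed to return a vector the same length as x.
    """
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest gate scores (hypothetical gate output).
    chosen = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Toy experts: each one scales the input by a different factor.
experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
result = top_k_moe([1.0, -1.0], gate_scores=[0.0, 5.0, 5.0, -2.0],
                   experts=experts, k=2)
```

In this toy run the two experts with the highest gate scores (factors 2 and 3) are the only ones called, and their outputs are averaged because their scores are equal.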

Added a basic example (examples/pufferl_lstm_wrapper.py) showing the use of LSTMWrapper with the Default model from models.