nkqiaolin
Results: 2 comments of nkqiaolin
IMO, the original paper says that each of the three components should be a single scalar, and that the three are summed and fed to sigmoid(). But you can also treat the sum of...
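A rough sketch of the reading described above, in plain Python: three components, each reduced to one scalar, then summed and passed through a sigmoid. The function and argument names (gate, w_a, a, and so on) are hypothetical, and computing each scalar as a dot product is an assumption made for illustration, not something taken from the paper.

```python
import math

def dot(w, v):
    # Dot product of two equal-length vectors; the result is one scalar.
    return sum(wi * vi for wi, vi in zip(w, v))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(w_a, a, w_b, b, w_c, c):
    # Hypothetical names: each (weight, input) pair yields one scalar.
    terms = [dot(w_a, a), dot(w_b, b), dot(w_c, c)]
    # The three scalars are summed before the sigmoid is applied.
    return sigmoid(sum(terms))

# The three scalars here are 2.0, 1.0 and -3.0, so the sum is 0.0.
value = gate([1.0, 0.0], [2.0, 3.0], [0.5], [2.0], [1.0], [-3.0])
```

Because the sigmoid is applied to the sum, the result is always a single number strictly between 0 and 1, whatever the sizes of the three input vectors.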
I checked fairseq-py, which re-orders the input buffer at every beam search step:

    def reorder_buffer(self, new_order):
        if self.input_buffer is not None:
            self.input_buffer = self.input_buffer.index_select(0, new_order)
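To make the effect of that re-ordering concrete, here is a small stand-in written with plain Python lists instead of tensors. The class name BufferedState is hypothetical and this is not fairseq-py code; it only mimics what selecting rows along dimension 0 by new_order does to a buffer with one row per beam hypothesis.

```python
class BufferedState:
    """Hypothetical holder for a per-hypothesis buffer (one row per beam entry)."""

    def __init__(self, input_buffer=None):
        self.input_buffer = input_buffer

    def reorder_buffer(self, new_order):
        # Row i of the new buffer is a copy of row new_order[i] of the old one,
        # which is the list equivalent of selecting along dimension 0.
        if self.input_buffer is not None:
            self.input_buffer = [list(self.input_buffer[i]) for i in new_order]

# Three hypotheses, each with a two-element cached input.
state = BufferedState([[1, 2], [3, 4], [5, 6]])

# After a beam step, suppose hypothesis 2 survives twice and hypothesis 0 once.
state.reorder_buffer([2, 2, 0])
reordered = state.input_buffer
```

Indices in new_order may repeat or be dropped, so a surviving hypothesis can be duplicated and a pruned one disappears from the buffer.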