conv_seq2seq
Why not use the updated parent beam's embedding as the next input when inferring?
First of all, nice implementation!
Reading your code in ConvDecoderFairseqBS: in each step(), beam_search selects the new top-K beams, but the input to the next step merely replaces this time step's zero padding with those top-K words. The decoded sequence from step 0 to time-1 is kept as before. I'm confused here, since the new top-K words may come from different beams than the previous steps did. Is this by design, or is it a mistake? :)
The specific code is here:
# Keep the already-decoded prefix and the remaining zero padding.
cur_inputs = inputs[:, 0:time+1, :]
zeros_padding = inputs[:, time+2:, :]
cur_inputs_pos = self.add_position_embedding(cur_inputs, time)
# Run the decoder conv stack and take one beam search step.
enc_output, beam_state = state
logits = self.infer_conv_block(enc_output, cur_inputs_pos)
bs_output, beam_state = beam_search.beam_search_step(.....
# Embed the newly predicted top-K words and append them to the old prefix.
finished, next_inputs = self.next_inputs(sample_ids=bs_output.predicted_ids)
next_inputs = tf.reshape(next_inputs, [self.config.beam_width, 1, inputs.get_shape().as_list()[-1]])
next_inputs = tf.concat([cur_inputs, next_inputs], axis=1)  ## *** why not update cur_inputs to the new beams? ***
next_inputs = tf.concat([next_inputs, zeros_padding], axis=1)
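For reference, here is a minimal sketch of what I expected instead, assuming bs_output carries a beam_parent_ids field (as in google/seq2seq's BeamSearchStepOutput; I haven't verified the exact name in this repo): the already-decoded prefix would be gathered along the beam axis by parent id before the new words are appended.

# Hypothetical fix sketch: re-align the decoded prefixes with the parents of
# the surviving beams before appending the new top-K words.
parent_ids = bs_output.beam_parent_ids            # assumed field, shape [beam_width]
cur_inputs = tf.gather(cur_inputs, parent_ids)    # pick each new beam's parent prefix
next_inputs = tf.concat([cur_inputs, next_inputs], axis=1)
next_inputs = tf.concat([next_inputs, zeros_padding], axis=1)

This way every beam's history would match the parent it was actually expanded from.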
I checked fairseq-py, which reorders the input buffer at every beam search step:
def reorder_buffer(self, new_order):
    if self.input_buffer is not None:
        self.input_buffer = self.input_buffer.index_select(0, new_order)
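To make the semantics of that index_select concrete, here is a tiny standalone example (toy data, not from fairseq): row i of the new buffer is row new_order[i] of the old one, so each surviving beam inherits its parent's history.

import torch

# Toy buffer: one row per beam (beam_width = 3).
buf = torch.tensor([[1, 1], [2, 2], [3, 3]])
# Beams 0 and 2 descend from old beam 1; beam 1 descends from old beam 0.
new_order = torch.tensor([1, 0, 1])
print(buf.index_select(0, new_order))
# tensor([[2, 2],
#         [1, 1],
#         [2, 2]])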