conv_seq2seq
Why not use the updated parent beam's embedding as the next input when inferring?
First of all, nice implementation!
Reading your code in ConvDecoderFairseqBS: in each step(), beam_search selects the new top-K beams, but the input to the next step merely replaces this time step's zero padding with those top-K words. The decoded sequence from step 0 to time-1 is kept as before. I'm confused here, since the new top-K words may come from different beams than the previous steps did. Is this by design, or is it a mistake? :)
The specific code is here:
# Keep the already-decoded prefix and the remaining zero padding.
cur_inputs = inputs[:, 0:time+1, :]
zeros_padding = inputs[:, time+2:, :]
cur_inputs_pos = self.add_position_embedding(cur_inputs, time)
# Run the decoder conv stack and take one beam search step.
enc_output, beam_state = state
logits = self.infer_conv_block(enc_output, cur_inputs_pos)
bs_output, beam_state = beam_search.beam_search_step(.....
# Embed the newly predicted top-K words and append them to the old prefix.
finished, next_inputs = self.next_inputs(sample_ids=bs_output.predicted_ids)
next_inputs = tf.reshape(next_inputs, [self.config.beam_width, 1, inputs.get_shape().as_list()[-1]])
next_inputs = tf.concat([cur_inputs, next_inputs], axis=1)  ## *** why not update cur_inputs to the new beams? ***
next_inputs = tf.concat([next_inputs, zeros_padding], axis=1)
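For reference, here is a minimal sketch of what I expected instead, assuming bs_output carries a beam_parent_ids field (as in google/seq2seq's BeamSearchStepOutput; I haven't verified the exact name in this repo): the already-decoded prefix would be gathered along the beam axis by parent id before the new words are appended.

# Hypothetical fix sketch: re-align the decoded prefixes with the parents of
# the surviving beams before appending the new top-K words.
parent_ids = bs_output.beam_parent_ids            # assumed field, shape [beam_width]
cur_inputs = tf.gather(cur_inputs, parent_ids)    # pick each new beam's parent prefix
next_inputs = tf.concat([cur_inputs, next_inputs], axis=1)
next_inputs = tf.concat([next_inputs, zeros_padding], axis=1)

This way every beam's history would match the parent it was actually expanded from.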
I checked fairseq-py, which reorders the input buffer at every beam search step:
def reorder_buffer(self, new_order):
    if self.input_buffer is not None:
        self.input_buffer = self.input_buffer.index_select(0, new_order)
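To make the semantics of that index_select concrete, here is a tiny standalone example (toy data, not from fairseq): row i of the new buffer is row new_order[i] of the old one, so each surviving beam inherits its parent's history.

import torch

# Toy buffer: one row per beam (beam_width = 3).
buf = torch.tensor([[1, 1], [2, 2], [3, 3]])
# Beams 0 and 2 descend from old beam 1; beam 1 descends from old beam 0.
new_order = torch.tensor([1, 0, 1])
print(buf.index_select(0, new_order))
# tensor([[2, 2],
#         [1, 1],
#         [2, 2]])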