
OOV during decoding

Open NeuralPensieve opened this issue 7 years ago • 5 comments

Do you manually set the P_vocab(OOV) = 0 in your code somewhere? I can't seem to find such a thing. In your paper you said:

Note that if w is an out-of-vocabulary (OOV) word, then P_vocab(w) is zero

How would P_vocab(OOV) be zero? Unless you set it to zero manually, it would not be. What if an OOV is selected (in the extended vocab) during decoding? Do you replace it during postprocessing?

NeuralPensieve, Jun 21 '18 19:06

P_vocab is a distribution over the vocabulary words. So everything outside of the vocab has no mass.

peterjliu, Jun 22 '18 18:06

But OOV is added to the vocabulary, right?

NeuralPensieve, Jun 22 '18 21:06

The vocab is a fixed size throughout. I am looking into this issue in depth. The way an OOV is predicted during decoding is really only meaningful during training, where the target sentence guides the prediction. During testing, because the OOV words have no vector representation and don't participate in the attention-driven context, the model has to use other available information. I suspect the model is leveraging the order of the OOV words and the context information from their neighboring non-OOV words.

bhomass, Jun 09 '19 00:06

In model.py, lines 163-164:

extra_zeros = tf.zeros((self._hps.batch_size, self._max_art_oovs))
vocab_dists_extended = [tf.concat(axis=1, values=[dist, extra_zeros]) for dist in vocab_dists] # list length max_dec_steps of shape (batch_size, extended_vsize)

This pads the original vocab_dists with a zeros tensor, and those zeros are P_vocab(OOV).
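
To make the effect of that padding concrete, here is a minimal NumPy sketch of how a zero-padded vocabulary distribution could be combined with the copy (attention) distribution. It is not the repository's code: the function name `final_distribution`, the argument names, and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def final_distribution(vocab_dist, attn_dist, enc_ids_extended, p_gen, max_art_oovs):
    """Mix generation and copy probabilities over the extended vocabulary.

    vocab_dist:       (batch, vsize) softmax over the fixed vocabulary
    attn_dist:        (batch, src_len) attention over source positions
    enc_ids_extended: (batch, src_len) int ids; in-article OOVs use ids >= vsize
    p_gen:            (batch, 1) generation probability
    """
    batch, _ = vocab_dist.shape
    # Pad with zeros: the generator assigns no mass to the OOV slots.
    extra_zeros = np.zeros((batch, max_art_oovs))
    gen = np.concatenate([p_gen * vocab_dist, extra_zeros], axis=1)
    # Scatter-add attention mass onto the ids of the source tokens.
    copy = np.zeros_like(gen)
    rows = np.arange(batch)[:, None]
    np.add.at(copy, (rows, enc_ids_extended), (1.0 - p_gen) * attn_dist)
    return gen + copy

# Toy example (hypothetical numbers): vsize = 4, one in-article OOV with id 4.
vocab_dist = np.array([[0.1, 0.2, 0.3, 0.4]])
attn_dist = np.array([[0.5, 0.25, 0.25]])
enc_ids_extended = np.array([[1, 4, 4]])
p_gen = np.array([[0.8]])
final = final_distribution(vocab_dist, attn_dist, enc_ids_extended, p_gen, max_art_oovs=2)
```

In this sketch, any probability that ends up on an OOV slot comes only from the attention term, so an unused OOV slot keeps probability zero.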

rookiebird, Nov 19 '19 06:11

I find that in the encoding step, the input for an OOV is represented as the unk_word. According to the code, if an OOV is copied by the model, does that mean the unk_word embedding contributes a lot to that decoding step? I agree with bhomass that the model is leveraging the context of the OOV.
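
A small sketch of the two id sequences this implies, assuming a toy vocabulary with contiguous ids. The helper `article_to_ids`, the `[UNK]` token string, and the example words are hypothetical and only illustrate the idea: the encoder sees the UNK id for every OOV, while a parallel "extended" sequence gives each in-article OOV its own temporary id for copying.

```python
UNK = "[UNK]"  # assumed name of the unknown-word token

def article_to_ids(words, word2id):
    """Return (encoder input ids, extended ids, list of in-article OOVs)."""
    unk_id = word2id[UNK]
    vsize = len(word2id)  # assumes ids are 0 .. vsize-1
    enc_input, enc_extended, oovs = [], [], []
    for w in words:
        i = word2id.get(w, unk_id)
        enc_input.append(i)  # the embedding lookup only ever sees UNK for OOVs
        if w not in word2id:
            if w not in oovs:
                oovs.append(w)
            # Temporary id beyond the fixed vocabulary, used only for copying.
            enc_extended.append(vsize + oovs.index(w))
        else:
            enc_extended.append(i)
    return enc_input, enc_extended, oovs

word2id = {UNK: 0, "the": 1, "cat": 2, "sat": 3}
enc_input, enc_extended, oovs = article_to_ids("the zyx cat sat zyx".split(), word2id)
```

Under these assumptions every OOV shares the same input embedding, so whatever distinguishes one OOV from another at decoding time has to come from its position and its neighbors.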

rookiebird, Nov 19 '19 06:11