Queries Regarding functionality of `extra_mask`
Hi @qkaren
Thanks for open-sourcing the code
I am a little confused about why an extra mask is needed. It contains the position ids of the non-stop words in z/zz, while the `y_logits` vector is created with respect to x. Why are the two being combined? What exactly is `extra_mask` used for?
The relevant code lines are:
https://github.com/qkaren/COLD_decoding/blob/15476186bb2590375fd5af9e0bbc9db654931d60/cold_decoding.py#L163-L166
https://github.com/qkaren/COLD_decoding/blob/15476186bb2590375fd5af9e0bbc9db654931d60/cold_decoding.py#L241
Thanks!
Thanks for the question. The `extra_mask` implements the following description, given below Eq. (6) in the paper:
"In practice, to ease the satisfaction of certain constraints (e.g. n-gram similarity), we expand the candidate set V^k_t to include constraint tokens."
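To make the quoted sentence concrete, here is a minimal sketch of the idea in plain Python. It is not the code from `cold_decoding.py`: the helper names `topk_mask` and `filter_logits`, the toy vocabulary, and the use of `-inf` as the fill value are all assumptions made for illustration. It assumes the mask is a 0/1 vector over the vocabulary, so it can be combined with the top-k mask of any logits vector regardless of which text the logits were computed from.

```python
import math

def topk_mask(logits, k):
    """Return a 0/1 mask over the vocabulary marking the k largest logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    mask = [0] * len(logits)
    for i in top:
        mask[i] = 1
    return mask

def filter_logits(logits, k, extra_mask=None, fill=-math.inf):
    """Keep the top-k logits plus any token flagged in extra_mask (hypothetical helper)."""
    mask = topk_mask(logits, k)
    if extra_mask is not None:
        # Union of the two candidate sets: top-k tokens OR constraint tokens.
        mask = [max(m, e) for m, e in zip(mask, extra_mask)]
    return [l if m else fill for l, m in zip(logits, mask)]

# Toy vocabulary of 6 tokens; token 4 stands in for a constraint (non-stop-word) token.
logits = [2.0, 0.5, 1.5, -1.0, 0.1, 0.3]
extra_mask = [0, 0, 0, 0, 1, 0]

without = filter_logits(logits, k=2)
with_extra = filter_logits(logits, k=2, extra_mask=extra_mask)
```

In this toy case, token 4 is filtered out when only the top-2 mask is used, but it survives once the extra mask is added, so the constraint token stays in the candidate set even though its logit is low.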