The reason for using "[MASK]" to replace a masked word in the LIME method
In the implementation of the LIME class, mask_string is set to [MASK]. There is also a TODO note, #TODO(lit-dev): make configurable in UI, so it seems a user could eventually choose a different token to replace a masked token. What are the implications of using a different token to replace masked words?
Hi! This is an open research question. LIME would go with the [UNK] token, but many people are using BERT-like models, which have seen the [MASK] token standing in for an undefined input far more often (and have seen almost no [UNK] tokens, since they use a WordPiece vocab). For those models [MASK] seems to be the better choice. We would like to make this configurable because there is no single right option that works across all models.
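To make the effect of the replacement token concrete, here is a minimal sketch of LIME-style masking with a configurable mask string. It is not LIT's actual implementation: the function names mask_tokens and sample_perturbations and the 50% keep probability are assumptions chosen for illustration. Only the mask_string parameter name and its [MASK] default come from the discussion above.

```python
import random
from typing import List, Sequence, Tuple


def mask_tokens(tokens: Sequence[str],
                keep: Sequence[bool],
                mask_string: str = "[MASK]") -> List[str]:
    """Replace every token whose keep flag is False with mask_string.

    Hypothetical helper; the default mirrors the [MASK] value discussed above.
    """
    if len(tokens) != len(keep):
        raise ValueError("tokens and keep must have the same length")
    return [tok if k else mask_string for tok, k in zip(tokens, keep)]


def sample_perturbations(tokens: Sequence[str],
                         num_samples: int,
                         mask_string: str = "[MASK]",
                         seed: int = 0) -> List[Tuple[List[bool], str]]:
    """Draw random keep/drop patterns and build the perturbed sentences.

    Each token is kept with probability 0.5 (an assumption for this sketch).
    Returns (keep pattern, perturbed text) pairs; in LIME the patterns are
    the features of the linear surrogate and the texts go to the model.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        keep = [rng.random() < 0.5 for _ in tokens]
        text = " ".join(mask_tokens(tokens, keep, mask_string))
        samples.append((keep, text))
    return samples


tokens = "the movie was great".split()
keep = [True, False, True, True]

# Same perturbation, two different replacement tokens.
print(" ".join(mask_tokens(tokens, keep, "[MASK]")))
print(" ".join(mask_tokens(tokens, keep, "[UNK]")))
```

The keep pattern is identical in both calls, so the surrogate model sees the same features either way. What changes is the text the underlying model scores, which is why the choice of token can shift the resulting attributions.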