
The reason for using "[MASK]" to replace a masked word in the LIME method

Open • wmmxk opened this issue 5 years ago • 1 comment

In the implementation of the LIME class, the mask_string is [MASK]. There is also a TODO note, #TODO(lit-dev): make configurable in UI. So it seems a user could use another token to replace a masked word. What are the implications of using a different token to replace masked words?

wmmxk, Sep 23 '20 13:09

Hi! This is an open research question. LIME would go with the [UNK] token, but many people are using BERT-like models, which have seen the [MASK] token standing in for an undefined input far more often (and almost no [UNK] tokens, since BERT uses a WordPiece vocab), so [MASK] seems to be the better choice. We would like to make this configurable because there is no single right option that works across all models.

bastings, Sep 23 '20 14:09
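
To make the discussion above concrete, here is a minimal sketch of a LIME-style perturbation step with a configurable replacement token. It is not the LIT implementation: the function name `perturb` and the parameters `mask_token`, `num_samples`, and `seed` are hypothetical and chosen only for illustration, and whitespace tokenization is assumed.

```python
import random
from typing import List, Tuple


def perturb(
    tokens: List[str],
    mask_token: str = "[MASK]",  # hypothetical parameter; could be "[UNK]" etc.
    num_samples: int = 5,
    seed: int = 0,
) -> List[Tuple[List[int], str]]:
    """Return (keep_vector, perturbed_text) pairs for a LIME-style explainer.

    keep_vector[i] is 1 if tokens[i] was kept and 0 if it was replaced
    by mask_token. This is an illustrative sketch, not LIT's API.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # Randomly decide, per token, whether to keep it or mask it.
        keep = [rng.randint(0, 1) for _ in tokens]
        text = " ".join(
            tok if k else mask_token for tok, k in zip(tokens, keep)
        )
        samples.append((keep, text))
    return samples


if __name__ == "__main__":
    words = "the movie was great".split()
    # Same random masks, two different replacement tokens.
    for token in ("[MASK]", "[UNK]"):
        for keep, text in perturb(words, mask_token=token, num_samples=3):
            print(token, keep, text)
```

With a fixed seed, the same positions are masked for every choice of `mask_token`, so any difference in the model's predictions between the two runs comes from the replacement token alone. That is one simple way to check how sensitive a given model is to this choice.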