AnchorText - extension for other language models.

Open RobertSamoilescu opened this issue 4 years ago • 0 comments

AnchorText supports three masked language models: DistilbertBaseUncased, BertBaseUncased, and RobertaBase. Each of these classes inherits from LanguageModel and overrides two members: the mask property and the is_subword_prefix method. For example, DistilbertBaseUncased:

```python
class DistilbertBaseUncased(LanguageModel):
    SUBWORD_PREFIX = '##'

    def __init__(self, preloading: bool = True):
        """
        Initialize DistilbertBaseUncased.

        Parameters
        ----------
        preloading
            See `LanguageModel` constructor.
        """
        super().__init__("distilbert-base-uncased", preloading)

    @property
    def mask(self) -> str:
        return self.tokenizer.mask_token

    def is_subword_prefix(self, token: str) -> bool:
        return token.startswith(DistilbertBaseUncased.SUBWORD_PREFIX)
```

Other language models can be included in a similar fashion. How should we manage this extension? Should we write a tutorial for wrapping any transformer with LanguageModel? Or can we do something more out-of-the-box?
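
One possible shape for the out-of-the-box option is a single generic subclass that takes the model name and the subword convention as arguments, so no new class is needed per model. The sketch below is illustrative only: `GenericLanguageModel` and its `prefix_marks_word_start` flag are hypothetical names, and `LanguageModel` here is a minimal stand-in (the real class loads the tokenizer and model, and `mask` would come from `self.tokenizer.mask_token` rather than a constructor argument). The flag is there because WordPiece tokenizers mark continuation pieces with `##`, while RoBERTa-style byte-level BPE marks word starts with `Ġ`, so the check has to be inverted.

```python
from abc import ABC, abstractmethod


class LanguageModel(ABC):
    """Stand-in for alibi's LanguageModel (assumption: the real class also
    loads the tokenizer and the model, which is skipped here)."""

    def __init__(self, model_path: str, preloading: bool = True):
        self.model_path = model_path
        self.preloading = preloading

    @property
    @abstractmethod
    def mask(self) -> str:
        """Mask token used by the model."""

    @abstractmethod
    def is_subword_prefix(self, token: str) -> bool:
        """Whether the token continues the previous word."""


class GenericLanguageModel(LanguageModel):
    """Hypothetical wrapper configured by arguments instead of subclassing."""

    def __init__(self, model_path: str, mask_token: str, subword_prefix: str,
                 prefix_marks_word_start: bool = False, preloading: bool = True):
        super().__init__(model_path, preloading)
        self._mask_token = mask_token
        self._subword_prefix = subword_prefix
        self._prefix_marks_word_start = prefix_marks_word_start

    @property
    def mask(self) -> str:
        # Assumption: passed in explicitly; the real class could read it
        # from the loaded tokenizer instead.
        return self._mask_token

    def is_subword_prefix(self, token: str) -> bool:
        has_prefix = token.startswith(self._subword_prefix)
        # If the prefix marks a word start, a subword is a token WITHOUT it.
        return not has_prefix if self._prefix_marks_word_start else has_prefix


# WordPiece-style convention: '##' marks a continuation piece.
bert_like = GenericLanguageModel("bert-base-cased", "[MASK]", "##")
# Byte-level BPE convention: 'Ġ' marks the start of a new word.
roberta_like = GenericLanguageModel("roberta-large", "<mask>", "Ġ",
                                    prefix_marks_word_start=True)
```

A wrapper like this would not cover models that need extra rules (for example, special handling of punctuation tokens), so a short tutorial on subclassing LanguageModel directly might still be useful alongside it.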

RobertSamoilescu, Jul 05 '21 12:07