AnchorText - long tails

Open RobertSamoilescu opened this issue 4 years ago • 1 comments

Some language models can process only a limited number of tokens at once. The language model extension of AnchorText therefore splits the text into two parts, text = head + tail, where the head contains at most the maximum number of tokens that the language model can process. The current implementation perturbs only the head; once the head is perturbed, the tail is concatenated unchanged.
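To make the behaviour concrete, here is a minimal sketch of the head-only scheme described above. It is not alibi's actual code: `split_head_tail`, `perturb_head_only` and the toy `mask_all` perturbation are hypothetical names, and whitespace tokens stand in for real language model tokens.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def split_head_tail(tokens: Tokens, max_len: int) -> Tuple[Tokens, Tokens]:
    """Split tokens so that the head holds at most max_len tokens."""
    return tokens[:max_len], tokens[max_len:]


def perturb_head_only(
    tokens: Tokens, max_len: int, perturb: Callable[[Tokens], Tokens]
) -> Tokens:
    """Perturb the head and append the tail unchanged."""
    head, tail = split_head_tail(tokens, max_len)
    return perturb(head) + tail


def mask_all(chunk: Tokens) -> Tokens:
    """Toy perturbation (an assumption): replace every token with a mask."""
    return ["[MASK]"] * len(chunk)


tokens = "the quick brown fox jumps over the lazy dog".split()
result = perturb_head_only(tokens, max_len=4, perturb=mask_all)
```

With `max_len=4`, only the first four tokens are ever candidates for perturbation; everything from `jumps` onward is passed through as is.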

Is it an issue at all that only the head is perturbed? Should we keep splitting the tail and perturb the rest of the text as well? (Note that the tail may need to be split several times.)
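For comparison, the alternative could look roughly like the sketch below, where the tail is split repeatedly until every chunk fits. Again, this is only an illustration under assumptions: `split_into_chunks`, `perturb_all_chunks` and `mask_all` are hypothetical names, not part of alibi.

```python
from typing import Callable, List

Tokens = List[str]


def split_into_chunks(tokens: Tokens, max_len: int) -> List[Tokens]:
    """Repeatedly split off heads of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def perturb_all_chunks(
    tokens: Tokens, max_len: int, perturb: Callable[[Tokens], Tokens]
) -> Tokens:
    """Perturb each chunk independently and concatenate the results."""
    out: Tokens = []
    for chunk in split_into_chunks(tokens, max_len):
        out.extend(perturb(chunk))
    return out


def mask_all(chunk: Tokens) -> Tokens:
    """Toy perturbation (an assumption): replace every token with a mask."""
    return ["[MASK]"] * len(chunk)


tokens = "the quick brown fox jumps over the lazy dog".split()
chunks = split_into_chunks(tokens, max_len=4)
result = perturb_all_chunks(tokens, max_len=4, perturb=mask_all)
```

One caveat with this naive version: each chunk is perturbed without seeing the others, so the language model has no context across chunk boundaries.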

RobertSamoilescu, Jul 05 '21 11:07

This might not be a big issue, since most text models to be explained will also be transformer models with the same token limit.

jklaise, Jul 08 '21 13:07