NormalizedString.clear() broken?
Hello. I think there are some problems with NormalizedString (tokenizers 0.15.2).
In the following example, append() works as expected.
from tokenizers import NormalizedString
s = NormalizedString("Hi.") # NormalizedString(original="Hi.", normalized="Hi.")
s.append(" Hello.") # NormalizedString(original="Hi.", normalized="Hi. Hello.")
After using clear(), append() no longer modifies the normalized attribute.
from tokenizers import NormalizedString
s = NormalizedString("Hi.") # NormalizedString(original="Hi.", normalized="Hi.")
s.clear() # NormalizedString(original="Hi.", normalized="")
s.append(" Hello.") # still NormalizedString(original="Hi.", normalized=""), expected normalized=" Hello."
The same problem occurs with prepend(), as sketched below.
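For reference, a hypothetical reproduction of the prepend() case, assuming it behaves the same way as append() after clear() (this snippet is my own sketch, not taken from the original report):
from tokenizers import NormalizedString
s = NormalizedString("Hi.") # NormalizedString(original="Hi.", normalized="Hi.")
s.clear() # NormalizedString(original="Hi.", normalized="")
s.prepend("Hello. ") # still NormalizedString(original="Hi.", normalized=""), expected normalized="Hello. "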
Indeed, would you like to have a go at it and open a PR? 🤗
Has there been any update about this? I just encountered this as well :)
Update: This issue was fixed in #1717