Robert Burke
I don't think so:

```
>>> from transformers import GPT2Tokenizer
>>> tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
>>> tokenizer.encode('Ģ')
[128, 95]
```
I mean, I understand that I can use a separate lookup table mapping individual byte values to the tokens that represent single bytes. But BPE tokenizers generally work by applying pre-tokenization to divide...
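For what it's worth, here's a minimal sketch of that byte-to-token lookup table, built from the bytes_to_unicode helper in transformers' GPT-2 tokenizer module (the import path below is an assumption about recent releases; older versions exposed it elsewhere). It maps each of the 256 raw byte values to the id of its single-byte token, which is why 'Ģ' (two UTF-8 bytes, 0xC4 0xA2) comes back as the pair [128, 95] rather than a single id:

```
from transformers import GPT2Tokenizer
# assumption: recent transformers versions keep bytes_to_unicode here
from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# GPT-2's byte-level BPE first maps every raw byte value (0-255) to a printable
# unicode character, and each of those 256 characters is a base token in the vocab.
byte_to_char = bytes_to_unicode()                       # {byte value: unicode char}
byte_to_token = {b: tokenizer.convert_tokens_to_ids(c)  # {byte value: token id}
                 for b, c in byte_to_char.items()}

# 'Ģ' is U+0122, whose UTF-8 encoding is the two bytes 0xC4 0xA2.
raw = 'Ģ'.encode('utf-8')
print([byte_to_token[b] for b in raw])  # [128, 95]
print(tokenizer.encode('Ģ'))            # [128, 95], the same pair as above
```

Of course, encode() only falls back to those per-byte tokens after pre-tokenization and after the learned merges have been tried, so a character whose byte sequence does have a merge in the vocab would come back as a single id instead.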