XLM-RoBERTa tokenizer

Open • ilaykid opened this issue 3 years ago • 1 comment

Hey, could you add support for the XLM-RoBERTa tokenizer? It's a very useful tokenizer, and having it would help a lot. Thank you!

ilaykid, Nov 30 '22 14:11

I'd like to second this. It would be a useful tokenizer to have, since it is used by Donut (another nice-to-have) in the Hugging Face Transformers library.

Adding this tokenizer should be easier now that SentencePiece is implemented in the Microsoft.ML.Tokenizers library.
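
For anyone picking this up, here is a rough sketch of what seems to be needed on top of a plain SentencePiece model. It is written in Python purely for illustration and mirrors my understanding of the Hugging Face `XLMRobertaTokenizer`: the special tokens `<s>`, `<pad>`, `</s>`, `<unk>` take ids 0 to 3, every other SentencePiece id is shifted by 1, and sequences are wrapped as `<s> A </s>` or `<s> A </s></s> B </s>`. The function names are hypothetical, and none of this is Microsoft.ML.Tokenizers API.

```python
# Illustrative sketch only: id remapping as done by the Hugging Face
# XLMRobertaTokenizer on top of raw SentencePiece ids. Function names
# are hypothetical and not part of any library.
from typing import List, Optional

FAIRSEQ_OFFSET = 1  # SentencePiece ids are shifted by one in the XLM-R vocab
BOS_ID, PAD_ID, EOS_ID, UNK_ID = 0, 1, 2, 3  # <s>, <pad>, </s>, <unk>


def spm_to_xlmr_id(spm_id: int) -> int:
    """Map a raw SentencePiece id to an XLM-R vocabulary id.

    SentencePiece id 0 is its own <unk>, which maps to XLM-R's <unk> (3);
    every other id is shifted by the offset.
    """
    return spm_id + FAIRSEQ_OFFSET if spm_id else UNK_ID


def build_inputs(spm_ids: List[int],
                 pair_spm_ids: Optional[List[int]] = None) -> List[int]:
    """Wrap one or two sequences of SentencePiece ids with special tokens."""
    ids = [BOS_ID] + [spm_to_xlmr_id(i) for i in spm_ids] + [EOS_ID]
    if pair_spm_ids is not None:
        # A sequence pair is separated by two consecutive </s> tokens.
        ids += [EOS_ID] + [spm_to_xlmr_id(i) for i in pair_spm_ids] + [EOS_ID]
    return ids
```

If that reading is right, most of the work beyond SentencePiece itself is this remapping plus the special-token handling (the `<mask>` token is appended at the end of the vocabulary and is left out here).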

tk4218, Feb 05 '25 23:02