Haris Jabbar
Hi. I am trying to convert corpora from HF to their IPA form with the following snippet, but I am getting really slow speeds: only a couple of examples per...
Hi! Thank you for the great repo and the models. I want to pretrain the model with a new [tokenizer](https://arxiv.org/abs/2307.07262), but since 16 A100 GPUs are hard to come by, I...
@karpathy Thanks for the great lecture and implementation! As always, it was a pleasure. I have tried to implement LlamaTokenizer (without using the sentencepiece backend), staying as close to the minbpe implementation...
If I am not mistaken, the sum of the PRS_signature values should be equal to Nmorph. However, during data exploration I found quite a few entries where these values don't match....