Christian Wallenwein
Results: 2 comments of Christian Wallenwein
This feature actually exists, though I don't know how long it has been there. Just click on the length information in the bottom-right corner.
The tokens from index 3 to 258 are not ASCII characters but the 256 tokens used for byte fallback, one per possible byte value. There are 140k+ Unicode characters, but the vocab size of Llama is just 32k....
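
To make the byte-fallback idea concrete, here is a minimal sketch in Python. It does not call the real tokenizer; it only assumes that ids 3 to 258 map to byte values 0 to 255 in order (as the index range above suggests) and that the tokens are displayed in the `<0xXX>` style. The helper names and the example character are hypothetical, and the character is simply assumed to be missing from the vocabulary.

```python
# Assumption: ids 0-2 are special tokens, so byte value b maps to id 3 + b.
BYTE_OFFSET = 3


def byte_fallback_ids(text: str) -> list[int]:
    """Return the fallback ids for text that has no token of its own.

    The text is encoded as UTF-8 and every byte becomes one token id.
    """
    return [BYTE_OFFSET + b for b in text.encode("utf-8")]


def byte_token_name(token_id: int) -> str:
    """Render a byte-fallback id in the assumed <0xXX> display form."""
    if not BYTE_OFFSET <= token_id < BYTE_OFFSET + 256:
        raise ValueError("not a byte-fallback token id")
    return f"<0x{token_id - BYTE_OFFSET:02X}>"


def decode_byte_fallback(ids: list[int]) -> str:
    """Turn a run of byte-fallback ids back into the original text."""
    return bytes(i - BYTE_OFFSET for i in ids).decode("utf-8")


# U+13000 (an Egyptian hieroglyph) is F0 93 80 80 in UTF-8,
# so it costs four byte tokens instead of one dedicated token.
rare_char = "\U00013000"
ids = byte_fallback_ids(rare_char)
names = [byte_token_name(i) for i in ids]
```

This is why a 32k vocabulary can still represent any Unicode text: a character without its own token is spelled out as 1 to 4 byte tokens, at the cost of a longer token sequence.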