Christian Wallenwein


This feature already exists, though I don't know how long it has been available. Just click the length information in the bottom-right corner: ![image](https://user-images.githubusercontent.com/40916592/125681570-f73c6370-0d3b-4241-9f21-b3cdf99c8f74.png)

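The tokens at indices 3 to 258 are not ASCII characters; they are byte-fallback tokens. That range holds 256 tokens, one for each possible byte value, so any character missing from the vocabulary can still be encoded as its UTF-8 bytes. There are 140k+ Unicode characters, but Llama's vocabulary size is only 32k....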
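A minimal sketch of the byte-fallback idea, assuming that ids 0 to 2 are reserved for special tokens (`<unk>`, `<s>`, `</s>`) so that byte `0xNN` maps to id `3 + 0xNN`. The toy per-character vocabulary and the helper names (`encode`, `byte_fallback_ids`, `byte_token_name`, `decode_bytes`) are hypothetical; a real SentencePiece tokenizer looks up subword pieces, not single characters.

```python
# Assumption: ids 0-2 are special tokens, so byte 0xNN -> id 3 + 0xNN (ids 3..258).
BYTE_OFFSET = 3


def byte_fallback_ids(ch):
    """Hypothetical helper: map a character missing from the vocab to byte-token ids."""
    return [BYTE_OFFSET + b for b in ch.encode("utf-8")]


def byte_token_name(token_id):
    """Render a byte-fallback id in the <0xNN> style used for byte tokens."""
    return f"<0x{token_id - BYTE_OFFSET:02X}>"


def encode(text, vocab):
    """Toy encoder: per-character lookup, falling back to UTF-8 bytes when missing."""
    ids = []
    for ch in text:
        if ch in vocab:
            ids.append(vocab[ch])
        else:
            ids.extend(byte_fallback_ids(ch))
    return ids


def decode_bytes(ids):
    """Turn a run of byte-fallback ids back into text."""
    return bytes(i - BYTE_OFFSET for i in ids).decode("utf-8")


# Hypothetical toy vocabulary; "€" is not in it, so it falls back to 3 byte tokens.
toy_vocab = {"a": 300, "b": 301}
ids = encode("a€", toy_vocab)
print(ids)                                    # [300, 229, 133, 175]
print([byte_token_name(i) for i in ids[1:]])  # ['<0xE2>', '<0x82>', '<0xAC>']
print(decode_bytes(ids[1:]))                  # €
```

Because every Unicode character has a UTF-8 encoding of 1 to 4 bytes, these 256 byte tokens guarantee that no input ever has to become `<unk>`, at the cost of using several tokens for rare characters.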