Tokenizer
Tokenizing a file with a lot of nested open arrays takes almost a minute
When tokenizing the files in the https://github.com/Kotlin/kotlinx.serialization repo, the cl100k_base tokenizer was very slow on two files:
- n_structure_open_array_object.json took 53.7s to tokenize
- n_structure_100000_opening_arrays.json took 6.9s to tokenize
The rest of the files usually took less than a millisecond each.
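
A rough way to reproduce this without the repo files is to generate similar inputs and time the encoder on them. The sketch below is only an assumption about what the two files contain: a long run of unclosed `[` characters (100000, going by the file name) and a repeated `[{"":` pattern (the repeat count of 50000 is a guess). The helper names and `placeholder_encode` are hypothetical; swap in the real cl100k_base encode function of the tokenizer under test.

```python
import time
from typing import Callable, List


def nested_open_arrays(depth: int) -> str:
    """Return `depth` unclosed '[' characters (assumed shape of the input)."""
    return "[" * depth


def nested_open_array_objects(depth: int) -> str:
    """Return '[{"":' repeated `depth` times (assumed shape of the input)."""
    return '[{"":' * depth


def time_encode(encode: Callable[[str], List[int]], text: str) -> float:
    """Return the wall-clock seconds a single encode(text) call takes."""
    start = time.perf_counter()
    encode(text)
    return time.perf_counter() - start


def placeholder_encode(text: str) -> List[int]:
    # Hypothetical stand-in that emits one "token" per UTF-8 byte.
    # Replace it with the encode function of the tokenizer being measured.
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    cases = [
        ("open arrays", nested_open_arrays(100_000)),
        ("open array objects", nested_open_array_objects(50_000)),  # count is a guess
    ]
    for name, text in cases:
        seconds = time_encode(placeholder_encode, text)
        print(f"{name}: {len(text)} chars, {seconds:.4f}s")
```

Running the same measurement at a few different depths (for example 10000, 20000, 40000) should show whether the time grows linearly or faster with input length.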