Tokenizing a file with a lot of nested open arrays takes almost a minute

kimadeline opened this issue 1 year ago

When tokenizing the files in the https://github.com/Kotlin/kotlinx.serialization repository, the cl100k_base tokenizer was extremely slow on the following files:

By contrast, the rest of the files usually took less than a millisecond.
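The contents of the slow files aren't reproduced here, so the following is only a sketch under stated assumptions for timing the behaviour. It builds a synthetic input of unclosed `[` characters as a hypothetical stand-in for "a lot of nested open arrays" and times an arbitrary `encode` callable on inputs of increasing size, which makes super-linear growth easy to spot. The helper names (`nested_open_arrays`, `time_encode`, `scaling_report`) are hypothetical. `tiktoken.get_encoding("cl100k_base")` is tiktoken's loader for this encoding, but whether this synthetic input reproduces the slowdown seen on the real files is an assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple


def nested_open_arrays(depth: int, opener: str = "[") -> str:
    """Return `depth` unclosed openers, e.g. "[[[" for depth=3.

    Hypothetical stand-in for the slow files; the default "[" opener is an assumption.
    """
    return opener * depth


def time_encode(encode: Callable[[str], Sequence[int]], text: str) -> Tuple[int, float]:
    """Call `encode` once on `text`; return (token count, elapsed seconds)."""
    start = time.perf_counter()
    tokens = encode(text)
    elapsed = time.perf_counter() - start
    return len(tokens), elapsed


def scaling_report(
    encode: Callable[[str], Sequence[int]], depths: Sequence[int]
) -> List[Tuple[int, int, float]]:
    """Time `encode` on synthetic inputs of each depth: (depth, tokens, seconds)."""
    return [(d, *time_encode(encode, nested_open_arrays(d))) for d in depths]


if __name__ == "__main__":
    try:
        import tiktoken

        encode = tiktoken.get_encoding("cl100k_base").encode
    except Exception as exc:  # not installed, or the encoding file could not be fetched
        print(f"tiktoken unavailable ({exc}); using a byte-level stand-in encoder")
        encode = lambda s: list(s.encode("utf-8"))

    # If time roughly doubles as depth doubles, growth is linear; faster growth is suspicious.
    for depth, n_tokens, seconds in scaling_report(encode, [1_000, 2_000, 4_000, 8_000]):
        print(f"depth={depth:>6} tokens={n_tokens:>6} time={seconds:.4f}s")
```

To time a real file instead of the synthetic input, its text can be passed directly to `time_encode` together with the same `encode` callable.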

Oct 18 '24 11:10 kimadeline