SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library and designed to be fast and accurate.

Results 9 SharpToken issues
Cohere has two tokenizers: [chat](https://huggingface.co/Cohere/Command-nightly) and [embeddings](https://huggingface.co/Cohere/multilingual-22-12). It would be great to add them to this project.

Can we use SharpToken for Anthropic models? I could not find out whether Claude uses "cl100k_base" or another encoding.

Hi, any chance of an update for GPT-4o? Thanks!

Notes: - I added in the new test plans for o200k based on output from tiktoken - I added a SetupO200k() in CompareBenchmark (and renamed Setup to SetupCL200k), but it's...

It would be great to add support for the gpt-4o `o200k_base` encoding.

Hi, I am using your library in a MAUI Android app. For some reason, the first call to `SharpToken.GptEncoding.GetEncoding("o200k_base")` takes minutes to respond. What could be the issue?

Add support for UTF-8 input and output. The proposed API is:
```cs
public List<int> EncodeFromUtf8(ReadOnlySpan<byte> lineToEncode, ISet<string> allowedSpecial = null, ISet<string> disallowedSpecial = null);
public byte[] DecodeToUtf8(IEnumerable<int> inputTokensToDecode);
```

Is there a way to get the vocabulary size of an initialized tokenizer from `var encoding = GptEncoding.GetEncoding(encodingName);`?

Hello, are you planning to update the list of supported models? https://github.com/dmitry-brazhenko/SharpToken/blob/5e377adb6bc1113ffd3c6184f607c762bc58fc12/SharpToken/Lib/Model.cs#L64 I'm looking for model `4.1`, which is already in `tiktoken`: https://github.com/openai/tiktoken/blob/3591ff175d6a80efbe4fcc7f0e219ddd4b8c52f1/tiktoken/model.py#L13 :)