tokenizer
NLP tokenizers written in Go
This change adds the missing JSON file to the repo, embeds both the vocab and merges files, and uses the embedded FS to create the pre-trained tokenizer.
Older versions hit a "file not found" error when trying to use pretrained tokenizers.
Hi, first of all, thanks for this great package! I am running inference against a Triton server serving transformer models from Go, and this library is a tremendous help. One...
Convert `int` to `int64` at the API boundary to make life easier for API consumers. For example: `tokenizer.Decode(ids []int64)`
Serialization of the Tokenizer and all the parts (PreTokenizer, Normalizer, ...) using `encoding/gob`. It is now easy to save/load an entire tokenizer. See: https://stackoverflow.com/questions/28020070/golang-serialize-and-deserialize-back
Hi, thanks for this lib! I found that a `log.Print` is used at init: https://github.com/sugarme/tokenizer/blob/master/init.go#L21 which I can't avoid. My application uses stdout & stderr to communicate. Do you...
Hey, you mentioned this implementation was heavily inspired by the Hugging Face one. I was wondering whether it's possible to load a tokenizer trained with Hugging Face using this implementation?...
I encountered a panic while encoding some documents. Unfortunately I can't provide the documents, as they are private. After a quick look, it seems that `pairEncoding` in `util.go:108` is nil,...
/home/gopath/pkg/mod/github.com/sugarme/[email protected]/tokenizer.go:875:22: cannot use 2 * gb (untyped int constant 2147483648) as int value in argument to scanner.Buffer (overflows)
Is something planned?