Viacheslav Seledkin
Viacheslav Seledkin
Adds support of UTF-8, must be compatible with models created from one-byte encoded inputs (but this is not verified), not compatible with old models created from utf-8 because they have...
Currently, text is generated from latent point by sampling from distributions produced by generator over vocabulary of tokens. But since z is multivariate gaussian we can also sample from it...
This adds UTF-8 support for multibyte encoded corpora
It probably must start from 0? one key and value which is at 0 is always missed when using iterator correction: iterator interface is a bit confusing, right pattern of...