Tokenizer in OpenCLIP seems to be appending 0s rather than eot tokens up to the specified length
Previously, stablediffusion used CLIP's tokenizer, which appends eot tokens until the specified length (77) is reached. The newer code (at least the path used by txt2img.py) uses SimpleTokenizer, which appends 0s until the specified length: https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/tokenizer.py#L183
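A quick way to see the difference side by side (just a sketch; it assumes `transformers` and `open_clip` are installed, and uses the `openai/clip-vit-large-patch14` checkpoint that SD v1's FrozenCLIPEmbedder loads):

```python
# Sketch: compare the trailing padding produced by the two tokenizers.
import open_clip
from transformers import CLIPTokenizer

prompt = "a photograph of an astronaut riding a horse"

# Hugging Face CLIPTokenizer (SD v1 / FrozenCLIPEmbedder path):
# its pad token is <|endoftext|>, so the tail is filled with 49407.
hf_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
hf_ids = hf_tok(prompt, padding="max_length", max_length=77).input_ids
print(hf_ids[-5:])  # [49407, 49407, 49407, 49407, 49407]

# open_clip.tokenize (SD 2.x / txt2img.py path): the output tensor is
# zero-initialized, so everything after the eot token stays 0.
oc_ids = open_clip.tokenize(prompt)[0].tolist()
print(oc_ids[-5:])  # [0, 0, 0, 0, 0]
```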
Not sure what the implication for the training process would be. I also checked the vocab: 0 maps to ! rather than to any special token such as <start_of_text> or <end_of_text>.
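The vocab check can be reproduced like this (sketch; `SimpleTokenizer` and the `encoder`/`decoder` attribute names are taken from the linked tokenizer.py):

```python
# Sketch: confirm that id 0 is an ordinary BPE entry, not a special token.
from open_clip import SimpleTokenizer

tok = SimpleTokenizer()
print(tok.decoder[0])                  # '!'
print(tok.encoder["<start_of_text>"])  # 49406
print(tok.encoder["<end_of_text>"])    # 49407
```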