dongluw
If I understand correctly, the kernel should work if the provided `ids` are output tokens instead of input tokens at step t. However, I see it's called with [input tokens...
Hey @molbap, I saved the `stacked_images` to image files patch by patch: https://github.com/dongluw/transformers/blob/9ef13ef775fe5a05c634fb2705a500ef59f28763/src/transformers/models/cohere2_vision/image_processing_cohere2_vision_fast.py#L228. This issue only affects generation quality if the images have a very high or very low aspect ratio. Full image is 