dongluw

Results 2 comments of dongluw

If I understand correctly, the kernel should work if the provided `ids` are output tokens instead of input tokens at step t. However, I see it's called with [input tokens...

Hey @molbap, I saved the `stacked_images` to image files patch by patch: https://github.com/dongluw/transformers/blob/9ef13ef775fe5a05c634fb2705a500ef59f28763/src/transformers/models/cohere2_vision/image_processing_cohere2_vision_fast.py#L228. This issue only affects generation quality if the images have a very high or very low aspect ratio. The full image is ![output_image17](https://github.com/user-attachments/assets/0ecb88e9-4cee-4ea3-b1c5-5142c816739a)
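
For anyone who wants to reproduce this kind of check, here is a rough sketch of dumping patches one by one. It is not the code at the link above. The helper name `save_patches_as_ppm` is hypothetical, and it assumes `stacked_images` can be converted to a NumPy array of shape `(num_patches, 3, height, width)` with float values in `[0, 1]` (if the patches are already mean/std normalized, they would need to be un-normalized first). It writes plain PPM files so it only needs NumPy:

```python
import os

import numpy as np


def save_patches_as_ppm(stacked_images, out_dir):
    """Write each patch of a (num_patches, 3, height, width) array to its own PPM file.

    Hypothetical debugging helper; the shape and value range are assumptions.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, patch in enumerate(np.asarray(stacked_images)):
        # Channels-first -> channels-last, the pixel order PPM expects.
        hwc = np.transpose(patch, (1, 2, 0))
        # Assumption: floats in [0, 1]; rescale to 8-bit for viewing.
        pixels = (np.clip(hwc, 0.0, 1.0) * 255).round().astype(np.uint8)
        height, width, channels = pixels.shape
        if channels != 3:
            raise ValueError(f"expected 3 channels, got {channels}")
        path = os.path.join(out_dir, f"patch_{i:03d}.ppm")
        with open(path, "wb") as f:
            # Binary PPM ("P6") header: magic number, width and height, max value.
            f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
            # tobytes() emits C-order bytes even though the transpose is a view.
            f.write(pixels.tobytes())
        paths.append(path)
    return paths
```

Opening the resulting files side by side makes it easy to see how a very wide or very tall input gets split into patches.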