Gregory (Gabriel) Barello
I have also been trying to finetune pix2struct. I find that the losses go to zero very quickly, which made me suspect that the attention masks are not being set...
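For anyone following along, here is a rough sketch of why a missing causal mask could make the loss collapse like this. It is plain Python for illustration only, not the pix2struct or transformers code, and `attention_weights` is a hypothetical helper I made up for the example. Without the mask, every decoder position can attend to later positions, so during teacher forcing the model can simply read off the token it is supposed to predict.

```python
import math

def attention_weights(scores, causal):
    """Row-wise softmax over a square score matrix, optionally causally masked.

    Hypothetical helper for illustration; not part of any library.
    """
    weights = []
    for i, row in enumerate(scores):
        # With a causal mask, position i may only attend to positions j <= i.
        masked_row = [
            s if (not causal or j <= i) else float("-inf")
            for j, s in enumerate(row)
        ]
        peak = max(masked_row)  # finite, because j == i is always allowed
        exps = [math.exp(s - peak) for s in masked_row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform scores over 4 decoder positions.
scores = [[0.0] * 4 for _ in range(4)]

leaky = attention_weights(scores, causal=False)   # no mask: future tokens visible
masked = attention_weights(scores, causal=True)   # mask: future tokens hidden

# Attention that position 0 pays to the last (future) position.
print(leaky[0][3], masked[0][3])
```

In the unmasked case the first position puts nonzero weight on future positions, which is the kind of leak that would let the training loss drop to near zero without the model learning anything useful.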
Glad it's working for you @arnaudstiegler! I don't have a lot of experience in the guts of the transformers repo (hence my hacky fix inside the forward function :) -...
I would love to be an official contributor, even if it's just a one-line code change 😅 I will put together a PR shortly.
Ok so I am working on this PR. It works fine when instantiating a brand new model, but when loading any of the pretrained models the `is_decoder=False` flag is saved...
Ok @younesbelkada I created the PR: https://github.com/huggingface/transformers/pull/23051 Hopefully I have done everything correctly :) If there is a way for me to also fix the pre-trained model configs let me...