Gregory (Gabriel) Barello

5 comments by Gregory (Gabriel) Barello

I have also been trying to fine-tune pix2struct. I find that the losses go to zero very quickly, which made me suspect that the attention masks are not being set...
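One common reason a decoder's loss collapses toward zero, assuming the missing mask is the decoder's causal mask: with teacher forcing, position i is trained to predict token i+1. If nothing stops position i from attending to position i+1, the model can simply copy the answer. The framework-free sketch below uses hypothetical helpers (`causal_mask`, `masked_softmax`), not the transformers implementation, to show what the causal mask does to the attention weights.

```python
import math


def causal_mask(seq_len):
    # mask[i][j] is True when position i may attend to position j (only j <= i)
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    # Disallowed positions get -inf before the softmax, so their weight becomes 0.0
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        masked = [s if allowed else -math.inf for s, allowed in zip(row_scores, row_mask)]
        peak = max(masked)  # finite, because the diagonal is always allowed
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


n = 3
scores = [[0.0] * n for _ in range(n)]  # uniform raw attention scores
with_mask = masked_softmax(scores, causal_mask(n))
without_mask = masked_softmax(scores, [[True] * n for _ in range(n)])

# With the mask, position 0 sees only itself; without it, it also "sees" future tokens.
print(with_mask[0])
print(without_mask[0])
```

Without the mask, every position spreads weight onto later tokens, which is exactly the leak that lets a model trained this way reach near-zero loss without learning anything useful.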

Glad it's working for you @arnaudstiegler! I don't have a lot of experience with the guts of the transformers repo (hence my hacky fix inside the forward function :) -...

I would love to be an official contributor, even if it's just a one-line code change 😅 I will put together a PR shortly.

OK, so I am working on this PR. It works fine when instantiating a brand-new model, but when loading any of the pretrained models the `is_decoder=False` flag is saved...
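The problem described above is a general pattern with serialized configs: changing a default in code only affects freshly built models, because a value stored in a saved config file is read back and takes precedence over the class default. The toy sketch below illustrates this with a hypothetical `ToyTextConfig` dataclass and JSON string; it is not the transformers config API, just a minimal model of the behaviour, including an explicit override at load time.

```python
import json
from dataclasses import dataclass


@dataclass
class ToyTextConfig:
    # Hypothetical stand-in for a model config; imagine the fix changed this default to True
    is_decoder: bool = True
    hidden_size: int = 8

    @classmethod
    def from_json(cls, text, **overrides):
        # Values from the saved file replace the class defaults;
        # explicit keyword overrides are applied last and win over both
        data = json.loads(text)
        data.update(overrides)
        return cls(**data)


# A config serialized before the fix still carries the old value
saved = json.dumps({"is_decoder": False, "hidden_size": 8})

fresh = ToyTextConfig()                                   # new model: picks up the new default
loaded = ToyTextConfig.from_json(saved)                   # pretrained: keeps the saved False
patched = ToyTextConfig.from_json(saved, is_decoder=True)  # explicit override at load time
```

This is why fixing only the default is not enough for existing checkpoints: either the saved configs themselves must be updated, or the value has to be forced when loading.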

OK @younesbelkada, I created the PR: https://github.com/huggingface/transformers/pull/23051 Hopefully I have done everything correctly :) If there is a way for me to also fix the pretrained model configs, let me...