How to support more CLM models?
Hi, I am trying to use prefix tuning with BioGPT: https://github.com/huggingface/transformers/blob/v4.27.1/src/transformers/models/biogpt/modeling_biogpt.py#L440
Following the prefix tuning + CLM example, I am able to train without issue, but once the model is trained, model.generate() raises the following error:
```
File ~/anaconda3/envs/vl/lib/python3.8/site-packages/transformers/models/biogpt/modeling_biogpt.py:482, in BioGptModel._prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length)
    476 if attention_mask is not None:
    477     # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
    478     expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to(
    479         inputs_embeds.device
    480     )
    481     combined_attention_mask = (
--> 482         expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
    483     )
    485 return combined_attention_mask

RuntimeError: The size of tensor a (12) must match the size of tensor b (37) at non-singleton dimension 3
```
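For reference, here is roughly how I set things up, following the prefix tuning + CLM example (the checkpoint name, prompt, and generation arguments here are illustrative, not my exact script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

model_name = "microsoft/biogpt"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(model_name)

# 25 virtual tokens, matching the numbers in the traceback above
peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=25)
model = get_peft_model(base_model, peft_config)

# ... training runs without issue ...

inputs = tokenizer("COVID-19 is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # raises the RuntimeError above
```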
I am using 25 virtual tokens, which lines up with the mismatch: the causal mask's key dimension includes the prefix via past_key_values_length (12 + 25 = 37), while the attention mask that reaches BioGptModel only covers the 12 real tokens. Any pointers on how I may be able to add support for this model in terms of .generate()? I've sketched below the kind of workaround I have in mind. Thanks!
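This is an untested sketch of the direction I'm imagining: manually extending the attention mask by num_virtual_tokens ones before generation, which I believe mirrors what PeftModelForCausalLM.forward does for prompt-learning methods during training. `model` and `tokenizer` are from the snippet above; the prompt is just an example:

```python
import torch

num_virtual_tokens = 25
inputs = tokenizer("COVID-19 is", return_tensors="pt")

# Prepend ones for the virtual tokens so that, inside
# _prepare_decoder_attention_mask, the expanded mask's key dimension
# (12 + 25 = 37) matches the causal mask built with past_key_values_length.
prefix_mask = torch.ones(
    inputs["attention_mask"].shape[0],
    num_virtual_tokens,
    dtype=inputs["attention_mask"].dtype,
)
extended_mask = torch.cat((prefix_mask, inputs["attention_mask"]), dim=1)

outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=extended_mask,
    max_new_tokens=20,
)
```

Is something along these lines the right fix, or should this be handled inside PEFT's prepare_inputs_for_generation for models like BioGPT?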