How to support more CLM models?
Hi, I am trying to use prefix tuning with BioGPT: https://github.com/huggingface/transformers/blob/v4.27.1/src/transformers/models/biogpt/modeling_biogpt.py#L440
Following the prefix tuning + CLM example, I am able to train without issue, but once the model is trained, model.generate() raises the following error:
```
File ~/anaconda3/envs/vl/lib/python3.8/site-packages/transformers/models/biogpt/modeling_biogpt.py:482, in BioGptModel._prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length)
    476 if attention_mask is not None:
    477     # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
    478     expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to(
    479         inputs_embeds.device
    480     )
    481     combined_attention_mask = (
--> 482         expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
    483     )
    485 return combined_attention_mask

RuntimeError: The size of tensor a (12) must match the size of tensor b (37) at non-singleton dimension 3
```
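For reference, here is roughly how I set things up, following the prefix tuning + CLM example (the checkpoint name, prompt, and generation arguments here are illustrative, not my exact script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

model_name = "microsoft/biogpt"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(model_name)

# 25 virtual tokens, matching the numbers in the traceback above
peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=25)
model = get_peft_model(base_model, peft_config)

# ... training runs without issue ...

inputs = tokenizer("COVID-19 is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # raises the RuntimeError above
```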
I am using 25 virtual tokens, which lines up with the mismatch: the causal mask's key dimension includes the prefix via past_key_values_length (12 + 25 = 37), while the attention mask that reaches BioGptModel only covers the 12 real tokens. Any pointers on how I may be able to add support for this model in terms of .generate()? I've sketched below the kind of workaround I have in mind. Thanks!
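This is an untested sketch of the direction I'm imagining: manually extending the attention mask by num_virtual_tokens ones before generation, which I believe mirrors what PeftModelForCausalLM.forward does for prompt-learning methods during training. `model` and `tokenizer` are from the snippet above; the prompt is just an example:

```python
import torch

num_virtual_tokens = 25
inputs = tokenizer("COVID-19 is", return_tensors="pt")

# Prepend ones for the virtual tokens so that, inside
# _prepare_decoder_attention_mask, the expanded mask's key dimension
# (12 + 25 = 37) matches the causal mask built with past_key_values_length.
prefix_mask = torch.ones(
    inputs["attention_mask"].shape[0],
    num_virtual_tokens,
    dtype=inputs["attention_mask"].dtype,
)
extended_mask = torch.cat((prefix_mask, inputs["attention_mask"]), dim=1)

outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=extended_mask,
    max_new_tokens=20,
)
```

Is something along these lines the right fix, or should this be handled inside PEFT's prepare_inputs_for_generation for models like BioGPT?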