
Sinusoidal positional encodings overwritten by post_init()

hovnatan opened this issue 1 year ago · 1 comment

I am trying to use DistilBERT with sinusoidal positional encodings. At https://github.com/huggingface/transformers/blob/e68ff304194b4fba17cd13dc32fc3d3d61e9c0a7/src/transformers/models/distilbert/modeling_distilbert.py#L713, aren't the sinusoidal positional encodings overwritten by init_weights() at line 632?
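For context, here is a minimal numpy sketch (not the actual transformers code) of the failure mode: the position table is filled with sinusoidal values at construction time, and a later blanket init pass, like the one post_init() triggers for nn.Embedding, overwrites it with random values.

```python
import numpy as np

def sinusoidal_table(n_pos, dim):
    """Build the classic sinusoidal position table:
    PE[pos, 2i]   = sin(pos / 10000^(2i/dim))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/dim))
    """
    angles = np.array(
        [[pos / np.power(10000.0, 2 * (j // 2) / dim) for j in range(dim)]
         for pos in range(n_pos)]
    )
    table = np.empty((n_pos, dim))
    table[:, 0::2] = np.sin(angles[:, 0::2])
    table[:, 1::2] = np.cos(angles[:, 1::2])
    return table

# Construction time: the model fills its position embeddings sinusoidally.
weights = sinusoidal_table(n_pos=512, dim=64)

# Later, a catch-all init pass (roughly what post_init()/_init_weights does
# for embedding layers) re-draws every entry, clobbering the sinusoids.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=weights.shape)
```

Row 0 of the sinusoidal table is all zeros in the even columns and all ones in the odd columns; after the re-init it is just noise.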

hovnatan avatar Mar 22 '24 10:03 hovnatan

Hi @hovnatan, thanks for raising this!

Indeed, this doesn't seem desirable at all. My suspicion is that this hasn't been caught because:

  • Most people are loading in pretrained weights, in which case they won't be randomly initialized
  • The default in the config is sinusoidal_pos_embds=False, so the weights should be randomly initialized

As a sanity check, I looked at the embedding weights under different settings, and can confirm that with sinusoidal_pos_embds=True the weights are only correct if we disable the post_init step.

(image attachment: sinusoidal_embeddings weight comparison)

In fact, searching through the library, it looks like this might be an issue with a few other models. Specifically:

  • Flaubert
  • Informer
  • XLM

@hovnatan Would you like to open a PR to fix this?

cc @ArthurZucker @younesbelkada as it affects language models.

amyeroberts avatar Mar 22 '24 12:03 amyeroberts

Flaubert, Informer, and XLM would still benefit from this!

ArthurZucker avatar Mar 27 '24 05:03 ArthurZucker

Should I try fixing Flaubert, ... ?

hovnatan avatar Mar 27 '24 05:03 hovnatan

Fixed in Flaubert, Informer, and XLM. The fixes differ somewhat between models, since their code structure differs.
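The general shape of such a fix (a hedged sketch, not the exact patch merged for these models) is to make the weight-init routine itself aware of the sinusoidal flag, so the pass that post_init() runs writes the sinusoidal values instead of clobbering them:

```python
import numpy as np

def init_position_weights(weights, sinusoidal):
    """Sketch of an _init_weights-style hook: fill the position table
    sinusoidally when the config asks for it, randomly otherwise.
    (Function name and signature are illustrative, not from transformers.)"""
    n_pos, dim = weights.shape
    if sinusoidal:
        angles = np.array(
            [[pos / np.power(10000.0, 2 * (j // 2) / dim) for j in range(dim)]
             for pos in range(n_pos)]
        )
        weights[:, 0::2] = np.sin(angles[:, 0::2])
        weights[:, 1::2] = np.cos(angles[:, 1::2])
    else:
        rng = np.random.default_rng(0)
        weights[:] = rng.normal(0.0, 0.02, size=weights.shape)
    return weights

# A post_init-style pass can now call this unconditionally:
# sinusoidal tables survive because the init itself produces them.
w = init_position_weights(np.empty((512, 64)), sinusoidal=True)
```

The key design point is that the sinusoidal fill is idempotent, so it no longer matters how many init passes run after construction.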

hovnatan avatar Mar 27 '24 11:03 hovnatan