Sinusoidal positional encodings overwritten by post_init()
I am trying to use DistilBERT with sinusoidal positional encodings. At https://github.com/huggingface/transformers/blob/e68ff304194b4fba17cd13dc32fc3d3d61e9c0a7/src/transformers/models/distilbert/modeling_distilbert.py#L713, aren't the sinusoidal positional encodings overwritten by `init_weights()` at line 632?
Hi @hovnatan, thanks for raising this!
Indeed, this doesn't seem desirable at all. My suspicion is that this hasn't been caught as:
- Most people load pretrained weights, in which case the embeddings aren't randomly initialized
- The default in the config is `sinusoidal_pos_embds=False`, so the weights are expected to be randomly initialized
As a sanity check, I looked at the embedding weights under different settings, and can confirm that with `sinusoidal_pos_embds=True` the embeddings are only sinusoidal if we disable the `post_init` step.
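The check above can be sketched without torch or transformers. The sketch below is a pure-Python illustration (not library code): it builds a sinusoidal table with the same sin/cos layout that `create_sinusoidal_embeddings` uses, then overwrites it with a normal init, as `_init_weights` effectively does during `post_init()`, and shows the sinusoidal pattern is gone.

```python
import math
import random

def sinusoidal_table(n_pos, dim):
    # Same construction as transformers' create_sinusoidal_embeddings:
    # even columns get sin, odd columns get cos of pos / 10000^(2*(j//2)/dim).
    table = [[0.0] * dim for _ in range(n_pos)]
    for pos in range(n_pos):
        for j in range(dim):
            angle = pos / (10000 ** (2 * (j // 2) / dim))
            table[pos][j] = math.sin(angle) if j % 2 == 0 else math.cos(angle)
    return table

def looks_sinusoidal(table):
    # At position 0 a sinusoidal table is exactly [0, 1, 0, 1, ...],
    # which a random normal init will essentially never reproduce.
    return all(
        abs(v - (0.0 if j % 2 == 0 else 1.0)) < 1e-6
        for j, v in enumerate(table[0])
    )

weights = sinusoidal_table(n_pos=512, dim=64)
assert looks_sinusoidal(weights)

# Simulate what _init_weights does during post_init():
# overwrite every entry with N(0, 0.02).
random.seed(0)
reinit = [[random.gauss(0.0, 0.02) for _ in row] for row in weights]
assert not looks_sinusoidal(reinit)
```

Loading pretrained weights restores whatever was saved, which is why the bug goes unnoticed in the common fine-tuning path.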
In fact, searching through the library, it looks like this might be an issue with a few other models. Specifically:
- Flaubert
- Informer
- XLM
@hovnatan Would you like to open a PR to fix this?
cc @ArthurZucker @younesbelkada as it affects language models.
- Flaubert
- Informer
- XLM
would still benefit from this!
Should I try fixing Flaubert, ... ?
Fixed in Flaubert, Informer, XLM. The fixes differ somewhat between the models, since their code structure is different.