
Make qwen2 `attention_qkv_bias` optional

Open wavy-jung opened this issue 1 year ago • 3 comments

What does this PR do?

The Qwen2 and Qwen2-MoE models are forced to add a bias to the query, key, and value linear projections. However, following the trend in other recent models (e.g. Llama), I refactored this `attention_qkv_bias` to be optional so that it can be configured in the config file.

Fixes #32892
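As a rough sketch of what the change amounts to (the class and attribute names below are illustrative stand-ins, not the actual transformers implementation), the new config flag is simply forwarded to the `bias` argument of each q/k/v projection instead of being hardcoded to `True`:

```python
from dataclasses import dataclass

# Illustrative stand-ins, not the real transformers classes.
@dataclass
class Qwen2LikeConfig:
    hidden_size: int = 64
    # Proposed flag; defaulting to True preserves Qwen2's original behavior.
    attention_qkv_bias: bool = True

class Linear:
    """Minimal stand-in for nn.Linear that records whether a bias was requested."""
    def __init__(self, in_features, out_features, bias=True):
        self.in_features = in_features
        self.out_features = out_features
        self.has_bias = bias

class Qwen2LikeAttention:
    def __init__(self, config):
        h = config.hidden_size
        # The flag is forwarded to each projection instead of hardcoding bias=True.
        self.q_proj = Linear(h, h, bias=config.attention_qkv_bias)
        self.k_proj = Linear(h, h, bias=config.attention_qkv_bias)
        self.v_proj = Linear(h, h, bias=config.attention_qkv_bias)
        self.o_proj = Linear(h, h, bias=False)  # output projection stays bias-free

attn = Qwen2LikeAttention(Qwen2LikeConfig(attention_qkv_bias=False))
print(attn.q_proj.has_bias)  # False
```

With the default config the projections keep their bias, matching the existing checkpoints, while `attention_qkv_bias=False` drops it, mirroring how `attention_bias` works in the Hugging Face Llama config.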

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@ArthurZucker @stevhliu

wavy-jung avatar Aug 20 '24 06:08 wavy-jung

RuntimeError: Failed to import transformers.models.audio_spectrogram_transformer.feature_extraction_audio_spectrogram_transformer because of the following error (look up to see its traceback):

I have no idea why the check_repo test failed with the error message above. Can you please give me some guidance? cc: @ArthurZucker @stevhliu

wavy-jung avatar Aug 20 '24 06:08 wavy-jung

cc @ArthurZucker

amyeroberts avatar Aug 20 '24 09:08 amyeroberts

@ArthurZucker Thank you for your response! What I intended is to allow the option to be set/unset in Qwen2 as well. In fact, the LLaMA models have never used a bias in their linear projections across the whole series, and in the official repository they are all set to `bias=False`. However, in the Hugging Face implementation this can be toggled through the config. Therefore, I want to provide the same functionality here as well.

wavy-jung avatar Aug 21 '24 04:08 wavy-jung

@ArthurZucker Hey, do you have any further comments on this?

wavy-jung avatar Aug 23 '24 06:08 wavy-jung

Hey! The reason we added that param for llama was the release of #26302: InternLM was officially using this flag, meaning there was a model related to that change! Unless a model is published, we usually avoid enabling something that does not have a model associated with it! 🤗

ArthurZucker avatar Aug 27 '24 12:08 ArthurZucker