build(deps): update transformers[sentencepiece] requirement from ~=4.35.2 to ~=4.37.0
Updates the requirements on transformers[sentencepiece] to permit the latest version.
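The practical effect of moving the pin from `~=4.35.2` to `~=4.37.0` can be sketched with the `packaging` library (the same specifier machinery pip uses); a compatible-release pin `~=4.37.0` is equivalent to `>=4.37.0, ==4.37.*`:

```python
# Sketch of which versions the new "~=4.37.0" pin permits,
# using the `packaging` library's specifier matching.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet("~=4.37.0")  # i.e. >=4.37.0, ==4.37.*

for candidate in ["4.35.2", "4.37.0", "4.37.2", "4.38.0"]:
    print(candidate, Version(candidate) in spec)
# 4.35.2 False, 4.37.0 True, 4.37.2 True, 4.38.0 False
```

So patch releases within 4.37.x are picked up automatically, while 4.38.0 would need another Dependabot PR.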
Release notes
Sourced from transformers[sentencepiece]'s releases.
v4.37 Qwen2, Phi-2, SigLIP, ViP-LLaVA, FastSpeech2Conformer, 4-bit serialization, Whisper longform generation
Model releases
Qwen2
Qwen2 is the new model series of large language models from the Qwen team. Previously, the Qwen series was released, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc.
Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, a mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer that adapts to multiple natural languages and code.
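The SwiGLU feed-forward block mentioned above can be sketched in a few lines of NumPy (an illustrative sketch only, not the actual Qwen2 implementation; the dimensions and weight names are placeholders):

```python
# Minimal SwiGLU feed-forward sketch: a SiLU-gated branch is
# multiplied element-wise with a linear "up" branch, then
# projected back down to the model dimension.
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16  # toy sizes; real Qwen2 dimensions differ
x = rng.standard_normal((2, d_model))
out = swiglu_ffn(
    x,
    rng.standard_normal((d_model, d_ff)),
    rng.standard_normal((d_model, d_ff)),
    rng.standard_normal((d_ff, d_model)),
)
print(out.shape)  # (2, 8)
```

The gating is what distinguishes SwiGLU from a plain two-layer MLP: the `w_up` branch is modulated element-wise by the SiLU-activated `w_gate` branch before the down-projection.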
- Add qwen2 by @JustinLin610 in #28436

Phi-2
Phi-2 is a transformer language model trained by Microsoft with exceptionally strong performance for its small size of 2.7 billion parameters. It was previously available as a custom code model, but has now been fully integrated into transformers.
- [Phi2] Add support for phi2 models by @susnato in #28211
- [Phi] Extend implementation to use GQA/MQA. by @gugarosa in #28163
- update docs to add the phi-2 example by @susnato in #28392
- Fixes default value of softmax_scale in PhiFlashAttention2. by @gugarosa in #28537

SigLIP
The SigLIP model was proposed in Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer. SigLIP proposes to replace the loss function used in CLIP by a simple pairwise sigmoid loss. This results in better performance in terms of zero-shot classification accuracy on ImageNet.
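The pairwise sigmoid loss can be sketched as follows (an illustrative NumPy sketch of the loss from the paper, not the transformers implementation; the temperature and bias values are placeholders): each image-text pair is scored independently with a sigmoid, rather than normalized across the batch as in CLIP's softmax.

```python
# Illustrative pairwise sigmoid loss: +1 labels on the diagonal
# (matching image-text pairs), -1 everywhere else; each pair
# contributes -log sigmoid(label * logit) independently.
import numpy as np

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # logits[i, j] = scaled similarity of image i and text j
    logits = t * (img_emb @ txt_emb.T) + b
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0
    # -log sigmoid(z) == log(1 + exp(-z)) == logaddexp(0, -z)
    return np.mean(np.logaddexp(0.0, -labels * logits))

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))
txt = rng.standard_normal((4, 8))
img /= np.linalg.norm(img, axis=1, keepdims=True)  # L2-normalize
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(siglip_loss(img, txt))  # scalar; lower when matched pairs align
```

Because no batch-wide normalization is involved, the loss decomposes over pairs, which is what makes it amenable to very large batch sizes.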
- Add SigLIP by @NielsRogge in #26522
- [SigLIP] Don't pad by default by @NielsRogge in #28578

ViP-LLaVA
The VipLlava model was proposed in Making Large Multimodal Models Understand Arbitrary Visual Prompts by Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.
VipLlava enhances the training protocol of Llava by marking images and interacting with the model using natural cues like a “red bounding box” or “pointed arrow” during training.
- Adds VIP-llava to transformers by @younesbelkada in #27932
- Fix Vip-llava docs by @younesbelkada in #28085

FastSpeech2Conformer
The FastSpeech2Conformer model was proposed with the paper Recent Developments On Espnet Toolkit Boosted By Conformer by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, and Yuekai Zhang.
FastSpeech 2 is a non-autoregressive model for text-to-speech (TTS) synthesis, which develops upon FastSpeech, showing improvements in training speed, inference speed and voice quality. It consists of a variance adaptor (duration, energy and pitch predictors) and mel-spectrogram and waveform decoders.
- Add FastSpeech2Conformer by @connor-henderson in #23439

Wav2Vec2-BERT
The Wav2Vec2-BERT model was proposed in Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team from Meta AI.
This model was pre-trained on 4.5M hours of unlabeled audio data covering more than 143 languages. It requires fine-tuning to be used for downstream tasks such as Automatic Speech Recognition (ASR) or Audio Classification.
... (truncated)
Commits
- 8e3e145 [GPTNeoX] Fix BC issue with 4.36 (#28602)
- 344943b Fix _speculative_sampling implementation (#28508)
- 5fc3e60 [SigLIP] Don't pad by default (#28578)
- 5ee9fcb Fix wrong xpu device in DistributedType.MULTI_XPU mode (#28386)
- e156abd [Whisper] Finalize batched SOTA long-form generation (#27658)
- a485e46 Add w2v2bert to pipeline (#28585)
- d381d85 Release: v4.37.0
- db9a7e9 Don't save processor_config.json if a processor has no extra attribute (#2...
- 772307b Making CTC training example more general (#28582)
- 186aa6b [Whisper] Fix audio classification with weighted layer sum (#28563)
- Additional commits viewable in compare view
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)