Hardcodes `float32` for the activation layer. This prevents the dtype from being overridden by the following call:
```
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
```
This can help enable mixed precision training...
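A minimal sketch of the pattern (layer sizes and model structure are illustrative, not from the PR): the global policy runs most layers in `mixed_float16`, while the final activation is pinned to `float32`.

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Most layers compute in float16 with float32 variables under this policy.
mixed_precision.set_global_policy('mixed_float16')

inputs = tf.keras.Input(shape=(128,))
x = layers.Dense(256, activation='relu')(inputs)      # follows the global policy
logits = layers.Dense(10)(x)
# Hardcoding dtype='float32' pins the activation layer, so the policy above
# cannot override it and the softmax stays numerically stable.
outputs = layers.Activation('softmax', dtype='float32')(logits)

model = tf.keras.Model(inputs, outputs)
```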
The list of audio files is created and populated inside the main dataset class, which can lead to higher memory usage.
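A minimal sketch of the concern, with hypothetical class and directory names: the eager version holds every path in memory for the lifetime of the dataset object, while a lazy generator yields paths on demand.

```python
from pathlib import Path

class AudioDatasetEager:
    def __init__(self, data_dir: str):
        # Eager: the full list of audio paths is built up front and
        # kept in memory for as long as the dataset object lives.
        self.audio_files = list(Path(data_dir).rglob("*.wav"))

class AudioDatasetLazy:
    def __init__(self, data_dir: str):
        self.data_dir = Path(data_dir)

    def iter_audio_files(self):
        # Lazy: paths are yielded one at a time, never materialized
        # as a single in-memory list.
        yield from self.data_dir.rglob("*.wav")
```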
Currently, whenever we do a fresh install of audiotoken and import it with `import audiotoken`, the models are downloaded then and there. The models should instead be loaded/downloaded lazily...
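A minimal sketch of lazy loading, with hypothetical function names (the real download/load logic in audiotoken is only hinted at): nothing runs at import time; the first call pays the download cost and the result is cached.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def _get_model():
    # Runs only on the first call, so `import audiotoken` alone would
    # no longer trigger a download. The body stands in for whatever
    # audiotoken actually does to fetch and load its checkpoints.
    print("downloading and loading model weights ...")
    return "loaded-model"          # placeholder for the real model object

def encode(audio):
    model = _get_model()           # first use triggers the (cached) load
    return model, audio            # placeholder for real tokenization
```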
- Fused Q,K,V projection into one matmul
- Fused MHA into one single layer instead of a concatenation of 12
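A minimal sketch of the fused Q/K/V projection (PyTorch assumed; dimensions and names are illustrative): a single `nn.Linear` of width `3 * d_model` replaces three separate projections, and its output is split into Q, K and V.

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        # One matmul produces Q, K and V instead of three separate projections.
        self.qkv = nn.Linear(d_model, 3 * d_model)

    def forward(self, x: torch.Tensor):
        b, t, _ = x.shape
        qkv = self.qkv(x)                                    # (b, t, 3 * d_model)
        qkv = qkv.view(b, t, 3, self.n_heads, self.head_dim)
        q, k, v = qkv.unbind(dim=2)                          # each (b, t, n_heads, head_dim)
        # Move heads ahead of the sequence dim for attention: (b, n_heads, t, head_dim)
        return q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
```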
Solves #2864 for `target_modules`.

Enables the `ensure_weight_tying` flag in `LoraConfig` for `target_modules`. For LoRA, if any of the tied layers are added to `target_modules` and `ensure_weight_tying == True`, the adapters added...
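A minimal usage sketch, assuming the flag lands on `LoraConfig` as described here (the base model and the tied module names are illustrative):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # illustrative base model

config = LoraConfig(
    r=8,
    target_modules=["wte", "lm_head"],   # tied input/output embedding layers
    ensure_weight_tying=True,            # flag proposed in this PR: keep adapters on tied layers in sync
)
peft_model = get_peft_model(model, config)
```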
# What does this PR do?

Adds support for `mamba_ssm` and `causal_conv1d` kernels from the kernel-hub in bamba models.

Fixes # (issue) https://github.com/huggingface/transformers/issues/41208

## Before submitting

- [ ] This...
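A minimal sketch of how a kernel is typically fetched from the kernel-hub with the `kernels` package (the repo id below is hypothetical and not necessarily what this PR wires into bamba):

```python
from kernels import get_kernel

# Hypothetical repo id; the actual causal_conv1d / mamba_ssm kernels used by
# this PR may live under a different kernel-hub namespace.
causal_conv1d = get_kernel("kernels-community/causal-conv1d")
# The returned module exposes the kernel's ops, which the model's forward
# pass can call in place of the pure-PyTorch fallback.
```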
### System Info

- `transformers` version: 4.55.4
- Platform: Linux-5.14.0-284.73.1.el9_2.x86_64-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.36.0
- Safetensors version: 0.5.2
- Accelerate version: 1.12.0
- Accelerate config: not...