Hi! Do you have any plan to support Conformer, which is common in ASR? (https://arxiv.org/abs/2005.08100)
## 📌 Checklist before creating the PR
- [x] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
### Proposal
The `load_checkpoint` method in `colossalai.utils.checkpoint` cannot scatter large model state-dict parameters, since the gather operations first fill GPU memory and cause OOM problems. The proposal...
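One possible direction, sketched in plain Python rather than against the ColossalAI API (the function and parameter names here are illustrative assumptions): stream parameters out of the checkpoint one at a time and assign each into its destination, so peak extra memory is one parameter rather than the whole gathered state dict.

```python
def load_streamed(param_source, destination):
    """Hypothetical sketch of a streamed checkpoint load.

    param_source: an iterator lazily yielding (name, tensor) pairs,
        e.g. read one at a time from a checkpoint file on rank 0.
    destination: a mapping (standing in for the local model shard)
        that each parameter is written into.

    Only one parameter is held in transit at any moment, instead of
    materializing the full state dict before scattering it.
    """
    count = 0
    for name, tensor in param_source:
        # In real code this would be a param.copy_() or a
        # distributed scatter to the owning rank.
        destination[name] = tensor
        count += 1
    return count
```

In a real implementation the per-parameter assignment would be a broadcast/scatter to the rank that owns the shard, but the memory argument is the same: the gather never holds more than one tensor at a time.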
FlashAttention greatly reduces memory usage and speeds up attention, even during inference. Any plan to support this implementation: https://github.com/facebookresearch/xformers/tree/main/xformers/csrc/attention
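For context on where the memory saving comes from: the core trick in FlashAttention is an online (streaming) softmax that processes keys in tiles with a running max and normalizer, so the full attention-score row is never materialized. A minimal scalar sketch of that trick in pure Python (not the actual CUDA kernel; names are illustrative):

```python
import math

def naive_attention_row(scores, values):
    # Reference: full softmax over one query's scores, then weighted sum.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e * v for e, v in zip(exps, values)) / z

def online_attention_row(scores, values, tile=2):
    # Process keys tile by tile, keeping a running max (m), running
    # normalizer (z), and running weighted sum (acc). Earlier partial
    # results are rescaled when a new, larger max appears.
    m, z, acc = float("-inf"), 0.0, 0.0
    for i in range(0, len(scores), tile):
        s_t, v_t = scores[i:i + tile], values[i:i + tile]
        m_new = max(m, max(s_t))
        scale = math.exp(m - m_new)  # rescale previous partial sums
        z = z * scale + sum(math.exp(s - m_new) for s in s_t)
        acc = acc * scale + sum(math.exp(s - m_new) * v
                                for s, v in zip(s_t, v_t))
        m = m_new
    return acc / z
```

Both functions compute the same output; the tiled version is what lets the kernel keep O(tile) working memory per query instead of O(sequence length).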
Hello, I am opening this topic to ask whether this would be useful for speech recognition tasks. I am going to test your ACE loss on my acoustic model. Hope...
As the Whisper model has an encoder-decoder structure, do you have any example of using inferflow with it?
Consider integrating selective activation checkpointing, as featured in PyTorch's blog ["Maximizing Training Throughput"](https://pytorch.org/blog/maximizing-training/), into LitGPT. Adding a `selective_activation_checkpointing` kwarg would enable users to leverage this strategy alongside FSDP, facilitating training...
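To make the idea concrete, here is a hypothetical policy helper (the function name and `fraction` parameter are assumptions, not LitGPT or PyTorch API): instead of checkpointing every transformer block or none, checkpoint an evenly spaced subset, trading a tunable amount of recomputation for activation memory.

```python
def select_checkpoint_layers(n_layers, fraction):
    """Pick an evenly spaced subset of block indices to checkpoint.

    n_layers: total number of transformer blocks.
    fraction: target fraction of blocks to checkpoint (0.0 .. 1.0);
        0 means no checkpointing, 1 means checkpoint every block.
    Returns the set of block indices whose activations should be
    recomputed in the backward pass instead of stored.
    """
    if fraction <= 0:
        return set()
    stride = max(1, round(1 / fraction))
    return {i for i in range(n_layers) if i % stride == 0}
```

In a training loop, the selected indices would be the blocks wrapped with an activation-checkpointing wrapper (e.g. `torch.utils.checkpoint.checkpoint`), while the remaining blocks store activations normally.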