Hi! Do you have any plan to support Conformer, which is common in ASR? (https://arxiv.org/abs/2005.08100)
## 📌 Checklist before creating the PR
- [x] I have created an issue for this PR for traceability
- [x] The title follows the standard format: `[doc/gemini/tensor/...]: A concise...
### Proposal
The `load_checkpoint` method in `colossalai.utils.checkpoint` cannot scatter large model state-dict parameters, since the gather operations first fill GPU memory and cause OOM problems. The proposal...
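One possible direction, sketched in plain Python rather than against the ColossalAI API (the function and parameter names here are illustrative assumptions): stream parameters out of the checkpoint one at a time and assign each into its destination, so peak extra memory is one parameter rather than the whole gathered state dict.

```python
def load_streamed(param_source, destination):
    """Hypothetical sketch of a streamed checkpoint load.

    param_source: an iterator lazily yielding (name, tensor) pairs,
        e.g. read one at a time from a checkpoint file on rank 0.
    destination: a mapping (standing in for the local model shard)
        that each parameter is written into.

    Only one parameter is held in transit at any moment, instead of
    materializing the full state dict before scattering it.
    """
    count = 0
    for name, tensor in param_source:
        # In real code this would be a param.copy_() or a
        # distributed scatter to the owning rank.
        destination[name] = tensor
        count += 1
    return count
```

In a real implementation the per-parameter assignment would be a broadcast/scatter to the rank that owns the shard, but the memory argument is the same: the gather never holds more than one tensor at a time.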
FlashAttention greatly reduces memory usage and speeds up attention, even during inference. Any plan to support this implementation: https://github.com/facebookresearch/xformers/tree/main/xformers/csrc/attention
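For context on where the memory saving comes from: the core trick in FlashAttention is an online (streaming) softmax that processes keys in tiles with a running max and normalizer, so the full attention-score row is never materialized. A minimal scalar sketch of that trick in pure Python (not the actual CUDA kernel; names are illustrative):

```python
import math

def naive_attention_row(scores, values):
    # Reference: full softmax over one query's scores, then weighted sum.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return sum(e * v for e, v in zip(exps, values)) / z

def online_attention_row(scores, values, tile=2):
    # Process keys tile by tile, keeping a running max (m), running
    # normalizer (z), and running weighted sum (acc). Earlier partial
    # results are rescaled when a new, larger max appears.
    m, z, acc = float("-inf"), 0.0, 0.0
    for i in range(0, len(scores), tile):
        s_t, v_t = scores[i:i + tile], values[i:i + tile]
        m_new = max(m, max(s_t))
        scale = math.exp(m - m_new)  # rescale previous partial sums
        z = z * scale + sum(math.exp(s - m_new) for s in s_t)
        acc = acc * scale + sum(math.exp(s - m_new) * v
                                for s, v in zip(s_t, v_t))
        m = m_new
    return acc / z
```

Both functions compute the same output; the tiled version is what lets the kernel keep O(tile) working memory per query instead of O(sequence length).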
Hello, I am opening this topic to ask whether this would be useful for speech recognition tasks. I am going to test your ACE loss on my acoustic model. Hope...
As the Whisper model has an encoder-decoder structure, do you have any example of using inferflow with it?
Consider integrating selective activation checkpointing, as featured in PyTorch's blog ["Maximizing Training Throughput"](https://pytorch.org/blog/maximizing-training/), into LitGPT. Adding a `selective_activation_checkpointing` kwarg would enable users to leverage this strategy alongside FSDP, facilitating training...
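To make the idea concrete, here is a hypothetical policy helper (the function name and `fraction` parameter are assumptions, not LitGPT or PyTorch API): instead of checkpointing every transformer block or none, checkpoint an evenly spaced subset, trading a tunable amount of recomputation for activation memory.

```python
def select_checkpoint_layers(n_layers, fraction):
    """Pick an evenly spaced subset of block indices to checkpoint.

    n_layers: total number of transformer blocks.
    fraction: target fraction of blocks to checkpoint (0.0 .. 1.0);
        0 means no checkpointing, 1 means checkpoint every block.
    Returns the set of block indices whose activations should be
    recomputed in the backward pass instead of stored.
    """
    if fraction <= 0:
        return set()
    stride = max(1, round(1 / fraction))
    return {i for i in range(n_layers) if i % stride == 0}
```

In a training loop, the selected indices would be the blocks wrapped with an activation-checkpointing wrapper (e.g. `torch.utils.checkpoint.checkpoint`), while the remaining blocks store activations normally.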