Songlin Yang

Results: 36 issues by Songlin Yang

I found log-bmm very useful for linear-chain CRFs, since it saves memory and speeds things up, while in context-free grammars the A->BC rule requires large amounts of GPU memory, which is even more serious. So it...
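For context, the operation in question can be written as a naive log-semiring batched matmul in plain PyTorch (a minimal sketch; the function name and shapes are illustrative, not the library's API). The broadcasted intermediate it materializes is exactly the memory blow-up a fused log-bmm kernel avoids:

```python
import torch

def log_bmm(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Batched matmul in the log semiring:
    out[b, i, j] = logsumexp_k(A[b, i, k] + B[b, k, j]).

    A: (batch, m, k) log-potentials, B: (batch, k, n) log-potentials.
    """
    # Broadcasting creates a (batch, m, k, n) tensor, which is the
    # memory bottleneck for CFG rules A->BC with large nonterminal sets.
    return torch.logsumexp(A.unsqueeze(-1) + B.unsqueeze(-3), dim=-2)
```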

Hi, thanks for your great work! I am trying to run the `hg38/hg38_hyena_seqlen_warmup_reload.yaml` experiment and got the following error message: I did some initial searching on this issue and found [this](https://github.com/HazyResearch/hyena-dna/issues/31). I...

### Checklist - [x] I have checked [FAQs](https://github.com/fla-org/flash-linear-attention/blob/main/FAQs.md) and existing issues for similar problems - [x] My GPU is an H100 and I have installed the `triton-nightly` built by the fla team, and...

bug

### Checklist - [x] I have checked [FAQs](https://github.com/fla-org/flash-linear-attention/blob/main/FAQs.md) and existing issues for similar problems - [x] My GPU is an H100 and I have installed the `triton-nightly` built by the fla team, and...

bug

### Feature Request Currently, FLA contains several suboptimal autotuning settings. We should avoid performing a grid search over the full Cartesian product space, as it is inefficient and often unnecessary. ###...

enhancement
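As a hedged illustration of the autotuning request above (the kernel, block sizes, and key names are made up for this sketch, not FLA's actual settings), one alternative to searching the full Cartesian product is to hand-curate a short list of known-good `triton.Config` entries:

```python
import triton
import triton.language as tl

# A small, hand-curated config list instead of itertools.product over
# every block size / warp count / stage count (values are illustrative).
_CONFIGS = [
    triton.Config({'BLOCK_M': 64,  'BLOCK_N': 64},  num_warps=4, num_stages=2),
    triton.Config({'BLOCK_M': 128, 'BLOCK_N': 64},  num_warps=8, num_stages=3),
    triton.Config({'BLOCK_M': 64,  'BLOCK_N': 128}, num_warps=8, num_stages=3),
]

@triton.autotune(configs=_CONFIGS, key=['M', 'N'])
@triton.jit
def copy_kernel(x_ptr, y_ptr, M, N,
                BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Toy tile-copy body, only here to make the decorator example complete.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
    idx = offs_m[:, None] * N + offs_n[None, :]
    tl.store(y_ptr + idx, tl.load(x_ptr + idx, mask=mask), mask=mask)
```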

### Proposal Mamba2's GVA is useful ### Rationale _No response_

enhancement

### Proposal RWKV-6 and RWKV-7 currently do not support varlen training. We aim to develop varlen token shift kernels to enable this functionality. - [x] RWKV-6 and RWKV-7 varlen training...

enhancement
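For reference, a minimal non-kernel PyTorch sketch of what the varlen token shift proposed above has to do: shift every token right by one position within its own sequence and never let the last token of one packed sequence leak into the next. The `cu_seqlens` convention follows the usual flash-attention varlen layout; this is only a sketch, not FLA's actual kernel:

```python
import torch

def token_shift_varlen(x: torch.Tensor, cu_seqlens: torch.Tensor) -> torch.Tensor:
    """Right-shift tokens by one position within each packed sequence.

    x:          (total_tokens, hidden), all sequences packed along dim 0.
    cu_seqlens: (num_seqs + 1,) cumulative lengths, e.g. [0, 5, 12, 20].
    Returns a tensor where position i holds x[i - 1], and the first token of
    every sequence is zeros, so no state crosses a sequence boundary.
    """
    shifted = torch.zeros_like(x)
    shifted[1:] = x[:-1]            # naive shift over the packed dimension
    shifted[cu_seqlens[:-1]] = 0.0  # reset at each sequence start
    return shifted
```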

### Proposal The Hugging Face implementation lacks support for many features. It would be more convenient to integrate the Mamba models into the FLA ecosystem to enable functionality such as inference, varlen training,...

enhancement

### Proposal Use each model's official initialization instead of a unified initialization ### Rationale Related issues: https://github.com/fla-org/flash-linear-attention/issues/220 , https://github.com/fla-org/flash-linear-attention/issues/266

enhancement
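A hedged sketch of one way to realize the per-model initialization proposal above (the registry, family names, and numbers are hypothetical, not the project's actual code): register each family's published scheme and fall back to the current unified one:

```python
import torch.nn as nn

# Hypothetical registry mapping a model family to its official init scheme.
OFFICIAL_INITS = {}

def register_init(name):
    def wrap(fn):
        OFFICIAL_INITS[name] = fn
        return fn
    return wrap

@register_init("family_a")
def _init_family_a(module: nn.Module) -> None:
    # Placeholder for one model's published scheme (values illustrative).
    if isinstance(module, nn.Linear):
        nn.init.xavier_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

@register_init("default")
def _init_default(module: nn.Module) -> None:
    # The current unified scheme, kept as the fallback.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def init_weights(model: nn.Module, family: str) -> None:
    init_fn = OFFICIAL_INITS.get(family, OFFICIAL_INITS["default"])
    model.apply(init_fn)
```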

### Proposal as title ### Rationale _No response_

enhancement