Mutian He

Results 6 issues of Mutian He

examples/var/dataset is gitignored; however, there is a Rakefile in that directory that lib/dataset.rb needs in order to run the examples.

May I know if the code or a more detailed training recipe (e.g. training hyperparameters) for the [paper](https://arxiv.org/abs/2104.06678) can be released? It was mentioned in #3980 but it seems that no...

question
needs triage

Hello! I've recently been doing some experiments on NMSQA, and the code for DUAL provided here is really helpful! However, I encountered some difficulty building the units using scripts provided to...

Hi, I've recently been trying to run lm-eval on Pythia models using the benchmarks listed in the paper. All the benchmarks show similar results to those reported in the paper, except...

I'm trying to update the implementation of NSA, including the kernels, to support the cached inference scenario where Tq != Tkv, so that https://github.com/fla-org/flash-linear-attention/issues/417 can hopefully be resolved. Respective...

### Feature Request

It seems that the current version of native sparse attention lacks support for cached inference. When `use_cache=True` is passed to `model.generate`, a shape...

enhancement