Mutian He

Results: 10 comments of Mutian He

Maybe add an explanation of the arguments for running the examples? E.g., the format and specifications of train_file, word_file, and eval_file, and the meaning of num_classes. Thanks.

Try using Keras 1.2.2 rather than Keras 2.

(Simply use Keras 1.2; the problem is due to a version incompatibility.)
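A minimal sketch of the suggested environment pin, assuming pip is used to manage packages (the exact install command is not from the original comment):

```shell
# Downgrade to the last Keras 1.x release; code written against the
# Keras 1.x API generally breaks under Keras 2's renamed layers/arguments.
pip install "keras==1.2.2"
```

Pinning the exact version in a requirements file is usually preferable to an ad-hoc install, so collaborators reproduce the same environment.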

(Or do you have any number for the performance of DUAL with the view only on each segment?)

Thank you very much! Let me have a look at them.

Thank you very much! Actually, I am looking for the HuBERT units for each segment (e.g., context-0_0_1, context-0_0_2, ...), while it seems that the provided units above and in...

Ah, it is simply the standard GPT2 tokenizer on Huggingface transformers.

I might have also encountered this problem. From my tests, it happens at the 2nd transformer layer during decoding with H100 GPUs, sequence length >= 256, and negative `seqlen_offset` values...

I'm actually using the NSA kernel at the moment and hence working on fixing this... I can try to get this done in a few days. BTW @Espere-1119-Song, from what I understand it...

BTW, there are also some places outside the kernels that involve this issue, for example https://github.com/fla-org/flash-linear-attention/blob/364c199e65b3247efec2eb4b10067152bb3a8f1a/fla/layers/utils.py#L150-L161