qq31415926

4 comments by qq31415926

Looking at the code, batch_size only supports 1, so it is probably quite slow.

Hello, could you explain why val_loss needs to be negative for the bilstm_crf model?

I also ran into problems when training Qwen3-Omni; see https://github.com/hiyouga/LLaMA-Factory/issues/9222

Thanks for your contributions. I'd like to ask whether the eager attention implementation could be changed to flash attention?