Comments of qq31415926 (4 results)
Looking at the code, batch_size only supports 1, so it is probably quite slow.
Hello, could you explain why val_loss needs to be negative in the bilstm_crf model?
I also ran into problems when training Qwen3-Omni; see https://github.com/hiyouga/LLaMA-Factory/issues/9222
Thanks for your contributions. Could the eager attention implementation be switched to flash attention?