qq31415926

4 comments by qq31415926

Looking at the code, batch_size only supports 1, so it is probably quite slow.

Hello, could you explain why val_loss needs to be negative for the bilstm_crf model?

I also ran into problems when training Qwen3-Omni; see https://github.com/hiyouga/LLaMA-Factory/issues/9222

Thanks for your contributions. I'd like to ask whether the eager attention implementation could be changed to flash attention?