huyanxin
OK, in our practice, we remove the (i)STFT and atan parts from the ONNX export. Alternatively, you can use libtorch, which is also a good way to run inference.
No, when training DCCRN, the targets are reverberant clean speech. If you want to remove both noise and reverb, using early-reverberation clean speech as other teams do may work better.
May I ask what data you are using? What happens if you disable the 4-second segmentation, i.e. set https://github.com/huyanxin/phasen/blob/31ee2f1ba89b535142a5189abf913e9ac7f36404/steps/run_phasen.py#L196 to false? It would be best if you could send me a sample so I can listen. Normally this should not happen: the segmentation itself uses overlap to keep continuity between adjacent segments, so in principle this issue should not occur. Please send the sample to my email, [email protected].
I remember running into NaN before as well. My intuition is that this loss does have some issues, related to your dataset and the network initialization. I once tried feeding it nothing but random numbers: the first few steps were fine, then it suddenly blew up, which was indeed strange. If you switch to SI-SNR you should be able to train normally, and the results are about the same.
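For reference, the SI-SNR objective mentioned above can be sketched as below. This is a generic numpy illustration of the standard scale-invariant SNR, not the repo's actual implementation; the `eps` guards on the divisions and the log are my addition and are exactly the kind of safeguard that helps against NaN.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference.

    eps keeps the divisions and the log finite even when the
    reference or the residual is (near-)silent.
    """
    # remove DC so the projection below is well defined
    ref = ref - ref.mean()
    est = est - est.mean()
    # project the estimate onto the reference: the scaled target
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps)
                           / (np.sum(noise ** 2) + eps))
```

As a loss one would minimize `-si_snr(est, ref)`; because of the projection, rescaling the estimate leaves the value unchanged, which is what makes it scale-invariant.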
You can change `outputs, wav = data_parallel(model, (inputs, ))` to `outputs, wav = model(inputs)`
The last training run was more than half a year ago... I don't remember the details. One thing you can do: check the SI-SNR of the oracle IRM or cIRM on this data (I remember adding that metric), and use it as an estimate. The model I trained on WSJ0 got around 17 dB SI-SNR on the CV set. Also keep in mind that AISHELL-1 itself is fairly noisy, so what the model learns may show some bias.
I think the NaN is caused by another part; gradient clipping has already been added in https://github.com/huyanxin/phasen/blob/31ee2f1ba89b535142a5189abf913e9ac7f36404/steps/run_phasen.py#L76
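For anyone unfamiliar with what that line does: global-norm gradient clipping rescales all gradients together so their combined L2 norm stays below a threshold. A minimal numpy sketch of the same idea (what `torch.nn.utils.clip_grad_norm_` does in place on a model's parameters):

```python
import numpy as np

def clip_grad_norm(grads, max_norm, eps=1e-12):
    """Rescale a list of gradient arrays so their global L2 norm
    does not exceed max_norm. Returns (clipped grads, original norm)."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = max_norm / (total + eps)
    if scale < 1.0:  # only shrink, never amplify
        grads = [g * scale for g in grads]
    return grads, total
```

Note that clipping bounds the update size but cannot rescue a loss that is already NaN, which is why the fix above targets the loss itself.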
I added an eps to the mixed loss; you can try it: https://github.com/huyanxin/phasen/commit/929fa464e610b73c04fd85f5a4dcb77ec040fc30
Thanks! I fixed it in https://github.com/huyanxin/phasen/commit/929fa464e610b73c04fd85f5a4dcb77ec040fc30