Hui Wang

Results 8 comments of Hui Wang

Thanks for the reply, and great work! Active speaker detection is currently one of the important channels connecting speaker-related audio cues and visual cues. Could I merge the training code into [3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker)?

There are several limitations to speaker recognition currently in the pipeline. It may not perform well when the audio duration is too short (less than 60 seconds) or when the...

Before https://github.com/modelscope/3D-Speaker/blob/ab22112d280fed17839094da3874813c9eb63460/speakerlab/models/campplus/layers.py#L109, seg.shape[-1] is greater than or equal to x.shape[-1]. Therefore, that line makes seg and x the same shape.

@Juelianqvq seg.shape[-1] < x.shape[-1] can never occur, given the code at https://github.com/modelscope/3D-Speaker/blob/ab22112d280fed17839094da3874813c9eb63460/speakerlab/models/campplus/layers.py#L108
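
To illustrate the shape argument in the two comments above, here is a minimal pure-Python sketch. It is not the repository code: the function name `seg_pooling_1d` is hypothetical, and it assumes the pooling uses ceil-mode segments, repeats each segment mean `seg_len` times, and then trims to the input length. Under those assumptions the expanded length is `ceil(T / seg_len) * seg_len`, which is never smaller than `T`.

```python
import math


def seg_pooling_1d(x, seg_len=100):
    """Hypothetical sketch: segment-average x, expand, then trim to len(x)."""
    # Ceil mode: a trailing partial segment still counts as one segment.
    n_seg = math.ceil(len(x) / seg_len)

    means = []
    for i in range(n_seg):
        chunk = x[i * seg_len:(i + 1) * seg_len]
        # Assumption: a partial last chunk is averaged over its actual length.
        means.append(sum(chunk) / len(chunk))

    # Repeat each segment mean seg_len times: length is n_seg * seg_len >= len(x).
    expanded = [m for m in means for _ in range(seg_len)]
    assert len(expanded) >= len(x)

    # The final trim makes the output the same length as the input.
    return expanded[:len(x)]


x = [float(v) for v in range(250)]
out = seg_pooling_1d(x, seg_len=100)
print(len(out) == len(x))
```

With 250 frames and `seg_len=100`, three segments are produced, the expanded sequence has 300 entries, and the trim brings it back to 250.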

Thanks for spotting this bottleneck and proposing an optimization! We’ll verify the speedup and ensure numerical consistency with the original implementation.

It refers to large margin fine-tuning, a commonly used technique for improving performance in speaker verification training.
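
As a rough illustration of what "large margin" means here, the sketch below computes a target-class logit with an additive angular margin, which is one common margin-based loss in speaker verification. The function name `aam_logit`, the scale of 32, and the margin values 0.2 and 0.5 are illustrative assumptions, not settings taken from 3D-Speaker. The idea is that the fine-tuning stage raises the margin, which lowers the target logit for the same angle and so makes the objective stricter.

```python
import math


def aam_logit(cos_theta, margin, scale=32.0):
    """Hypothetical sketch: scaled cos(theta + margin) for the target class."""
    # Clamp to the valid acos domain to guard against rounding error.
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return scale * math.cos(theta + margin)


# Illustrative values only: a smaller margin for the main training stage,
# a larger one for the fine-tuning stage.
base = aam_logit(0.8, margin=0.2)
large = aam_logit(0.8, margin=0.5)
print(large < base)
```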