Wangyou Zhang comments

Results: 62 comments of Wangyou Zhang

TSE with Librimix: mismatch in number of speakers

@AntoineBlanot Could you paste the content of `run.sh` and the model config file (.yaml) you used?

TSE with Librimix: mismatch in number of speakers

Thank you! I think the error is caused by the default value of the argument `load_all_speakers` (=false) in [TSEPreprocessor](https://github.com/espnet/espnet/blob/master/espnet2/train/preprocessor.py#L1657). So it will only prepare one reference signal (corresponding to...

Can more tse (target speaker extraction) models be implemented?

Thanks for your suggestion! I also considered adding SpEx series in the model pool. The major concern was indeed the multi-task training in SpEx models. It is not easy to...

Can more tse (target speaker extraction) models be implemented?

Yes, I have two versions of TD-SpeakerBeam: ① one basically following the recipe's default configuration (with only one modification: `enroll_segment` set to 32000), and ② another using speaker embeddings instead...

Can more tse (target speaker extraction) models be implemented?

> Exploring Time-Frequency Domain Target Speaker Extraction for Causal and Non-Causal Processing
>
> Is this the name of your new article?

Yes, it is the paper that contains the results I...

Can more tse (target speaker extraction) models be implemented?

I guess your data might differ from my setup in some way. Could you share more details about your data? For example, the sampling rates of the audio files in `wav.scp`, `spk1.scp`, `enroll_spk1.scp`...
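A minimal sketch of how the sampling rates could be checked, assuming each line of the scp file has the plain form `<utt_id> <path/to/file.wav>` and that the files are uncompressed PCM WAV (entries that use piped commands or other formats are not handled). The helper names `parse_scp` and `sampling_rates` are hypothetical and not part of ESPnet.

```python
import wave
from collections import Counter
from pathlib import Path


def parse_scp(scp_path):
    """Yield (utt_id, wav_path) pairs from a Kaldi-style scp file.

    Assumption: every non-empty line is "<utt_id> <wav_path>".
    """
    for line in Path(scp_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        utt_id, wav_path = line.split(maxsplit=1)
        yield utt_id, wav_path


def sampling_rates(scp_path):
    """Count how many listed WAV files use each sampling rate."""
    rates = Counter()
    for _, wav_path in parse_scp(scp_path):
        # The standard-library wave module only reads PCM WAV files.
        with wave.open(wav_path, "rb") as f:
            rates[f.getframerate()] += 1
    return rates


if __name__ == "__main__":
    # File names taken from the comment above; adjust paths to your dump directory.
    for name in ("wav.scp", "spk1.scp", "enroll_spk1.scp"):
        if Path(name).exists():
            print(name, dict(sampling_rates(name)))
```

If the three files report different rates, or a single file reports more than one rate, that would be one candidate for the mismatch.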

Can more tse (target speaker extraction) models be implemented?

Can you also share the information about your python environment?

**Basic environments:**
- OS information: [e.g., Linux 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64]
- python version: [e.g. 3.7.3 (default,...

Can more tse (target speaker extraction) models be implemented?

Another thing you may try is running stage 8 in `run.sh` to obtain the metrics for the mixture data (`dump/raw/RESULTS.md`) so that we can check if they match the numbers...

Can more tse (target speaker extraction) models be implemented?

> > Another thing you may try is running stage 8 in `run.sh` to obtain the metrics for the mixture data (`dump/raw/RESULTS.md`) so that we can check if they match...

Can more tse (target speaker extraction) models be implemented?

I will take a closer look at the recipe and your setup, and try to see if I can reproduce your problem.