Question about the settings in speech_data_simulator
Hi, I'm currently using NeMo/tools/speech_data_simulator to fine-tune the MSDD model and have some questions about the data_simulator.
1. How can I ensure that every session has exactly as many speakers as num_speakers?
Currently in my case, sessions are occasionally created that contain fewer speakers than num_speakers.
This seemed to become more frequent as num_speakers became larger than 4.
For example, I created 32 sessions with num_speakers set to 4, but 9 of them include only 3 speakers.
I used a custom dataset as an input to this simulator, and the total number of speakers in the dataset was around 50.
The minimum number of utterances from speakers was 300, and the average length of an utterance was about 5 seconds.
As far as I can tell, the following parameters are related to this question:
https://github.com/NVIDIA/NeMo/blob/0e744c9300ca99060696b3536978ff5629312071/tools/speech_data_simulator/conf/data_simulator.yaml#L8-L9
https://github.com/NVIDIA/NeMo/blob/0e744c9300ca99060696b3536978ff5629312071/tools/speech_data_simulator/conf/data_simulator.yaml#L79-L83
https://github.com/NVIDIA/NeMo/blob/0e744c9300ca99060696b3536978ff5629312071/tools/speech_data_simulator/conf/data_simulator.yaml#L18-L21
I tried tweaking the settings to fix this, but nothing worked.
My current setup is as follows:
config.data_simulator.session_config.num_speakers = # This setting varies from 2 to 6
config.data_simulator.session_config.session_length = # This setting varies from 10min to 40min
config.data_simulator.session_params.min_dominance = 1 / (num_speakers + 1)
config.data_simulator.session_params.mean_silence = 0.08
config.data_simulator.session_params.turn_prob = 0.875
config.data_simulator.session_params.min_turn_prob = 0.875
config.data_simulator.speaker_enforcement.enforce_num_speakers = True
config.data_simulator.speaker_enforcement.enforce_time = {0: 1.0, 1: 1.0} # I've tried {0: 0.75, 1: 1.0}, {0: 0.99, 1: 1.0}, too
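For reference, this is roughly how the affected sessions can be counted. It is only a minimal sketch: it assumes the simulator writes one standard RTTM file per session (speaker label in the 8th whitespace-separated field), and the helper names `speakers_in_rttm` and `find_short_sessions` are made up for illustration, not part of NeMo.

```python
from pathlib import Path


def speakers_in_rttm(rttm_text: str) -> set[str]:
    """Return the distinct speaker labels found in the text of one RTTM file."""
    speakers = set()
    for line in rttm_text.splitlines():
        fields = line.split()
        # Standard RTTM: SPEAKER <file> <chan> <start> <dur> <NA> <NA> <speaker> <NA> <NA>
        if len(fields) >= 8 and fields[0] == "SPEAKER":
            speakers.add(fields[7])
    return speakers


def find_short_sessions(output_dir: str, num_speakers: int) -> dict[str, int]:
    """Map RTTM file name -> speaker count for sessions with too few speakers."""
    short = {}
    # Assumption: every *.rttm file under output_dir describes one simulated session.
    for rttm_path in sorted(Path(output_dir).rglob("*.rttm")):
        count = len(speakers_in_rttm(rttm_path.read_text()))
        if count < num_speakers:
            short[rttm_path.name] = count
    return short
```

Calling `find_short_sessions(output_dir, 4)` on the simulator's output directory would then list the sessions that ended up with fewer than 4 speakers.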
2. Why is the default value of sentence_length_params not an integer?
According to the comments, the value of sentence_length_params must be a positive integer, but the default is set to 0.4.
Sessions are generated fine with this setting, but I'd like to know why this is the default.
https://github.com/NVIDIA/NeMo/blob/0f2874b270f476405f11aeb09d38a709118c67b5/tools/speech_data_simulator/conf/data_simulator.yaml#L15-L17
Thank you in advance.
Just in case, I've been using the latest version of NeMo with:
apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh
@tango4j, could you check the above issue with num_speakers and sentence_length_params?
- How can I ensure that every session has exactly as many speakers as num_speakers?
We need a little more time to figure out why enforce_num_speakers: true is not working as expected for 10-minute sessions with more than 4 speakers. @tango4j, we have a primitive fix in mind but need to test it further.
- Why is the default value of sentence_length_params not an integer?
You're right that the k in sentence_length_params should usually be an integer. We use a default of 0.4 to match the segment length distribution of the AMI dataset, but you can set it to an integer value instead, which would generally increase the segment lengths.
We need a little more time to figure out why enforce_num_speakers: true is not working as expected for 10-minute sessions with more than 4 speakers.
@stevehuang52 Thank you for looking into this issue. In case it helps, you can check the dataset I used and the simulated meetings I generated: [dataset(4.0GB)] [alignments in simple,condensed format(2MB)] [generated sim_meet over 2~6 speakers(4.3GB)]
You're right that the k in sentence_length_params should usually be an integer. We use a default of 0.4 to match the segment length distribution of the AMI dataset, but you can set it to an integer value instead, which would generally increase the segment lengths.
Thanks a lot. Then I'll set it to an integer in my case.
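To convince myself that a non-integer k is still well defined, I put together the small sketch below. It assumes, as the config comments suggest, that sentence_length_params holds the (k, p) parameters of a negative binomial distribution over sentence length in words. The value p = 0.05 is only an illustrative assumption, and the function names are my own, not NeMo's.

```python
import math


def neg_binomial_pmf(n: int, k: float, p: float) -> float:
    """P(N = n) for a negative binomial with real-valued k > 0 and 0 < p < 1."""
    # Gamma functions replace factorials, so k does not have to be an integer.
    log_pmf = (
        math.lgamma(n + k) - math.lgamma(n + 1) - math.lgamma(k)
        + k * math.log(p) + n * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def neg_binomial_mean(k: float, p: float) -> float:
    """Analytic mean of the distribution: k * (1 - p) / p."""
    return k * (1.0 - p) / p


if __name__ == "__main__":
    p = 0.05  # hypothetical value, chosen only for illustration
    for k in (0.4, 1, 2):
        print(f"k={k}: mean sentence length ~ {neg_binomial_mean(k, p):.1f} words")
```

Under that assumed p, the mean grows linearly with k (about 7.6 words for k = 0.4 versus 38 for k = 2), which is consistent with integer values producing longer segments.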
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
(Just a bump)
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.