DESED_task 'synth_set' used twice

Hi, I was looking through the code for the DCASE'24 Task 4 baseline system and noticed the following lines in the file train_pretrained.py:

strong_full_set = torch.utils.data.ConcatDataset([strong_set, synth_set])
tot_train_data = [maestro_real_train, synth_set, strong_full_set, weak_set, unlabeled_set]
train_dataset = torch.utils.data.ConcatDataset(tot_train_data)

According to this, 'synth_set' is used twice. Is there a specific reason for this?

Apr 28 '24 19:04 fschmid56

Hi,

Thanks for the question. I think this was done only to "upsample" the synthetic training data in each epoch. It is very similar to using 12 for the synthetic training data, as in the past recipe, except that it is now split into 6 for synth_set and 6 for synth_set+strong_set.

In general, the recipe is very sensitive to the batch size and to the proportions of each dataset. The current setup is certainly not optimal, but it worked well in our experiments.

@JanekEbb, do you maybe know more?

Apr 29 '24 16:04 popcornell

Thanks for the explanation!

Apr 29 '24 20:04 fschmid56

Actually, I'd say this leads to strong_set (the strong AudioSet portion) being underrepresented in training. Currently strong_set makes up only 6/64*3470/(10000+3470)≈2.6% of the training data, if I am not mistaken. We may want to fix that.
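For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic, assuming that clips are drawn uniformly from the concatenated strong_set+synth_set slot and that synth_set also has its own slot of 6, as described earlier in the thread. The function name and its parameters are hypothetical and are not part of the baseline code. Evaluated exactly as written, the expression gives about 2.4%; the ≈2.6% figure corresponds to a slot fraction of 0.1 rather than 6/64.

```python
def strong_share(slot_size, batch_size, n_strong, n_synth):
    """Expected fraction of a batch coming from strong_set.

    Assumes uniform sampling inside the concatenated strong+synth slot
    (an assumption for illustration, not taken from the baseline code).
    """
    return slot_size / batch_size * n_strong / (n_strong + n_synth)


# Numbers quoted in the thread: slot of 6 in a batch of 64,
# 3470 strong clips and 10000 synthetic clips.
n_strong, n_synth = 3470, 10000

as_written = strong_share(6, 64, n_strong, n_synth)
print(f"strong_set share per batch: {as_written:.2%}")  # 2.42%

# Expected synthetic clips per batch: 6 from the dedicated synth slot
# plus the synthetic part of the 6-clip strong+synth slot.
synth_per_batch = 6 + 6 * n_synth / (n_strong + n_synth)
print(f"expected synth clips per batch: {synth_per_batch:.2f}")  # 10.45
```

Under these assumptions, synth_set fills roughly 10.5 of the 12 clips in the two slots, while strong_set gets only about 1.5.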

Thanks for pointing that out, Florian!

Apr 29 '24 20:04 JanekEbb

After many tries, it seems to me that the best configuration is the current one, with strong_set and synth_set concatenated. The strong labels do not seem to help in my case.

May 09 '24 14:05 popcornell