DeepLearningExamples
[HIFIGAN] How to train a model of 44100 sampling rate?
I tried setting the related arguments in train.py to `--sampling_rate 44100 --filter_length 2048 --hop_length 512 --win_length 2048`, but got the following error:
train.py:412: UserWarning: Using a target size (torch.Size([24, 80, 8])) that is different to the input size (torch.Size([24, 80, 16])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
loss_mel = F.l1_loss(y_mel, y_g_hat_mel) * 45
Traceback (most recent call last):
File "train.py", line 507, in <module>
main()
File "train.py", line 412, in main
loss_mel = F.l1_loss(y_mel, y_g_hat_mel) * 45
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 3230, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/root/miniconda3/lib/python3.8/site-packages/torch/functional.py", line 75, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined]
RuntimeError: The size of tensor a (16) must match the size of tensor b (8) at non-singleton dimension 2
So how can I train a model at a 44100 Hz sampling rate? Thank you.
You need to set the sampling rate also when creating the mel-spectrogram features from the raw audio. They were probably generated with a sampling rate of 22050 Hz, so you get a factor of 2 in the number of frames (8 vs. 16, as seen in the error message).
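The factor-of-2 mismatch can be seen from a rough frame-count calculation: the number of STFT frames is approximately the number of audio samples divided by the hop length. A minimal sketch (the segment length of 4096 samples is a hypothetical value chosen for illustration, not taken from the repo's config):

```python
def mel_frames(num_samples: int, hop_length: int) -> int:
    # For a centered STFT, the number of mel-spectrogram frames is
    # approximately num_samples // hop_length.
    return num_samples // hop_length

segment = 4096  # hypothetical training segment length in samples

# Ground-truth mels precomputed with the 22050 Hz hop length of 256:
print(mel_frames(segment, 256))  # -> 16 frames

# Mels computed on the fly with the new --hop_length 512:
print(mel_frames(segment, 512))  # -> 8 frames
```

Because `F.l1_loss(y_mel, y_g_hat_hat)` compares the two tensors frame by frame, the 8-vs-16 mismatch triggers the broadcasting error above. The fix is to regenerate the precomputed mel features from the raw audio using the same 44100 Hz parameters (`--filter_length 2048 --hop_length 512 --win_length 2048`) that you pass to train.py.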