DeepLearningExamples
[HIFIGAN] How to train a model of 44100 sampling rate?
I tried setting the related arguments in train.py to `--sampling_rate 44100 --filter_length 2048 --hop_length 512 --win_length 2048`, but got the following error:
train.py:412: UserWarning: Using a target size (torch.Size([24, 80, 8])) that is different to the input size (torch.Size([24, 80, 16])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
loss_mel = F.l1_loss(y_mel, y_g_hat_mel) * 45
Traceback (most recent call last):
File "train.py", line 507, in <module>
main()
File "train.py", line 412, in main
loss_mel = F.l1_loss(y_mel, y_g_hat_mel) * 45
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 3230, in l1_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/root/miniconda3/lib/python3.8/site-packages/torch/functional.py", line 75, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined]
RuntimeError: The size of tensor a (16) must match the size of tensor b (8) at non-singleton dimension 2
So how can I train a model at a 44100 Hz sampling rate? Thank you.
You need to set the sampling rate also when creating the mel-spectrogram features from the raw audio. They were probably generated with a sampling rate of 22050 Hz, so you get a factor of 2 in the number of frames (8 vs. 16, as seen in the error message).
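The factor-of-2 mismatch can be seen from a rough frame-count calculation: the number of STFT frames is approximately the number of audio samples divided by the hop length. A minimal sketch (the segment length of 4096 samples is a hypothetical value chosen for illustration, not taken from the repo's config):

```python
def mel_frames(num_samples: int, hop_length: int) -> int:
    # For a centered STFT, the number of mel-spectrogram frames is
    # approximately num_samples // hop_length.
    return num_samples // hop_length

segment = 4096  # hypothetical training segment length in samples

# Ground-truth mels precomputed with the 22050 Hz hop length of 256:
print(mel_frames(segment, 256))  # -> 16 frames

# Mels computed on the fly with the new --hop_length 512:
print(mel_frames(segment, 512))  # -> 8 frames
```

Because `F.l1_loss(y_mel, y_g_hat_hat)` compares the two tensors frame by frame, the 8-vs-16 mismatch triggers the broadcasting error above. The fix is to regenerate the precomputed mel features from the raw audio using the same 44100 Hz parameters (`--filter_length 2048 --hop_length 512 --win_length 2048`) that you pass to train.py.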