How do I use the GMM during inference?
My assumption was that I would generate a mel embedding using the MelEncoder and then pass that embedding to the GMM to produce a residual, which could then be used to synthesize a waveform. However, I didn't find this implemented anywhere in flowtron.py.
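Concretely, I imagined the pipeline would look roughly like this. This is only a sketch: the `mel_encoder` and `gaussian_mixture` attribute names are my guesses, `generate_residual` is my own addition (shown further below), and the `model.infer` / `waveglow.infer` calls follow my reading of inference.py:

```python
import torch

def synthesize(model, waveglow, ref_mel, speaker_vecs, text, n_frames):
    with torch.no_grad():
        # 1. Encode a reference mel-spectrogram into an embedding
        mel_embedding = model.mel_encoder(ref_mel)
        # 2. Sample a residual from the GMM conditioned on that embedding
        residual = model.gaussian_mixture.generate_residual(
            mel_embedding, n_frames, sigma_offsets=None)
        # 3. Decode the residual to mels, then vocode to a waveform
        mels, attentions = model.infer(residual, speaker_vecs, text)
        audio = waveglow.infer(mels, sigma=0.8)
    return audio
```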
I created an inference function in the GaussianMixture class that accepts an average mel embedding (produced by MelEncoder's forward function, rather than its infer function) and uses repeat() to tile it to the maximum number of frames needed. I thought my original intuition was correct, but my model is producing silent samples with zero magnitude, so clearly my thinking is off.
Perhaps my intuition about how this is supposed to work is incorrect; I hope that comes through in my code below. Any guidance on how I can get to the point where I can tinker with individual GMM components would be greatly appreciated!
```python
import torch

def generate_residual(self, mel_embedding, n_frames, sigma_offsets):
    # NOTE: sigma_offsets is currently unused
    # Per-component means, variances, and mixture weights from the GMM
    mean, var, prob = self.forward(mel_embedding.unsqueeze(0), 1)
    mean, var, prob = mean.squeeze(), var.squeeze(), prob.squeeze()
    # One shared noise tensor, reused for every component
    noise = torch.randn(self.n_mel_channels, n_frames).cuda()
    residual = []
    for c in range(self.n_components):
        # Tile this component's parameters across all frames
        c_m = mean[:, c].repeat(n_frames, 1).T  # (n_mel_channels, n_frames)
        c_v = var[:, c].repeat(n_frames, 1).T
        r = c_m + noise * c_v
        residual.append(r)
    # (n_components, n_mel_channels, n_frames) -> (n_mel_channels, n_frames, n_components)
    residual = torch.stack(residual)
    residual = residual.permute(1, 2, 0)
    # Collapse components into a single residual, weighted by the mixture probabilities
    residual = (residual * prob).sum(2)
    return residual.unsqueeze(0)  # (1, n_mel_channels, n_frames)
```
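For comparison, here is how I understand sampling from a diagonal GMM is supposed to work: pick a component per frame from the mixture weights, then sample from that component's Gaussian. This is a sketch under my assumptions that `prob` holds normalized mixture weights and `var` holds variances rather than log-variances (if it's log-variance, it would need to be exponentiated first):

```python
import torch

def sample_residual(self, mel_embedding, n_frames):
    mean, var, prob = self.forward(mel_embedding.unsqueeze(0), 1)
    mean, var, prob = mean.squeeze(), var.squeeze(), prob.squeeze()
    # Pick one mixture component per frame according to the weights
    comp = torch.multinomial(prob, n_frames, replacement=True)  # (n_frames,)
    # Gather the chosen component's parameters for every mel channel
    m = mean[:, comp]          # (n_mel_channels, n_frames)
    s = var[:, comp].sqrt()    # standard deviation, not variance
    residual = m + torch.randn_like(m) * s
    return residual.unsqueeze(0)
```

Pinning `comp` to a constant index (e.g. `comp = torch.full((n_frames,), 2, dtype=torch.long, device=mean.device)`) would sample exclusively from component 2, which is the kind of per-component tinkering I'm after.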
@cereballo did you find the answer to this question?
What should I use as input to MelEncoder at inference time, instead of a mel?
@rafaelvalle can you clarify, please?