Transformer-TTS

Question about prepare alignments

chynphh opened this issue 5 years ago · 1 comment

Hi, in the prepare_fastspeech.ipynb file, regarding

F = torch.mean(torch.max(alignments, dim=-1)[0], dim=-1) 
r, c = torch.argmax(F).item()//4, torch.argmax(F).item()%4
location = torch.max(alignments[r,c], dim=1)[1]

My understanding is: in the first line, the tensor shape changes from (layer_num, target_length, source_length) to (layer_num, target_length), and then to (layer_num). But I don't understand what the "4" means, and why the layer num is used to calculate the location.

If my understanding is wrong, thanks for pointing it out.

chynphh avatar Jun 15 '20 15:06 chynphh

Hello, @chynphh

4 is the number of heads used in multi-head attention. If you edit the return value of the multi-head attention module in PyTorch, you can get the attention weights with shape (layer_num, head_num, target_length, source_length).

Consequently, r and c are the layer index and head index, respectively. Hope this comment is helpful to you.
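To make the index arithmetic concrete, here is a minimal runnable sketch. It assumes alignments is the 4-D attention tensor described above, with shape (layer_num, head_num, target_length, source_length) and head_num = 4; the dimension sizes are made up for illustration. torch.argmax flattens its input, so // 4 and % 4 recover the (layer, head) pair from the flat index:

```python
import torch

# Hypothetical sizes for illustration; the notebook uses head_num = 4.
n_layers, n_heads, target_len, source_len = 3, 4, 10, 8
torch.manual_seed(0)
alignments = torch.rand(n_layers, n_heads, target_len, source_len)

# Score each (layer, head): max over source positions, then mean over
# target steps. A sharply diagonal attention map scores highest.
F = torch.mean(torch.max(alignments, dim=-1)[0], dim=-1)  # (n_layers, n_heads)

# torch.argmax flattens F, so decompose the flat index by the head count.
flat = torch.argmax(F).item()
r, c = flat // n_heads, flat % n_heads  # r = layer index, c = head index

# Alignment path of the chosen head: best source position per target step.
location = torch.max(alignments[r, c], dim=1)[1]  # shape: (target_len,)

print(F.shape, (r, c), location.shape)
```

So the "4" never refers to the layer count: it is just the divisor needed to unflatten argmax's index over the (layer, head) grid, and location is then read from the single best-scoring attention map.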

Sincerely,

Jackson-Kang avatar Aug 20 '20 12:08 Jackson-Kang