Speaker id argument

Open ilnmtlbnm opened this issue 5 years ago • 8 comments

inference.py has a speaker id argument: parser.add_argument('-i', '--id', help='Speaker id', type=int).

Whenever I try to change it to something other than 0, I get the following error:

Traceback (most recent call last):
  File "inference.py", line 122, in <module>
    args.n_frames, args.sigma, args.seed)
  File "inference.py", line 63, in infer
    speaker_vecs = trainset.get_speaker_id(speaker_id).cuda()
  File "/data/code/flowtron/data.py", line 83, in get_speaker_id
    return torch.LongTensor([self.speaker_ids[int(speaker_id)]])
KeyError: 2

ilnmtlbnm avatar May 15 '20 12:05 ilnmtlbnm

If you are using the LJS model, that might be expected, as it is a single-speaker model. You could try the LibriTTS model instead.

karkirowle avatar May 15 '20 12:05 karkirowle

Just a note: when using LibriTTS you will also have to change the n_speakers parameter in config.json to 123:

"model_config": {
    "n_speakers": 123,
    "n_speaker_dim": 128,
    "n_text": 185,
    "n_text_dim": 512,
    "n_flows": 2,
    "n_mel_channels": 80,
    "n_attn_channels": 640,
    "n_hidden": 1024,
    "n_lstm_layers": 2,
    "mel_encoder_n_hidden": 512,
    "n_components": 0,
    "mean_scale": 0.0,
    "fixed_gaussian": true,
    "dummy_speaker_embedding": false,
    "use_gate_layer": true
}
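
If you would rather not edit the file by hand, the change can be scripted. This is a rough sketch using only the standard json module; set_n_speakers is a hypothetical helper name, not part of the flowtron repo, and it assumes config.json has a top-level "model_config" object as shown above.

```python
import json


def set_n_speakers(config_path, n_speakers):
    # Load the config, update model_config.n_speakers, and write it back.
    with open(config_path) as f:
        config = json.load(f)
    config["model_config"]["n_speakers"] = n_speakers
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Example (overwrites the file in place, so keep a backup):
# set_n_speakers("config.json", 123)
```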

Quasimondo avatar May 15 '20 13:05 Quasimondo

Of course, thanks @karkirowle! And thanks @Quasimondo for specifying n_speakers for LibriTTS.

ilnmtlbnm avatar May 15 '20 13:05 ilnmtlbnm

D'oh! Again, I closed too fast: it still doesn't work with LibriTTS.

python inference.py -c config.json -f models/flowtron_libritts.pt -w models/waveglow_256channels_v4.pt -t "But the machine only creates what humans have taught it to " -i 15 -n 777 -s 0.5

ilnmtlbnm avatar May 15 '20 13:05 ilnmtlbnm

Yeah - I realized that you will also have to adjust the "data_config" section: "training_files": "filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt"

And lastly, you will have to pick a speaker ID that actually exists. They are not numbered consecutively, so you have to look them up in that filelist (it's the number at the end of each line).
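
The lookup can be done with a few lines of plain Python. This is a sketch under two assumptions: the fields on each filelist line are separated by "|", and the speaker ID is the last field. speaker_ids_from_filelist is a hypothetical helper name, not something shipped with flowtron.

```python
def speaker_ids_from_filelist(path, sep="|"):
    # Collect the distinct speaker IDs found in the last field of each line.
    ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            ids.add(int(line.rsplit(sep, 1)[-1]))
    return sorted(ids)


# Example:
# print(speaker_ids_from_filelist("filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt"))
```

Any value in the returned list should be usable with the -i argument of inference.py.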

Quasimondo avatar May 15 '20 13:05 Quasimondo

Thanks again @Quasimondo

For reference, here are the valid IDs for LibriTTS:

40 78 83 87 118 125 196 200 250 254 374 405 446 460 587 669 696 730 831 887 1069 1088 1116 1246 1263
 1502 1578 1841 1867 1963 1970 2092 2136 2182 2196 2289 2416 2436 2836 2843 2911 2952 3240 3242 3259
 3436 3486 3526 3664 3857 3879 3982 3983 4018 4051 4088 4160 4195 4267 4297 4362 4397 4406 4640 4680
 4788 5022 5104 5322 5339 5393 5652 5678 5703 5750 5808 6019 6064 6078 6081 6147 6181 6209 6272 6367
 6385 6415 6437 6454 6476 6529 6818 6836 6848 7059 7067 7078 7178 7190 7226 7278 7302 7367 7402 7447
 7505 7511 7794 7800 8051 8088 8098 8108 8123 8238 8312 8324 8419 8468 8609 8629 8770 8838

ilnmtlbnm avatar May 15 '20 14:05 ilnmtlbnm

Thank you for compiling this list!

rafaelvalle avatar May 15 '20 15:05 rafaelvalle

I added a script that extracts the available speaker IDs (sid). See below:

https://github.com/yhgon/flowtron/blob/master/inference_colab.ipynb

import pandas as pd
import numpy as np
from data import load_filepaths_and_text  # data.py from the flowtron repo

# Notebook shell command: preview ten rows of the speaker info file
!cat /content/flowtron/filelists/libritts_speakerinfo.txt | tail -n +12 | head -n 10

filelist_path = "/content/flowtron/filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt"

def create_speaker_lookup_table(audiopaths_and_text):
    speaker_ids = np.sort(np.unique([x[2] for x in audiopaths_and_text]))
    d = {int(speaker_ids[i]): i for i in range(len(speaker_ids))}
    print("Number of speakers :", len(d))
    return d

audiopaths_and_text = load_filepaths_and_text(filelist_path)
speaker_ids  = create_speaker_lookup_table(audiopaths_and_text).keys() 
print(speaker_ids)
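# Read the pipe-separated speaker info file; lines starting with ';' are comments
speakers = pd.read_csv('/content/flowtron/filelists/libritts_speakerinfo.txt',
                       engine='python', header=None, comment=';', sep=r' *\| *',
                       names=['ID', 'SEX', 'SUBSET', 'MINUTES', 'NAME'])
# Keep the ID if it appears in the training filelist, otherwise mark it with -1
speakers['FLOWTRON_ID'] = speakers['ID'].apply(lambda x: x if x in speaker_ids else -1)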

# Shuffled lists of usable speaker IDs with more than 20 minutes of audio
female_speakers = speakers.query("SEX == 'F' and MINUTES > 20 and FLOWTRON_ID >= 0")['FLOWTRON_ID'].sample(frac=1).tolist()
male_speakers = speakers.query("SEX == 'M' and MINUTES > 20 and FLOWTRON_ID >= 0")['FLOWTRON_ID'].sample(frac=1).tolist()

print("female speakers : ", len(female_speakers), female_speakers)
print("male speakers   : ", len(male_speakers), male_speakers)

yhgon avatar May 19 '20 04:05 yhgon