FragmentVC

Issue with training on my own dataset.

Open · rishabhjain16 opened this issue on Feb 15, 2022 · 0 comments

I have been trying to train the FragmentVC model on my own dataset. It works fine with the VCTK dataset, but when I try it with my own dataset, I get the following error. Maybe it has something to do with my dataset and its structure. My dataset is non-native English speech, and I want to find out whether I can do VC from, say, LibriSpeech to non-native English and vice versa. I get the following error and I am not quite sure how to fix it.

root@06089af1684b:/workspace/vc/FragmentVC# CUDA_VISIBLE_DEVICES=1 python train.py features_myst --save_dir ./ckpts_myst --batch_size 16 --preload
100% 17163/17163 [00:18<00:00, 913.63it/s]
Train:   0% 0/1000 [00:00<?, ? step/s]Traceback (most recent call last):
  File "train.py", line 247, in <module>
    main(**parse_args())
  File "train.py", line 166, in main
    batch = next(train_iterator)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 272, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/workspace/vc/FragmentVC/data/intra_speaker_dataset.py", line 73, in __getitem__
    for sampled_id in random.sample(utterance_indices, self.n_samples):
  File "/opt/conda/lib/python3.8/random.py", line 363, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Train:   0% 0/1000 [00:01<?, ? step/s]
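
From what I can tell, random.sample(population, k) raises exactly this ValueError whenever k exceeds the size of the population, so presumably at least one speaker in features_myst has fewer utterances than the n_samples that data/intra_speaker_dataset.py draws per item. A minimal standalone reproduction, independent of FragmentVC:

import random

utterance_indices = [0, 1, 2]   # a speaker with only 3 utterances
n_samples = 5                   # but the dataset asks for 5 distinct ones

# Drawing 5 distinct items from a population of 3 is impossible:
random.sample(utterance_indices, n_samples)
# ValueError: Sample larger than population or is negative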

I think it most probably has something to do with the structure of my dataset, or maybe with the length of the audio files? I tried looking around but didn't find any working solutions (a rough diagnostic is sketched below). Any help is appreciated. Thanks in advance.
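
A quick per-speaker count would confirm whether this is the cause. This is only a rough sketch: I am assuming the preprocessed feature directory contains one subdirectory per speaker (which may not match the actual layout preprocessing produces), and n_samples below is a placeholder for whatever value the dataset is constructed with:

from pathlib import Path

feature_dir = Path("features_myst")  # my preprocessed feature directory
n_samples = 5                        # placeholder: the dataset's n_samples

# Flag speakers with fewer utterances than random.sample needs.
for speaker_dir in sorted(p for p in feature_dir.iterdir() if p.is_dir()):
    count = sum(1 for f in speaker_dir.iterdir() if f.is_file())
    if count < n_samples:
        print(f"{speaker_dir.name}: only {count} utterance(s), fewer than {n_samples}")

Any speaker flagged here would make random.sample fail; dropping those speakers, giving them more utterances, or lowering n_samples should avoid the error.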

rishabhjain16 · Feb 15 '22, 13:02