VITS code fails to run on Python 3.10.12
Hi, I managed to run VITS on Python 3.8.12, but after upgrading to Python 3.10.12 I get the following error:
Traceback (most recent call last):
File "/content/drive/MyDrive/vits/train.py", line 290, in <module>
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/drive/MyDrive/vits/train.py", line 117, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/drive/MyDrive/vits/train.py", line 137, in train_and_evaluate
for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(train_loader):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/content/drive/MyDrive/vits/data_utils.py", line 114, in __call__
torch.LongTensor([x[1].size(1) for x in batch]),
File "/content/drive/MyDrive/vits/data_utils.py", line 114, in <listcomp>
It appears that under Python 3.10.12 the input tensor has the wrong shape, but this doesn't happen under Python 3.8.
May I ask why this is happening? Thanks.
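The failing expression in the traceback, `x[1].size(1)`, asks for the second dimension of the second element of each batch item, so an `IndexError` there would be consistent with that element having fewer than two dimensions. The traceback is cut off before the actual message, so this is only a guess. A minimal sketch for checking it, assuming each batch item is a tuple whose elements expose a `.shape` (as PyTorch tensors do); `find_items_missing_dim` is a hypothetical helper, not part of VITS:

```python
def find_items_missing_dim(batch, index=1, min_dims=2):
    """Return (position, shape) for every batch item whose element `index`
    has fewer than `min_dims` dimensions.

    Hypothetical debugging helper: it only reads `.shape`, so it works on
    anything tensor-like and does not change the batch.
    """
    bad = []
    for pos, item in enumerate(batch):
        shape = tuple(item[index].shape)
        if len(shape) < min_dims:
            bad.append((pos, shape))
    return bad
```

Calling it on `batch` at the top of the collate function (before line 114 of `data_utils.py`) and printing the result would show which samples, if any, have an unexpected shape.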
You can try running the code with Python 3.7. Python 3.7 works for me.
Hi there
I am facing the same issue on Google Colab. I've switched the version of Python to 3.7, but I get the same error.
Traceback (most recent call last):
File "train.py", line 290, in <module>
main()
File "train.py", line 50, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/envs/vits_tts/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/envs/vits_tts/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/usr/local/envs/vits_tts/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/envs/vits_tts/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/vits/train.py", line 62, in run
dist.init_process_group(backend='nccl', init_method='env://', world_size=n_gpus, rank=rank)
File "/usr/local/envs/vits_tts/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 754, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/usr/local/envs/vits_tts/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 246, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "/usr/local/envs/vits_tts/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 169, in _create_c10d_store
raise ValueError(f"port must have value from 0 to 65535 but was {port}.")
ValueError: port must have value from 0 to 65535 but was 80000.
OK, you can fix it by changing the port number (80000 in the error above) to another value within the valid range of 0 to 65535.
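The traceback shows `init_method='env://'`, in which case PyTorch reads the rendezvous address and port from the `MASTER_ADDR` and `MASTER_PORT` environment variables. A minimal sketch of the port change, assuming `train.py` sets these variables through `os.environ` before spawning the workers; the helper name `set_master_port` and the value 8000 are only illustrative, and any free port in range should do:

```python
import os


def set_master_port(port: int) -> None:
    """Export MASTER_PORT after checking it is a valid TCP port.

    Hypothetical helper: it mirrors the range check that PyTorch performs,
    so a bad value fails here instead of inside init_process_group.
    """
    if not 0 <= port <= 65535:
        raise ValueError(f"port must be between 0 and 65535, got {port}")
    os.environ["MASTER_PORT"] = str(port)


# Assumed setup for single-machine training.
os.environ["MASTER_ADDR"] = "localhost"
set_master_port(8000)  # instead of 80000, which is out of range
```

If the chosen port is already in use on the machine, pick a different one.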