CUDA error: out of memory with a batch size of 1 on an RTX 3090
I'm trying to train with the Beta version, but I'm getting the out-of-memory error no matter what settings I pick.
```
INFO:me-test:{'train': {'log_interval': 200, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 1, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs/me-test/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs/me-test', 'experiment_dir': './logs/me-test', 'save_every_epoch': 5, 'name': 'me-test', 'total_epoch': 20, 'pretrainG': 'pretrained_v2/f0G40k.pth', 'pretrainD': 'pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 1, 'save_every_weights': '1', 'if_cache_data_in_gpu': 0}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
gin_channels: 256 self.spk_embed_dim: 109
INFO:me-test:loaded pretrained pretrained_v2/f0G40k.pth pretrained_v2/f0D40k.pth
<All keys matched successfully>
<All keys matched successfully>
/mnt/c/users/john/documents/RVC-beta-v2-0528/venv/lib/python3.10/site-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
/mnt/c/users/john/documents/RVC-beta-v2-0528/venv/lib/python3.10/site-packages/torch/autograd/__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance. grad.sizes() = [64, 1, 4], strides() = [4, 1, 1] bucket_view.sizes() = [64, 1, 4], strides() = [4, 4, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:323.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
INFO:me-test:Train Epoch: 1 [0%]
INFO:me-test:[0, 0.0001]
INFO:me-test:loss_disc=4.307, loss_gen=3.831, loss_fm=14.001,loss_mel=20.521, loss_kl=5.731
DEBUG:matplotlib:matplotlib data path: /mnt/c/users/john/documents/RVC-beta-v2-0528/venv/lib/python3.10/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/se7dev/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
INFO:me-test:====> Epoch: 1 [2023-06-05 01:57:52] | (0:00:49.921940)
Process Process-1:
Traceback (most recent call last):
  File "/home/se7dev/miniconda3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/se7dev/miniconda3/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/c/users/john/documents/RVC-beta-v2-0528/train_nsf_sim_cache_sid_load_pretrain.py", line 218, in run
    train_and_evaluate(
  File "/mnt/c/users/john/documents/RVC-beta-v2-0528/train_nsf_sim_cache_sid_load_pretrain.py", line 446, in train_and_evaluate
    scaler.scale(loss_gen_all).backward()
  File "/mnt/c/users/john/documents/RVC-beta-v2-0528/venv/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/mnt/c/users/john/documents/RVC-beta-v2-0528/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
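The traceback itself suggests passing CUDA_LAUNCH_BLOCKING=1 so the stack trace points at the call that actually failed. A minimal sketch of one way to do that from Python (assuming you can add it at the very top of the training script, before anything touches the GPU):

```python
# Debug-only sketch: make CUDA kernel launches synchronous so the OOM is
# reported at the call that actually triggered it. The variable must be set
# before the first CUDA call, so it goes at the very top of the script (or in
# the shell environment that launches training). This slows training noticeably.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after the env var so the CUDA runtime picks it up

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
```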
V2 is just broken and barely works. Remove the current NVIDIA driver with DDU in safe mode, then install the latest NVIDIA driver and try again.
Same here - I actually have 2 x 3090s.
Is there an easier way to fix it? I use Google Colab with Docker, and launching it as a local runtime means I'm basically forced to update the drivers every time I set up the local runtime.
You might want to set pin_memory to False if you still have OOM issues. https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/fad31f24f58fbbbe100dc7bcfc60f3d4e4f8a6bb/train_nsf_sim_cache_sid_load_pretrain.py#L130
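For reference, a rough sketch of what that change looks like with a toy dataset standing in for the real training set (this is not the exact RVC DataLoader call at the linked line, just the shape of the change):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the RVC training set.
dataset = TensorDataset(torch.randn(16, 128), torch.randint(0, 10, (16,)))

# pin_memory=True stages each batch in page-locked host memory for faster
# host-to-GPU copies; setting it to False is the tweak suggested above when
# memory is already tight.
train_loader = DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    pin_memory=False,  # was pin_memory=True at the linked line
)

for features, labels in train_loader:
    pass  # training step would go here
```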
Actually, it runs into OOM there just because that's where the last piece of memory gets used up.
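If you want to confirm that memory is already nearly exhausted before the backward pass (i.e. the failing line isn't the real culprit), here is a small sketch using standard torch.cuda queries, not project code:

```python
import torch


def log_cuda_memory(tag: str, device: int = 0) -> None:
    """Print what the caching allocator holds vs. the card's total memory."""
    allocated = torch.cuda.memory_allocated(device) / 2**20  # MiB currently in tensors
    reserved = torch.cuda.memory_reserved(device) / 2**20    # MiB held by the allocator
    total = torch.cuda.get_device_properties(device).total_memory / 2**20
    print(f"[{tag}] allocated={allocated:.0f} MiB  reserved={reserved:.0f} MiB  total={total:.0f} MiB")


# e.g. call this right before scaler.scale(loss_gen_all).backward()
if torch.cuda.is_available():
    log_cuda_memory("before backward")
```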
This issue was closed because it has been inactive for 15 days since being marked as stale.