Memory allocation failed
I tried to train with 2 GPUs using Docker, but after one epoch, memory allocation errors occur. I am not sure what to check or what might be wrong.

Thank you for creating this issue; let me confirm some basics first.
- Which GPU do you use? It would be helpful if you could provide the output of nvidia-smi.
- Are there any logs such as `openmpi library is not found` before training starts?
- Do you get the same error even with a smaller batch size? Does the error always appear on the 2nd epoch?
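Regarding the batch-size question above: a systematic way to find what fits in GPU memory is to halve the batch size until one training step succeeds. A minimal sketch, where `try_step` is a hypothetical stand-in for a single NVCNet training iteration (not the project's actual API):

```python
def find_max_batch_size(try_step, start=8):
    """Halve the batch size until `try_step` runs without an
    out-of-memory error; return the first size that works."""
    bs = start
    while bs >= 1:
        try:
            try_step(bs)   # run a single forward/backward pass at this size
            return bs
        except MemoryError:
            bs //= 2       # OOM: retry with half the batch
    return None            # even batch size 1 does not fit

# Example with a fake step that "fits" only at batch size <= 2:
def fake_step(bs):
    if bs > 2:
        raise MemoryError
print(find_max_batch_size(fake_step))  # → 2
```

In a real run, `try_step` would wrap one forward/backward pass and the framework's out-of-memory exception instead of `MemoryError`.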
Hello, why is g_loss_con 0.0000 (0.0000) all the time?
@TomonobuTsujikawa @ppphhhleo
Thank you for reporting; please let us check it.
I have an 11 GB GPU, so I tried to run NVCNet in this environment. At first I couldn't run the model due to a memory allocation error, so I had to reduce batch_size to 2. After that, training started correctly, but g_loss_con is 0 as you pointed out.
I am now investigating g_loss_con.
@TomonobuTsujikawa
Please help: I want to train with multiple GPUs, but I get the following error:
(tts_nnabla) twu@durian:/qwork4/twu/off_nvcnet$ mpirun -n 2 python main.py -c cudnn -d 0,2 --output_path log_new/baseline --batch_size 8
2022-08-24 16:29:11,963 [nnabla][INFO]: Initializing CPU extension...
2022-08-24 16:29:11,971 [nnabla][INFO]: Initializing CPU extension...
2022-08-24 16:29:12,607 [nnabla][INFO]: Initializing CUDA extension...
2022-08-24 16:29:12,607 [nnabla][INFO]: Initializing CUDA extension...
2022-08-24 16:29:25,542 [nnabla][INFO]: Initializing cuDNN extension...
value error in query
/home/gitlab-runner/builds/LRsSYq-B/0/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:70
Failed it != items_.end(): Any of [cudnn:float, cuda:float, cpu:float] could not be found in []
No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2022-08-24 16:29:25,558 [nnabla][INFO]: Initializing cuDNN extension...
value error in query
/home/gitlab-runner/builds/LRsSYq-B/0/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:70
Failed it != items_.end(): Any of [cudnn:float, cuda:float, cpu:float] could not be found in []
No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2022-08-24 16:29:26,010 [nnabla][INFO]: Training data with 103 speakers.
2022-08-24 16:29:26,011 [nnabla][INFO]: DataSource with shuffle(True)
2022-08-24 16:29:26,015 [nnabla][INFO]: Training data with 103 speakers.
2022-08-24 16:29:26,016 [nnabla][INFO]: DataSource with shuffle(True)
2022-08-24 16:29:26,025 [nnabla][INFO]: Using DataIterator
2022-08-24 16:29:26,030 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate again.
Error during forward propagation:
RandintCuda
MulScalarCuda
AddScalarCuda
Mul2Cuda
RandCuda
Mul2Cuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
ArangeCuda
ReshapeCuda
StackCuda
GatherNdCuda
Constant
SigmoidCrossEntropyCuda
MeanCudaCudnn
AddScalarCuda
AveragePoolingCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
ArangeCuda
ReshapeCuda
StackCuda
GatherNdCuda
Constant
SigmoidCrossEntropyCuda
MeanCudaCudnn
Add2CudaCudnn
AveragePoolingCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
ArangeCuda
ReshapeCuda
StackCuda
GatherNdCuda
Constant
SigmoidCrossEntropyCuda
MeanCudaCudnn
Add2CudaCudnn
RandintCuda
MulScalarCuda
AddScalarCuda
Mul2Cuda
RandCuda
Mul2Cuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
PowScalarCuda
AddScalarCuda
SumCuda
PowScalarCuda
Div2Cuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
RandintCuda
MulScalarCuda
AddScalarCuda
Mul2Cuda
RandCuda
Mul2Cuda
PadCuda
ConvolutionCudaCudnn
PowScalarCuda
ConvolutionCudaCudnn
PowScalarCuda
Add2CudaCudnn
PowScalarCuda
BatchMatmulCuda
MulScalarCuda
AddScalarCuda
LogCuda
Callback
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
MulScalarCuda
ExpCuda
RandnCuda
Mul2Cuda
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn <-- ERROR
Traceback (most recent call last):
File "main.py", line 99, in this->alloc_impl(): N4nbla10CudaMemoryE allocation failed.
I also tested the environment with `python -c "import nnabla_ext.cuda, nnabla_ext.cudnn"`:

Please provide the results of the following command:
pip list | grep -e pip -e nnabla
You can import nnabla correctly in a single-GPU environment, so I think it is a setup issue for multiple GPUs.
@TomonobuTsujikawa
the result:

Hmm, it seems to be OK.
Do you still get the same error if you do the following?
pip uninstall nnabla nnabla-ext-cuda110-nccl2-mpi3-1-6
pip install nnabla nnabla-ext-cuda110-nccl2-mpi3-1-6
mpirun -n 2 python main.py -c cudnn -d 0,1 --output_path log_new/baseline --batch_size 8
I will also check.
@TomonobuTsujikawa
It still has the same error:
(tts_nnabla) twu@durian:/qwork4/twu/nvcnet_offi$ mpirun -n 2 python main.py -c cudnn -d 0,1 --output_path log_new/baseline --batch_size 8
2022-08-29 17:52:27,939 [nnabla][INFO]: Initializing CPU extension...
2022-08-29 17:52:27,939 [nnabla][INFO]: Initializing CPU extension...
2022-08-29 17:52:30,726 [nnabla][INFO]: Initializing CUDA extension...
2022-08-29 17:52:30,727 [nnabla][INFO]: Initializing CUDA extension...
/qwork4/twu/miniconda/envs/tts_nnabla/bin/../lib/libmpi.so: undefined symbol: ompi_mpi_op_no_op
/qwork4/twu/miniconda/envs/tts_nnabla/bin/../lib/libmpi.so: undefined symbol: ompi_mpi_op_no_op
2022-08-29 17:52:43,731 [nnabla][INFO]: Initializing cuDNN extension...
2022-08-29 17:52:44,080 [nnabla][INFO]: Training data with 103 speakers.
2022-08-29 17:52:44,081 [nnabla][INFO]: DataSource with shuffle(True)
2022-08-29 17:52:44,100 [nnabla][INFO]: Using DataIterator
2022-08-29 17:52:44,716 [nnabla][INFO]: Initializing cuDNN extension...
2022-08-29 17:52:45,076 [nnabla][INFO]: Training data with 103 speakers.
2022-08-29 17:52:45,076 [nnabla][INFO]: DataSource with shuffle(True)
2022-08-29 17:52:45,103 [nnabla][INFO]: Using DataIterator
value error in query
/home/gitlab-runner/builds/LRsSYq-B/0/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:70
Failed it != items_.end(): Any of [cudnn:float, cuda:float, cpu:float] could not be found in []
No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
Running epoch=1 lr=0.00010
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate again.
Error during forward propagation:
RandintCuda
MulScalarCuda
AddScalarCuda
Mul2Cuda
RandCuda
Mul2Cuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
ArangeCuda
ReshapeCuda
StackCuda
GatherNdCuda
Constant
SigmoidCrossEntropyCuda
MeanCudaCudnn
AddScalarCuda
AveragePoolingCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
ArangeCuda
ReshapeCuda
StackCuda
GatherNdCuda
Constant
SigmoidCrossEntropyCuda
MeanCudaCudnn
Add2CudaCudnn
AveragePoolingCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
ArangeCuda
ReshapeCuda
StackCuda
GatherNdCuda
Constant
SigmoidCrossEntropyCuda
MeanCudaCudnn
Add2CudaCudnn
RandintCuda
MulScalarCuda
AddScalarCuda
Mul2Cuda
RandCuda
Mul2Cuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
PowScalarCuda
AddScalarCuda
SumCuda
PowScalarCuda
Div2Cuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
RandintCuda
MulScalarCuda
AddScalarCuda
Mul2Cuda
RandCuda
Mul2Cuda
PadCuda
ConvolutionCudaCudnn
PowScalarCuda
ConvolutionCudaCudnn
PowScalarCuda
Add2CudaCudnn
PowScalarCuda
BatchMatmulCuda
MulScalarCuda
AddScalarCuda
LogCuda
Callback
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
AveragePoolingCudaCudnn
LeakyReLUCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
MulScalarCuda
ExpCuda
RandnCuda
Mul2Cuda
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
GELUCuda
WeightNormalizationCuda
DeconvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
PadCuda
WeightNormalizationCuda
ConvolutionCudaCudnn
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn
SliceCuda
TanhCudaCudnn
SliceCuda
SigmoidCudaCudnn
Mul2Cuda
WeightNormalizationCuda
ConvolutionCudaCudnn
Add2CudaCudnn <-- ERROR
Traceback (most recent call last):
File "main.py", line 99, in this->alloc_impl(): N4nbla10CudaMemoryE allocation failed.
@15755841658 Thank you for testing.
I set up many environments today to try to reproduce this error, but I could not. Can you provide a bit more information about your environment? If you run nvcnet in Docker, please show me the Dockerfile. The output of the following commands will be very large, so I would appreciate it if you could attach it as a compressed log.
cat /etc/os-release
dpkg -l | grep ^ii
conda --version
conda list
pip --version
pip list
nvidia-smi
set | grep -e LD_LIBRARY -e LD_PRELOAD
find /usr -name libmpi.so\*
I think this is the minimum command to confirm whether the issue has been resolved:
mpirun -n 2 python -c "import nnabla_ext.cudnn; from nnabla.ext_utils import get_extension_context; import nnabla.communicators as C; ctx = get_extension_context('cudnn', device_id='0'); C.MultiProcessDataParallelCommunicator(ctx)"
OK! I will test.
But testing the minimum command gives:

Yes, your environment has an issue, so the minimum command fails. Please provide the information I asked for above.
@TomonobuTsujikawa OK, thanks. I have emailed you; please see the attachment for the results of these commands, and please tell me what is wrong. Thank you very much.
@15755841658
I checked your environment information; here is the list of problems that need to be solved.
- Ubuntu 16: Official support is Ubuntu 18 and later, because many packages on Ubuntu 16 are very old.
- openmpi1: openmpi1 is not supported. I recommend using openmpi v3 for now (you can still use openmpi v2).
- ~~pip: If you use a conda environment, pip must be conda's pip, otherwise Python package management will conflict.~~ Your pip seems to be conda-based. I'm sorry.
I cannot find NVIDIA driver/CUDA/cuDNN packages in your dpkg list; did you install them manually?
Also, there seems to be a newer MPI under /usr/local, but you cannot use it due to permission denied.
Hmm, if you cannot upgrade the OS environment, I think it is better to use a Docker container. Here is an example:
docker pull nnabla/nnabla-ext-cuda-multi-gpu:py37-cuda110-mpi3.1.6-v1.29.0
docker run --rm -it -u $(id -u):$(id -g) --gpus all nnabla/nnabla-ext-cuda-multi-gpu:py37-cuda110-mpi3.1.6-v1.29.0
mpirun -n 2 python3 -c "import nnabla_ext.cudnn; from nnabla.ext_utils import get_extension_context; import nnabla.communicators as C; ctx = get_extension_context('cudnn', device_id='0'); C.MultiProcessDataParallelCommunicator(ctx)"
If you cannot install Docker, you need to build openmpi yourself. This is how to build openmpi, although some setup might differ since the OS versions are different: https://github.com/sony/nnabla-ext-cuda/blob/v1.29.0/docker/release/Dockerfile.cuda-mpi#L54-L86
Also, please see the nnabla install page (https://nnabla.org/install/), which lists the install components and how to install them.
I had the same trouble when I tried to set up another code repo. Environment:
- numpy==1.22.4
- Docker with CUDA 11.6
- OS: Ubuntu 18.04
After I installed numpy>=1.23.0, the problem was fixed. However, some warnings showed up, such as:
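To check whether an installed numpy actually satisfies a constraint like numpy>=1.23.0, the version string should be compared numerically rather than lexicographically. A small sketch; `parse_version` here is a hand-rolled helper for illustration, not the `packaging` library's function:

```python
def parse_version(v):
    """Split '1.22.4' into (1, 22, 4) so versions compare numerically.
    Plain string comparison fails: '1.9' sorts after '1.23' as text."""
    return tuple(int(part) for part in v.split("."))

# The broken and fixed versions from this report:
assert parse_version("1.22.4") < parse_version("1.23.0")
# Plain string comparison would get this case wrong:
assert not ("1.9.0" < "1.23.0")                        # lexicographic: '9' > '2'
assert parse_version("1.9.0") < parse_version("1.23.0")  # numeric: 9 < 23
```

In practice you would compare `numpy.__version__` against the required minimum with such a helper (or use `packaging.version.parse`, which also handles pre-release suffixes).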
...
2022-11-10 11:54:56,668 [nnabla][INFO]: Initializing CUDA extension...
<frozen importlib._bootstrap>:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 232 from PyObject
...
Hope this helps.