
[Transformer-XL/PyTorch] RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

Open gieflij opened this issue 3 years ago • 0 comments

Related to Transformer-XL/PyTorch

Describe the bug
I got a RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

Run evaluation...
0: thread affinity: {0}
Experiment dir : LM-TFM
Namespace(affinity='single_unique', batch_size=16, clamp_len=400, cuda=True, data='../data/wikitext-103/', dataset='wt103', debug=False, dllog_file='eval_log.json', ext_len=0, fp16=False, load_torchscript=None, local_rank=0, log_all_ranks=False, log_interval=10, manual=None, manual_config=None, manual_vocab='word', max_size=None, mem_len=640, model='', no_env=False, percentiles=[90, 95, 99], repeat=1, same_length=True, save_data=False, save_torchscript=None, seed=1111, split='test', target_perplexity=None, target_throughput=None, tgt_len=64, type='pytorch', work_dir='LM-TFM')
Collecting environment information...
PyTorch version: 1.6.0a0+9907a3e
Is debug build: No
CUDA used to build PyTorch: 11.0

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: version 3.14.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1650
Nvidia driver version: 460.91.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.1

Versions of relevant libraries:
[pip] msgpack-numpy==0.4.3.2
[pip] numpy==1.18.1
[pip] pytorch-transformers==1.1.0
[pip] torch==1.6.0a0+9907a3e
[pip] torchtext==0.6.0
[pip] torchvision==0.7.0a0
[conda] magma-cuda110             2.5.2                         5    local
[conda] mkl                       2019.1                      144  
[conda] mkl-include               2019.1                      144  
[conda] msgpack-numpy             0.4.3.2                  py36_0  
[conda] nomkl                     3.0                           0  
[conda] numpy                     1.18.1           py36h94c655d_0  
[conda] numpy-base                1.18.1           py36h2f8d375_1  
[conda] pytorch-transformers      1.1.0                    pypi_0    pypi
[conda] torch                     1.6.0a0+9907a3e          pypi_0    pypi
[conda] torchtext                 0.6.0                    pypi_0    pypi
[conda] torchvision               0.7.0a0                  pypi_0    pypi
Loading checkpoint from LM-TFM/checkpoint_best.pt
Loading cached dataset...
Evaluating with: math fp32 type pytorch bsz 16 tgt_len 64 ext_len 0 mem_len 640 clamp_len 400
Traceback (most recent call last):
  File "eval.py", line 515, in <module>
    main()
  File "eval.py", line 456, in main
    loss = evaluate(iter, model, meters, args.log_interval, args.max_size, args.repeat)
  File "eval.py", line 194, in evaluate
    loss, mems = model(data, target, mems)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 577, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/transformer-xl/pytorch/mem_transformer.py", line 788, in forward
    hidden, new_mems = self._forward(data, mems=mems)
  File "/workspace/transformer-xl/pytorch/mem_transformer.py", line 711, in _forward
    pos_emb = self.pos_emb(pos_seq)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 577, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/transformer-xl/pytorch/mem_transformer.py", line 39, in forward
    sinusoid_inp = torch.ger(pos_seq, self.inv_freq)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', 'eval.py', '--local_rank=0', '--config_file', 'wt103_base.yaml', '--type', 'pytorch']' returned non-zero exit status 1.
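The failing op can be tried in isolation, outside the full evaluation script. The sketch below is a minimal repro of my own, not part of the report: it runs the same `torch.ger` outer product that `PositionalEmbedding.forward` executes at mem_transformer.py line 39, which is the first point in the model that needs a cuBLAS handle. The sizes are assumptions (sequence length 64 from `tgt_len=64` in the eval settings, and `d_model=410` for the wt103 base config).

```python
# Minimal isolation of the call that fails in the traceback above.
import torch

# Falls back to CPU so the snippet also runs on machines without a GPU;
# on the failing box, device will be "cuda".
device = "cuda" if torch.cuda.is_available() else "cpu"

# pos_seq as built in mem_transformer.py: descending positions.
# Length 64 is an assumption taken from tgt_len=64 in the eval settings.
pos_seq = torch.arange(63, -1, -1.0, device=device)

# inv_freq as in the Transformer-XL positional embedding;
# d_model=410 is an assumption (wt103 base config).
inv_freq = 1 / (10000 ** (torch.arange(0.0, 410.0, 2.0) / 410))
inv_freq = inv_freq.to(device)

# Outer product: this is the op that triggers cuBLAS initialization.
sinusoid_inp = torch.ger(pos_seq, inv_freq)
print(sinusoid_inp.shape)  # torch.Size([64, 205])
```

If this snippet fails with the same CUBLAS_STATUS_NOT_INITIALIZED error on the GPU, the problem is with cuBLAS/GPU setup in the container rather than with the Transformer-XL code itself.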

To Reproduce
I followed the Quick Start Guide steps 1 to 4, and downloaded the checkpoint for "Transformer-XL PyTorch checkpoint (base, amp)" from NVIDIA NGC.

  1. git clone https://github.com/NVIDIA/DeepLearningExamples
  2. cd DeepLearningExamples/PyTorch/LanguageModeling/Transformer-XL
  3. bash getdata.sh
  4. bash pytorch/scripts/docker/build.sh
  5. bash pytorch/scripts/docker/interactive.sh
  6. wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/transformerxl_pyt_ckpt_base_amp/versions/19.11.0/zip -O transformerxl_pyt_ckpt_base_amp_19.11.0.zip
  7. unzip transformerxl_pyt_ckpt_base_amp_19.11.0.zip
  8. bash run_wt103_base.sh eval 1 --type pytorch --model checkpoint_best.pt

Expected behavior
I expected to get "test loss" and "test ppl" as in the example output.

Environment
Please provide at least:

  • Container version (e.g. pytorch:19.05-py3): transformer-xl:latest
  • GPUs in the system: (e.g. 8x Tesla V100-SXM2-16GB): GeForce GTX 1650
  • CUDA driver version (e.g. 418.67): 460.91.03

gieflij · Jul 06 '22 02:07