opencompass

[Bug] Running demo failed with `"LayerNormKernelImpl" not implemented for 'Half'`

del-zhenwu opened this issue on Nov 24, 2023.

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

Commit: 5202456b4c76fee2a2e80184c8d5112dd26911d0
Running on a PAI cluster.

{'CUDA available': False,
 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0',
 'MMEngine': '0.10.1',
 'OpenCV': '4.8.1',
 'PyTorch': '2.1.1',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 9.3\n'
                              '  - C++ Version: 201703\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2023.1-Product Build 20230303 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v3.1.1 (Git Hash '
                              '64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX512\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
                              'CUDNN_VERSION=8.9.2, '
                              'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
                              '-fabi-version=11 -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
                              '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
                              '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-O2 -fPIC -Wall -Wextra -Werror=return-type '
                              '-Werror=non-virtual-dtor -Werror=bool-operation '
                              '-Wnarrowing -Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-unused-function -Wno-unused-result '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-stringop-overflow -Wno-psabi '
                              '-Wno-error=pedantic -Wno-error=old-style-cast '
                              '-Wno-invalid-partial-specialization '
                              '-Wno-unused-private-field '
                              '-Wno-aligned-allocation-unavailable '
                              '-Wno-missing-braces -fdiagnostics-color=always '
                              '-faligned-new -Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Werror=cast-function-type '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'PERF_WITH_AVX512=1, '
                              'TORCH_DISABLE_GPU_ASSERTS=ON, '
                              'TORCH_VERSION=2.1.1, USE_CUDA=ON, USE_CUDNN=ON, '
                              'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
                              'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, '
                              'USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, '
                              'USE_OPENMP=ON, USE_ROCM=OFF, \n',
 'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]',
 'TorchVision': '0.16.1',
 'numpy_random_seed': 2147483648,
 'opencompass': '0.1.8+5202456',
 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

Reproduces the problem - command or script

python run.py --models hf_opt_125m --datasets siqa_gen --debug

Reproduces the problem - error message

11/24 11:35:14 - OpenCompass - INFO - Task [opt125m/siqa]
11/24 11:35:19 - OpenCompass - INFO - Start inferencing [opt125m/siqa]
[2023-11-24 11:35:20,354] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
  0%|          | 0/16 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py", line 148, in <module>
    inferencer.run()
  File "/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py", line 78, in run
    self._inference()
  File "/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py", line 121, in _inference
    inferencer.inference(retriever,
  File "/cpfs01/user//opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py", line 133, in inference
    results = self.model.generate_from_template(
  File "/cpfs01/user/lizhenxiang/opencompass/opencompass/models/base.py", line 127, in generate_from_template
    return self.generate(inputs, max_out_len=max_out_len, **kwargs)
  File "/cpfs01/user//opencompass/opencompass/models/huggingface.py", line 210, in generate
    return sum((self._single_generate(
  File "/cpfs01/user//opencompass/opencompass/models/huggingface.py", line 210, in <genexpr>
    return sum((self._single_generate(
  File "/cpfs01/user//opencompass/opencompass/models/huggingface.py", line 317, in _single_generate
    outputs = self.model.generate(input_ids=input_ids,
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 1673, in generate
    return self.greedy_search(
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 2521, in greedy_search
    outputs = self(
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 879, in forward
    outputs = self.model.decoder(
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 645, in forward
    layer_outputs = decoder_layer(
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 296, in forward
    hidden_states = self.self_attn_layer_norm(hidden_states)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 196, in forward
    return F.layer_norm(
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
[2023-11-24 11:35:22,481] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 12008) of binary: /cpfs01/user//miniconda3/envs/opencompass/bin/python
Traceback (most recent call last):
  File "/cpfs01/user//miniconda3/envs/opencompass/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.1.1', 'console_scripts', 'torchrun')())
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
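The last frame of the traceback is torch.layer_norm receiving a float16 ('Half') tensor. Because the environment above reports 'CUDA available': False, the tensor is presumably on the CPU, and the error shows that this PyTorch 2.1.1 build has no CPU Half kernel for LayerNorm. The sketch below probes that op in isolation and shows one generic workaround: compute the normalization in float32 and cast the result back. It is an illustration under those assumptions, not OpenCompass or PyTorch code; both helper names are hypothetical.

```python
import torch
import torch.nn.functional as F


def cpu_half_layer_norm_supported() -> bool:
    """Hypothetical probe: True if F.layer_norm accepts float16 on the CPU."""
    # Generate in float32, then cast, so the probe only exercises layer_norm.
    x = torch.randn(2, 8).to(torch.float16)
    try:
        F.layer_norm(x, normalized_shape=(8,))
        return True
    except RuntimeError as err:
        # Builds without the kernel raise:
        # "LayerNormKernelImpl" not implemented for 'Half'
        if "not implemented for 'Half'" in str(err):
            return False
        raise


def layer_norm_fp32(x: torch.Tensor, layer: torch.nn.LayerNorm) -> torch.Tensor:
    """Hypothetical workaround: run LayerNorm in float32, return x's dtype."""
    weight = layer.weight.float() if layer.weight is not None else None
    bias = layer.bias.float() if layer.bias is not None else None
    out = F.layer_norm(x.float(), layer.normalized_shape, weight, bias, layer.eps)
    return out.to(x.dtype)


if __name__ == "__main__":
    print("CPU Half LayerNorm supported:", cpu_half_layer_norm_supported())
    ln = torch.nn.LayerNorm(8).half()
    x = torch.randn(2, 8).to(torch.float16)
    y = layer_norm_fp32(x, ln)
    print(y.dtype, tuple(y.shape))
```

Patching individual layers like this is only a diagnostic aid; for evaluation on a CPU-only machine, loading the whole model in float32 avoids the problem without touching model code.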

Other information

No response

del-zhenwu, Nov 24 '23 03:11

  1. Check CUDA availability via python3 -c 'import torch; print(torch.cuda.is_available())'
  2. Try downgrading torch to a version below 2.0.0 (torch<2.0.0).
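The first check can be widened slightly. PyTorch can be built with CUDA (the environment above shows CUDA_VERSION=12.1 and USE_CUDA=ON) and still report torch.cuda.is_available() as False when no GPU or driver is visible at runtime, which is what 'CUDA available': False indicates here. The sketch below, an illustration rather than an OpenCompass utility, prints both facts and suggests a load dtype: float16 only when a GPU is usable, float32 otherwise. choose_dtype is a hypothetical helper.

```python
import torch


def choose_dtype(cuda_available: bool) -> torch.dtype:
    """Hypothetical helper: float16 only on GPU, float32 on CPU."""
    return torch.float16 if cuda_available else torch.float32


if __name__ == "__main__":
    available = torch.cuda.is_available()
    # torch.version.cuda is the CUDA version the wheel was built against
    # (None for CPU-only builds); it says nothing about runtime GPU access.
    print("built with CUDA:", torch.version.cuda)
    print("CUDA available at runtime:", available)
    print("suggested torch_dtype:", choose_dtype(available))
```

The suggested dtype could be passed as torch_dtype to a transformers from_pretrained call; how to set it through an OpenCompass model config depends on the HuggingFace wrapper at the commit in use and should be checked there.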

Leymore, Nov 28 '23 04:11

See https://github.com/open-compass/opencompass/issues/756. Feel free to reopen this issue if needed.

bittersweet1999, Apr 28 '24 16:04