opencompass
[Bug] Running demo failed with `"LayerNormKernelImpl" not implemented for 'Half'`
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version.
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
Commit: 5202456b4c76fee2a2e80184c8d5112dd26911d0
Running on PAI cluster
{'CUDA available': False,
'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0',
'MMEngine': '0.10.1',
'OpenCV': '4.8.1',
'PyTorch': '2.1.1',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2023.1-Product Build 20230303 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.1.1 (Git Hash '
'64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
'CUDNN_VERSION=8.9.2, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=old-style-cast '
'-Wno-invalid-partial-specialization '
'-Wno-unused-private-field '
'-Wno-aligned-allocation-unavailable '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.1.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]',
'TorchVision': '0.16.1',
'numpy_random_seed': 2147483648,
'opencompass': '0.1.8+5202456',
'sys.platform': 'linux'}
Reproduces the problem - code/configuration sample
Reproduces the problem - command or script
python run.py --models hf_opt_125m --datasets siqa_gen --debug
Reproduces the problem - error message
11/24 11:35:14 - OpenCompass - INFO - Task [opt125m/siqa]
11/24 11:35:19 - OpenCompass - INFO - Start inferencing [opt125m/siqa]
[2023-11-24 11:35:20,354] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/16 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py", line 148, in <module>
inferencer.run()
File "/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py", line 78, in run
self._inference()
File "/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py", line 121, in _inference
inferencer.inference(retriever,
File "/cpfs01/user//opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py", line 133, in inference
results = self.model.generate_from_template(
File "/cpfs01/user/lizhenxiang/opencompass/opencompass/models/base.py", line 127, in generate_from_template
return self.generate(inputs, max_out_len=max_out_len, **kwargs)
File "/cpfs01/user//opencompass/opencompass/models/huggingface.py", line 210, in generate
return sum((self._single_generate(
File "/cpfs01/user//opencompass/opencompass/models/huggingface.py", line 210, in <genexpr>
return sum((self._single_generate(
File "/cpfs01/user//opencompass/opencompass/models/huggingface.py", line 317, in _single_generate
outputs = self.model.generate(input_ids=input_ids,
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 1673, in generate
return self.greedy_search(
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 2521, in greedy_search
outputs = self(
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 879, in forward
outputs = self.model.decoder(
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 645, in forward
layer_outputs = decoder_layer(
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 296, in forward
hidden_states = self.self_attn_layer_norm(hidden_states)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 196, in forward
return F.layer_norm(
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
[2023-11-24 11:35:22,481] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 12008) of binary: /cpfs01/user//miniconda3/envs/opencompass/bin/python
Traceback (most recent call last):
File "/cpfs01/user//miniconda3/envs/opencompass/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==2.1.1', 'console_scripts', 'torchrun')())
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/cpfs01/user//miniconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/cpfs01/user//opencompass/opencompass/tasks/openicl_infer.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
Other information
No response
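- Check CUDA availability via `python3 -c 'import torch; print(torch.cuda.is_available())'`. The environment dump above reports `'CUDA available': False`, so the model is running on CPU, where this PyTorch build has no half-precision LayerNorm kernel.
- Try downgrading to torch<2.0.0.

See https://github.com/open-compass/opencompass/issues/756. Feel free to reopen this issue if needed.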
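
The CUDA check suggested above can also be done from a script. The following is a minimal sketch, not OpenCompass code: the helper names `cuda_available` and `pick_dtype_name` are hypothetical, and falling back to `float32` on CPU is an assumption rather than a fix confirmed in this thread. How the chosen dtype is passed to the model depends on your OpenCompass model config and is not shown here.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def pick_dtype_name(has_cuda: bool) -> str:
    """Pick a dtype name: half precision on GPU, full precision on CPU.

    Assumption: float32 is used as the CPU fallback because the traceback
    shows the CPU LayerNorm kernel rejecting 'Half' inputs.
    """
    return "float16" if has_cuda else "float32"


has_cuda = cuda_available()
print(f"CUDA available: {has_cuda}; suggested dtype: {pick_dtype_name(has_cuda)}")
```

If this prints `CUDA available: False` on a machine that has a GPU, the driver or the installed PyTorch build is the more likely culprit than OpenCompass itself.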