I tried test_llama.py, but it fails with the traceback below... help... T^T
Process Process-8:
Process Process-7:
Traceback (most recent call last):
File "
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/stan/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/opt/conda/envs/stan/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/lcx/lightllm/test/model/model_infer.py", line 51, in tppart_model_infer
logics = model_part.forward(batch_size,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/model.py", line 103, in forward
predict_logics = self._context_forward(input_ids, infer_state)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/model.py", line 141, in _context_forward
input_embs = self.layers_infer[i].context_forward(input_embs, infer_state, self.trans_layers_weight[i])
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/transformer_layer_inference.py", line 103, in context_forward
self._context_flash_attention(input_embdings,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/utils/infer_utils.py", line 21, in time_func
ans = func(*args, **kwargs)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/layer_infer/transformer_layer_inference.py", line 49, in context_flash_attention
input1 = rmsnorm_forward(input_embding, weight=layer_weight.input_layernorm, eps=self.layer_norm_eps)
File "/opt/conda/envs/stan/lib/python3.10/site-packages/lightllm-1.0.0-py3.10.egg/lightllm/models/llama/triton_kernel/rmsnorm.py", line 59, in rmsnorm_forward
_rms_norm_fwd_fused[(M,)](x_arg, y, weight,
File "/opt/conda/envs/stan/lib/python3.10/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "
During handling of the above exception, another exception occurred:
(The remaining output repeats the same traceback for the other worker processes, ending in "RuntimeError: Triton requires CUDA 11.4+".)
@MissQueen "RuntimeError: Triton requires CUDA 11.4+," this error show that you need update your cuda version。recommend to use cuda 11.8 or higher. what is your gpu name ?
@MissQueen "RuntimeError: Triton requires CUDA 11.4+," this error show that you need update your cuda version。recommend to use cuda 11.8 or higher. what is your gpu name ?
Do you mean this?
'torch.version.cuda' is 11.7.
Sorry, but how do I upgrade?
@MissQueen Hello, I suggest installing a clean Python environment with conda (python==3.9), then installing cuda==11.8 and pytorch.
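Roughly like this (just a sketch; the env name "lightllm-env" is only an example, and the conda CUDA channel/package layout may differ on your machine):

conda create -n lightllm-env python=3.9 -y
conda activate lightllm-env
# CUDA 11.8 toolkit from NVIDIA's conda channel
conda install cuda -c "nvidia/label/cuda-11.8.0"
# PyTorch wheels built against CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118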
I encountered the same problem. I think it is caused by the triton version; try installing triton-nightly (2.1.0). It works for me:
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
or install from source:
git clone https://github.com/openai/triton.git
cd triton/python
pip install cmake  # build-time dependency
pip install -e .
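Either way, you can sanity-check which triton ends up installed:

python -c "import triton; print(triton.__version__)"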