
Problem running GPTJ-99

Agalakdak opened this issue 11 months ago

I wanted to run GPT-J using the command below, but I ran into an error. What are the possible solutions?

mlcr run-mlperf,inference,_find-performance,_full,_r4.1-dev \
   --model=gptj-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=edge \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --docker --quiet \
   --test_query_count=50

[ 93%] Built target layers_src
[ 93%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttention48_float.cu.o
/code/tensorrt_llm/cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/threadblock/epilogue_tensor_op_int32.h(97): error: class template "cutlass::epilogue::threadblock::detail::DefaultIteratorsTensorOp" has already been defined
  struct DefaultIteratorsTensorOp<cutlass::bfloat16_t, int32_t, 8, ThreadblockShape, WarpShape, InstructionShape,
         ^



1 error detected in the compilation of "/code/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/int8_gemm/int8_gemm_bf16.cu".

gmake[3]: *** [tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/build.make:12917: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_bf16.cu.o] Error 2
gmake[3]: *** Waiting for unfinished jobs....
1 error detected in the compilation of "/code/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/int8_gemm/int8_gemm_fp32.cu".
1 error detected in the compilation of "/code/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/int8_gemm/int8_gemm_int32.cu".
gmake[3]: *** [tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/build.make:12947: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_fp32.cu.o] Error 2
gmake[3]: *** [tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/build.make:12962: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_int32.cu.o] Error 2
1 error detected in the compilation of "/code/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/int8_gemm/int8_gemm_fp16.cu".
gmake[3]: *** [tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/build.make:12932: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/cutlass_kernels/int8_gemm/int8_gemm_fp16.cu.o] Error 2
[ 93%] Built target common_src
[ 93%] Built target runtime_src
gmake[2]: *** [CMakeFiles/Makefile2:816: tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:771: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
gmake: *** [Makefile:192: tensorrt_llm] Error 2
Traceback (most recent call last):
  File "/code/tensorrt_llm/scripts/build_wheel.py", line 319, in <module>
    main(**vars(args))
  File "/code/tensorrt_llm/scripts/build_wheel.py", line 164, in main
    build_run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 64 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings ' returned non-zero exit status 2.
make: *** [Makefile:102: devel_run] Error 1
make: Leaving directory '/home/user/MLC/repos/local/cache/get-git-repo_15a989f1/repo/docker'
Traceback (most recent call last):
  File "/home/user/mlc/bin/mlcr", line 8, in <module>
    sys.exit(mlcr())
             ^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1715, in mlcr
    main()
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1797, in main
    res = method(run_args)
          ^^^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1529, in run
    return self.call_script_module_function("run", run_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1509, in call_script_module_function
    result = automation_instance.run(run_args)  # Pass args to the run method
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 225, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/user/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1772, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/MLC/repos/mlcommons@mlperf-automations/script/run-mlperf-inference-app/customize.py", line 284, in preprocess
    r = mlc.access(ii)
        ^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 92, in access
    result = method(options)
             ^^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1526, in docker
    return self.call_script_module_function("docker", run_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1511, in call_script_module_function
    result = automation_instance.docker(run_args)  # Pass args to the run method
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 4691, in docker
    return docker_run(self, i)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/user/MLC/repos/mlcommons@mlperf-automations/automation/script/docker.py", line 308, in docker_run
    r = self_module._run_deps(
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 3702, in _run_deps
    r = self.action_object.access(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 92, in access
    result = method(options)
             ^^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1529, in run
    return self.call_script_module_function("run", run_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mlc/lib/python3.12/site-packages/mlc/main.py", line 1519, in call_script_module_function
    raise ScriptExecutionError(f"Script {function_name} execution failed. Error : {error}")
mlc.main.ScriptExecutionError: Script run execution failed. Error : MLC script failed (name = get-ml-model-gptj, return code = 256)

Please file an issue at https://github.com/mlcommons/mlperf-automations/issues along with the full MLC command being run and the relevant or full console log.

Agalakdak, Mar 05 '25 08:03