GPUCompiler.jl
Call has wrong number of parameters\nptxas fatal
Today I am seeing an error when using CuArrays, and I wonder if something is wrong with my setup:
ERROR: LoadError: CUDAdrv.CuError(CUDAdrv.cudaError_enum(0x000000da), "ptxas application ptx input, line 380; error : Call has wrong number of parameters\nptxas fatal : Ptx assembly aborted due to errors")
Stacktrace:
[1] CUDAdrv.CuModule(::String, ::Dict{CUDAdrv.CUjit_option_enum,Any}) at /home/gzhang8/.julia/packages/CUDAdrv/Uc14X/src/module.jl:40
[2] _cufunction(::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:335
[3] _cufunction at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:302 [inlined]
[4] #77 at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:21 [inlined]
[5] get!(::GPUCompiler.var"#77#78"{Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},typeof(CUDAnative._cufunction),GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}}, ::Dict{UInt64,Any}, ::UInt64) at ./dict.jl:452
[6] macro expansion at ./lock.jl:183 [inlined]
[7] check_cache(::typeof(CUDAnative._cufunction), ::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:19
[8] + at ./int.jl:53 [inlined]
[9] hash_64_64 at ./hashing.jl:35 [inlined]
[10] hash_uint64 at ./hashing.jl:62 [inlined]
[11] hx at ./float.jl:568 [inlined]
[12] hash at ./float.jl:571 [inlined]
[13] cached_compilation(::typeof(CUDAnative._cufunction), ::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:0
[14] cached_compilation(::Function, ::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}, ::UInt64) at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:37
[15] cufunction(::Function, ::Type; name::String, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:296
[16] macro expansion at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:108 [inlined]
[17] gpu_call(::CuArrays.CuArrayBackend, ::Function, ::Tuple{CuArrays.CuArray{Float32,1,Nothing},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}, ::Int64; name::String) at /home/gzhang8/.julia/packages/CuArrays/l0gXB/src/gpuarrays.jl:32
[18] #gpu_call#1 at /home/gzhang8/.julia/packages/GPUArrays/OXvxB/src/device/execution.jl:61 [inlined]
[19] copyto! at /home/gzhang8/.julia/packages/GPUArrays/OXvxB/src/host/broadcast.jl:63 [inlined]
[20] copyto! at /home/gzhang8/.julia/packages/GPUArrays/OXvxB/src/host/broadcast.jl:75 [inlined]
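Judging by the broadcast types in the trace (`typeof(identity)` applied to a `Float32`, written into a `CuArray{Float32,1}`), my guess at a minimal reproducer is a plain scalar fill. This is an assumption based on the trace, not a verified trigger:

```julia
using CuArrays

# Hypothetical reproducer: broadcast a Float32 scalar into a 1-D CuArray,
# which goes through gpu_call -> cufunction -> ptxas as in the trace above.
a = CuArrays.zeros(Float32, 10)
a .= 1f0
```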
I also tried running the GPUCompiler test suite and saw one failure there as well:
test GPUCompiler
Testing GPUCompiler
Downloading artifact: LLVM
Status `/tmp/jl_P2HEMf/Manifest.toml`
[fa961155] CEnum v0.2.0
[da1fd8a2] CodeTracking v0.5.11
[f68482b8] Cthulhu v1.1.1
[864edb3b] DataStructures v0.17.16
[61eb1bfa] GPUCompiler v0.2.0
[929cbde3] LLVM v1.4.1
[86de99a1] LLVM_jll v8.0.1+0
[bac558e1] OrderedCollections v1.2.0
[a759f4b9] TimerOutputs v0.5.5
[2a0f44e3] Base64
[ade2ca70] Dates
[8ba89e20] Distributed
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[56ddb016] Logging
[d6f4376e] Markdown
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[6462fe0b] Sockets
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
GC and TLS lowering: Test Failed at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:241
Expression: !(occursin("gpu_gc_pool_alloc", asm))
Evaluated: !(occursin("gpu_gc_pool_alloc", "//\n// Generated by LLVM NVPTX Back-End\n//\n\n.version 6.0\n.target sm_70\n.address_size 64\n\n\t// .globl\tjulia_ref_kernel_18360 // -- Begin function julia_ref_kernel_18360\n.func (.param .b64 func_retval0) gpu_gc_pool_alloc\n(\n\t.param .b64 gpu_gc_pool_alloc_param_0\n)\n;\n.global .align 1 .b8 exception[10] = {101, 120, 99, 101, 112, 116, 105, 111, 110, 0};\n // @julia_ref_kernel_18360\n.visible .func julia_ref_kernel_18360(\n\t.param .b64 julia_ref_kernel_18360_param_0,\n\t.param .b64 julia_ref_kernel_18360_param_1\n)\n{\n\t.reg .pred \t%p<2>;\n\t.reg .b64 \t%rd<10>;\n\n// %bb.0: // %top\n\tld.param.u64 \t%rd1, [julia_ref_kernel_18360_param_0];\n\tmov.u64 \t%rd2, 8;\n\t{ // callseq 34, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd2;\n\t.param .b64 retval0;\n\tcall.uni (retval0), \n\tgpu_gc_pool_alloc, \n\t(\n\tparam0\n\t);\n\tld.param.b64 \t%rd3, [retval0+0];\n\t} // callseq 34\n\tld.param.u64 \t%rd5, [julia_ref_kernel_18360_param_1];\n\tsetp.lt.s64 \t%p1, %rd5, 2;\n\tselp.b64 \t%rd6, 2, 1, %p1;\n\tst.u64 \t[%rd3], %rd6;\n\tshl.b64 \t%rd7, %rd5, 3;\n\tadd.s64 \t%rd8, %rd1, %rd7;\n\tmov.u64 \t%rd9, 0;\n\tst.u8 \t[%rd8+-1], %rd9;\n\tst.u8 \t[%rd8+-2], %rd9;\n\tst.u8 \t[%rd8+-3], %rd9;\n\tst.u8 \t[%rd8+-4], %rd9;\n\tst.u8 \t[%rd8+-5], %rd9;\n\tst.u8 \t[%rd8+-6], %rd9;\n\tst.u8 \t[%rd8+-7], %rd9;\n\tst.u8 \t[%rd8+-8], %rd6;\n\tret;\n // -- End function\n}\n.func (.param .b64 func_retval0) gpu_malloc(\n\t.param .b64 gpu_malloc_param_0\n) // -- Begin function gpu_malloc\n // @gpu_malloc\n{\n\t.reg .b64 \t%rd<2>;\n\n// %bb.0: // %top\n\tmov.u64 \t%rd1, 0;\n\tst.param.b64 \t[func_retval0+0], %rd1;\n\tret;\n // -- End function\n}\n.func gpu_report_exception(\n\t.param .b64 gpu_report_exception_param_0\n) // -- Begin function gpu_report_exception\n // @gpu_report_exception\n{\n\n\n// %bb.0: // %top\n\tret;\n // -- End function\n}\n.func gpu_report_oom(\n\t.param .b64 
gpu_report_oom_param_0\n) // -- Begin function gpu_report_oom\n // @gpu_report_oom\n{\n\n\n// %bb.0: // %top\n\tret;\n // -- End function\n}\n.func gpu_signal_exception() // -- Begin function gpu_signal_exception\n // @gpu_signal_exception\n{\n\n\n// %bb.0: // %top\n\tret;\n // -- End function\n}\n.func (.param .b64 func_retval0) gpu_gc_pool_alloc(\n\t.param .b64 gpu_gc_pool_alloc_param_0\n) // -- Begin function gpu_gc_pool_alloc\n // @gpu_gc_pool_alloc\n{\n\t.reg .pred \t%p<2>;\n\t.reg .b64 \t%rd<6>;\n\n// %bb.0: // %top\n\tld.param.u64 \t%rd2, [gpu_gc_pool_alloc_param_0];\n\t{ // callseq 35, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd2;\n\t.param .b64 retval0;\n\tcall.uni (retval0), \n\tgpu_malloc, \n\t(\n\tparam0\n\t);\n\tld.param.b64 \t%rd3, [retval0+0];\n\t} // callseq 35\n\tsetp.ne.s64 \t%p1, %rd3, 0;\n\t@%p1 bra \tLBB5_2;\n// %bb.1: // %L7\n\t{ // callseq 36, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd2;\n\tcall.uni \n\tgpu_report_oom, \n\t(\n\tparam0\n\t);\n\t} // callseq 36\n\tmov.u64 \t%rd4, exception;\n\tcvta.global.u64 \t%rd5, %rd4;\n\t{ // callseq 37, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd5;\n\tcall.uni \n\tgpu_report_exception, \n\t(\n\tparam0\n\t);\n\t} // callseq 37\n\t{ // callseq 38, 0\n\t.reg .b32 temp_param_reg;\n\tcall.uni \n\tgpu_signal_exception, \n\t(\n\t);\n\t} // callseq 38\n\t// begin inline asm\n\texit;\n\t// end inline asm\nLBB5_2: // %L10\n\tst.param.b64 \t[func_retval0+0], %rd3;\n\tret;\n // -- End function\n}\n\n"))
Stacktrace:
[1] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:241
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:204
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[5] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:79
[6] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[7] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:3
Test Summary: | Pass  Fail  Total
GPUCompiler   |   72     1     73
  native      |   39           39
  PTX         |   33     1     34
  IR          |   18           18
I tried different versions of CUDA and the driver (440 with 10.2, 435 with 10.1, 41x with 10.0), and both Julia 1.3 and 1.4; all of them have this problem. Any clue? Thank you in advance.
Haven't seen that one before. Could you run @device_code_warntype on the code that fails and post the PTX code?
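For reference, a sketch of how to capture that output, assuming the failure is triggered by a simple broadcast fill as the stack trace suggests:

```julia
using CuArrays, CUDAnative

a = CuArrays.zeros(Float32, 10)

# Print the inferred (warntype) device code for every kernel
# compiled while evaluating the expression:
CUDAnative.@device_code_warntype a .= 1f0

# Dump the generated PTX that ptxas is rejecting:
CUDAnative.@device_code_ptx a .= 1f0
```

Both macros are exported by CUDAnative, so the `CUDAnative.` prefix is optional; substitute whatever expression actually fails for the broadcast shown here.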