GPUCompiler.jl
Call has wrong number of parameters\nptxas fatal
Today I am seeing an error when using CuArrays, and I wonder if something is wrong with my setup:
ERROR: LoadError: CUDAdrv.CuError(CUDAdrv.cudaError_enum(0x000000da), "ptxas application ptx input, line 380; error : Call has wrong number of parameters\nptxas fatal : Ptx assembly aborted due to errors")
Stacktrace:
[1] CUDAdrv.CuModule(::String, ::Dict{CUDAdrv.CUjit_option_enum,Any}) at /home/gzhang8/.julia/packages/CUDAdrv/Uc14X/src/module.jl:40
[2] _cufunction(::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:335
[3] _cufunction at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:302 [inlined]
[4] #77 at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:21 [inlined]
[5] get!(::GPUCompiler.var"#77#78"{Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},typeof(CUDAnative._cufunction),GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}}, ::Dict{UInt64,Any}, ::UInt64) at ./dict.jl:452
[6] macro expansion at ./lock.jl:183 [inlined]
[7] check_cache(::typeof(CUDAnative._cufunction), ::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:19
[8] + at ./int.jl:53 [inlined]
[9] hash_64_64 at ./hashing.jl:35 [inlined]
[10] hash_uint64 at ./hashing.jl:62 [inlined]
[11] hx at ./float.jl:568 [inlined]
[12] hash at ./float.jl:571 [inlined]
[13] cached_compilation(::typeof(CUDAnative._cufunction), ::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:0
[14] cached_compilation(::Function, ::GPUCompiler.FunctionSpec{GPUArrays.var"#26#27",Tuple{CuArrays.CuKernelContext,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}}, ::UInt64) at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/src/cache.jl:37
[15] cufunction(::Function, ::Type; name::String, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:296
[16] macro expansion at /home/gzhang8/.julia/packages/CUDAnative/e0IdN/src/execution.jl:108 [inlined]
[17] gpu_call(::CuArrays.CuArrayBackend, ::Function, ::Tuple{CuArrays.CuArray{Float32,1,Nothing},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Float32}}}, ::Int64; name::String) at /home/gzhang8/.julia/packages/CuArrays/l0gXB/src/gpuarrays.jl:32
[18] #gpu_call#1 at /home/gzhang8/.julia/packages/GPUArrays/OXvxB/src/device/execution.jl:61 [inlined]
[19] copyto! at /home/gzhang8/.julia/packages/GPUArrays/OXvxB/src/host/broadcast.jl:63 [inlined]
[20] copyto! at /home/gzhang8/.julia/packages/GPUArrays/OXvxB/src/host/broadcast.jl:75 [inlined]
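Judging by the broadcast types in the trace (`typeof(identity)` applied to a `Float32`, written into a `CuArray{Float32,1}`), my guess at a minimal reproducer is a plain scalar fill. This is an assumption based on the trace, not a verified trigger:

```julia
using CuArrays

# Hypothetical reproducer: broadcast a Float32 scalar into a 1-D CuArray,
# which goes through gpu_call -> cufunction -> ptxas as in the trace above.
a = CuArrays.zeros(Float32, 10)
a .= 1f0
```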
I also tried running the GPUCompiler test suite and saw one failure there as well:
test GPUCompiler
Testing GPUCompiler
Downloading artifact: LLVM
Status `/tmp/jl_P2HEMf/Manifest.toml`
[fa961155] CEnum v0.2.0
[da1fd8a2] CodeTracking v0.5.11
[f68482b8] Cthulhu v1.1.1
[864edb3b] DataStructures v0.17.16
[61eb1bfa] GPUCompiler v0.2.0
[929cbde3] LLVM v1.4.1
[86de99a1] LLVM_jll v8.0.1+0
[bac558e1] OrderedCollections v1.2.0
[a759f4b9] TimerOutputs v0.5.5
[2a0f44e3] Base64
[ade2ca70] Dates
[8ba89e20] Distributed
[b77e0a4c] InteractiveUtils
[76f85450] LibGit2
[8f399da3] Libdl
[56ddb016] Logging
[d6f4376e] Markdown
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[6462fe0b] Sockets
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
GC and TLS lowering: Test Failed at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:241
Expression: !(occursin("gpu_gc_pool_alloc", asm))
Evaluated: !(occursin("gpu_gc_pool_alloc", "//\n// Generated by LLVM NVPTX Back-End\n//\n\n.version 6.0\n.target sm_70\n.address_size 64\n\n\t// .globl\tjulia_ref_kernel_18360 // -- Begin function julia_ref_kernel_18360\n.func (.param .b64 func_retval0) gpu_gc_pool_alloc\n(\n\t.param .b64 gpu_gc_pool_alloc_param_0\n)\n;\n.global .align 1 .b8 exception[10] = {101, 120, 99, 101, 112, 116, 105, 111, 110, 0};\n // @julia_ref_kernel_18360\n.visible .func julia_ref_kernel_18360(\n\t.param .b64 julia_ref_kernel_18360_param_0,\n\t.param .b64 julia_ref_kernel_18360_param_1\n)\n{\n\t.reg .pred \t%p<2>;\n\t.reg .b64 \t%rd<10>;\n\n// %bb.0: // %top\n\tld.param.u64 \t%rd1, [julia_ref_kernel_18360_param_0];\n\tmov.u64 \t%rd2, 8;\n\t{ // callseq 34, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd2;\n\t.param .b64 retval0;\n\tcall.uni (retval0), \n\tgpu_gc_pool_alloc, \n\t(\n\tparam0\n\t);\n\tld.param.b64 \t%rd3, [retval0+0];\n\t} // callseq 34\n\tld.param.u64 \t%rd5, [julia_ref_kernel_18360_param_1];\n\tsetp.lt.s64 \t%p1, %rd5, 2;\n\tselp.b64 \t%rd6, 2, 1, %p1;\n\tst.u64 \t[%rd3], %rd6;\n\tshl.b64 \t%rd7, %rd5, 3;\n\tadd.s64 \t%rd8, %rd1, %rd7;\n\tmov.u64 \t%rd9, 0;\n\tst.u8 \t[%rd8+-1], %rd9;\n\tst.u8 \t[%rd8+-2], %rd9;\n\tst.u8 \t[%rd8+-3], %rd9;\n\tst.u8 \t[%rd8+-4], %rd9;\n\tst.u8 \t[%rd8+-5], %rd9;\n\tst.u8 \t[%rd8+-6], %rd9;\n\tst.u8 \t[%rd8+-7], %rd9;\n\tst.u8 \t[%rd8+-8], %rd6;\n\tret;\n // -- End function\n}\n.func (.param .b64 func_retval0) gpu_malloc(\n\t.param .b64 gpu_malloc_param_0\n) // -- Begin function gpu_malloc\n // @gpu_malloc\n{\n\t.reg .b64 \t%rd<2>;\n\n// %bb.0: // %top\n\tmov.u64 \t%rd1, 0;\n\tst.param.b64 \t[func_retval0+0], %rd1;\n\tret;\n // -- End function\n}\n.func gpu_report_exception(\n\t.param .b64 gpu_report_exception_param_0\n) // -- Begin function gpu_report_exception\n // @gpu_report_exception\n{\n\n\n// %bb.0: // %top\n\tret;\n // -- End function\n}\n.func gpu_report_oom(\n\t.param .b64 
gpu_report_oom_param_0\n) // -- Begin function gpu_report_oom\n // @gpu_report_oom\n{\n\n\n// %bb.0: // %top\n\tret;\n // -- End function\n}\n.func gpu_signal_exception() // -- Begin function gpu_signal_exception\n // @gpu_signal_exception\n{\n\n\n// %bb.0: // %top\n\tret;\n // -- End function\n}\n.func (.param .b64 func_retval0) gpu_gc_pool_alloc(\n\t.param .b64 gpu_gc_pool_alloc_param_0\n) // -- Begin function gpu_gc_pool_alloc\n // @gpu_gc_pool_alloc\n{\n\t.reg .pred \t%p<2>;\n\t.reg .b64 \t%rd<6>;\n\n// %bb.0: // %top\n\tld.param.u64 \t%rd2, [gpu_gc_pool_alloc_param_0];\n\t{ // callseq 35, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd2;\n\t.param .b64 retval0;\n\tcall.uni (retval0), \n\tgpu_malloc, \n\t(\n\tparam0\n\t);\n\tld.param.b64 \t%rd3, [retval0+0];\n\t} // callseq 35\n\tsetp.ne.s64 \t%p1, %rd3, 0;\n\t@%p1 bra \tLBB5_2;\n// %bb.1: // %L7\n\t{ // callseq 36, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd2;\n\tcall.uni \n\tgpu_report_oom, \n\t(\n\tparam0\n\t);\n\t} // callseq 36\n\tmov.u64 \t%rd4, exception;\n\tcvta.global.u64 \t%rd5, %rd4;\n\t{ // callseq 37, 0\n\t.reg .b32 temp_param_reg;\n\t.param .b64 param0;\n\tst.param.b64 \t[param0+0], %rd5;\n\tcall.uni \n\tgpu_report_exception, \n\t(\n\tparam0\n\t);\n\t} // callseq 37\n\t{ // callseq 38, 0\n\t.reg .b32 temp_param_reg;\n\tcall.uni \n\tgpu_signal_exception, \n\t(\n\t);\n\t} // callseq 38\n\t// begin inline asm\n\texit;\n\t// end inline asm\nLBB5_2: // %L10\n\tst.param.b64 \t[func_retval0+0], %rd3;\n\tret;\n // -- End function\n}\n\n"))
Stacktrace:
[1] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:241
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:204
[4] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[5] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:79
[6] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[7] top-level scope at /home/gzhang8/.julia/packages/GPUCompiler/bwcs0/test/ptx.jl:3
Test Summary: | Pass  Fail  Total
GPUCompiler   |   72     1     73
  native      |   39           39
  PTX         |   33     1     34
  IR          |   18           18
I tried different versions of CUDA and the driver (440 with 10.2, 435 with 10.1, 41x with 10.0), and both Julia 1.3 and 1.4; all of them have this problem. Any clue? Thank you in advance.
Haven't seen that one before. Could you run @device_code_warntype on the code that fails and post the PTX code?
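For reference, a sketch of how to capture that output, assuming the failure is triggered by a simple broadcast fill as the stack trace suggests:

```julia
using CuArrays, CUDAnative

a = CuArrays.zeros(Float32, 10)

# Print the inferred (warntype) device code for every kernel
# compiled while evaluating the expression:
CUDAnative.@device_code_warntype a .= 1f0

# Dump the generated PTX that ptxas is rejecting:
CUDAnative.@device_code_ptx a .= 1f0
```

Both macros are exported by CUDAnative, so the `CUDAnative.` prefix is optional; substitute whatever expression actually fails for the broadcast shown here.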