Integer overflow for large ROCArray A in AMDGPU.rand!(A)
Calling AMDGPU.rand! on a large Float32 ROCArray, 50K x 50K or larger, fails with the error below.
The stack trace points at this line in random.jl, and the failure might be related to the element count being converted to Int32.
To reproduce:
Hardware: Crusher MI250X, using ROCm 5.1.0 and AMDGPU v0.3.7
Code:
import AMDGPU
A = AMDGPU.ROCArray{Float32,2}(undef, 50000, 50000)
AMDGPU.rand!(A)
Error message:
ERROR: LoadError: InexactError: trunc(Int32, 3600000000)
Stacktrace:
[1] throw_inexacterror(f::Symbol, #unused#::Type{Int32}, val::Int64)
@ Core ./boot.jl:614
[2] checked_trunc_sint
@ ./boot.jl:636 [inlined]
[3] toInt32
@ ./boot.jl:673 [inlined]
[4] Int32
@ ./boot.jl:763 [inlined]
[5] convert
@ ./number.jl:7 [inlined]
[6] cconvert
@ ./essentials.jl:412 [inlined]
[7] macro expansion
@ ~/.julia/packages/AMDGPU/f6OQx/src/rand/error.jl:44 [inlined]
[8] rocrand_generate_uniform
@ ~/.julia/packages/AMDGPU/f6OQx/src/rand/librocrand.jl:42 [inlined]
[9] rand!
@ ~/.julia/packages/AMDGPU/f6OQx/src/rand/random.jl:50 [inlined]
[10] rand!
@ ~/.julia/packages/AMDGPU/f6OQx/src/random.jl:43 [inlined]
[11] macro expansion
@ ./timing.jl:242 [inlined]
I'll be happy to help testing on this system.
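For reference, a small CPU-only check of the arithmetic behind the error (no GPU or AMDGPU.jl needed). It only assumes that the element count ends up being converted to Int32 somewhere along the call chain, as the `cconvert` frame in the trace suggests. Note that the value in the message above, 3600000000, equals 60000^2, so that trace was presumably captured from a 60K x 60K run; the 50K x 50K case overflows in the same way.

```julia
# Element count of the 50K x 50K array from the reproducer, computed in Int64.
n = Int64(50_000) * Int64(50_000)

println(n)                     # element count
println(typemax(Int32))        # largest value an Int32 can hold
println(n > typemax(Int32))    # the count does not fit in an Int32

# Julia does not wrap around silently on conversion: an out-of-range
# Int64 -> Int32 conversion throws an InexactError, as seen in the trace.
err = try
    Int32(n)
catch e
    e
end
println(err isa InexactError)
```

Any array with more than 2147483647 elements would hit the same conversion failure, regardless of element type.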
Seems like we might have to work around this manually by checking if overflow would happen, and if so, we perform multiple calls to the generator. @williamfgc would you be willing to implement this (and add a test)?
@jpsamaroo thanks for your quick response. I might not have enough bandwidth in the next few weeks if the fix requires some dedicated effort. Happy to test if someone merges a fix during that time, though.
> perform multiple calls to the generator
By generator, do you mean the call to rocrand_generate_uniform here?
> By generator, do you mean the call to rocrand_generate_uniform here?
No, probably here and in similar locations as necessary. We don't want to make the low-level API more complicated, in case users want to directly access those calls for other purposes.
No rush on this, I also don't have much bandwidth while I resolve other issues plaguing AMDGPU.jl :smile:
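To make the idea discussed above concrete, here is a minimal sketch of the "multiple calls" workaround, written against plain CPU arrays so it runs without a GPU. `chunked_fill!` and its `max_len` keyword are hypothetical names made up for illustration; they are not part of AMDGPU.jl. The sketch also assumes a flat view of the array can be handed to the generator one slice at a time, which would need checking for ROCArray. The chunking sits in the high-level wrapper, so the low-level generator call stays untouched.

```julia
using Random

# Hypothetical helper (not part of AMDGPU.jl): fill `A` by calling
# `fill_chunk!` on consecutive slices whose length never exceeds `max_len`.
# The default limit is the largest count that still converts to Int32.
function chunked_fill!(fill_chunk!, A::AbstractArray; max_len::Integer = typemax(Int32))
    max_len > 0 || throw(ArgumentError("max_len must be positive"))
    flat = vec(A)              # flat view sharing memory with A
    n = length(flat)
    start = 1
    while start <= n
        stop = min(start + max_len - 1, n)
        fill_chunk!(view(flat, start:stop))   # one generator call per slice
        start = stop + 1
    end
    return A
end

# Usage with a deliberately tiny limit so the chunking is visible on the CPU.
A = zeros(Float32, 10, 10)
calls = Ref(0)
chunked_fill!(A; max_len = 32) do chunk
    calls[] += 1
    rand!(chunk)               # stand-in for the GPU generator call
end
println(calls[])               # number of generator calls that were needed
```

In a real fix the closure body would presumably be the existing call into the generator, and a test would need an array with more than typemax(Int32) elements, which is about 10 GB for Float32.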