`Attempt to copy a freed reference` when creating a second view into the same array
I intermittently see this error on the Buildkite CI: https://buildkite.com/julialang/acceleratedkernels-dot-jl/builds/14#01932a74-e21d-408d-86d0-81bf259f6bce/241-416
It happens when creating two non-overlapping views into the same array, one immediately after the other:
p1 = @view dst[1:blocks]
p2 = @view dst[blocks + 1:end]
The failures so far have the following in common:
- It only happened on Julia 1.10, never on 1.11, but that may just be chance.
- It only happened when creating the second view, `p2`.
- It only happened in this specific kernel, even though views are used elsewhere in the codebase too.
- It only happened on the oneAPI backends; the CUDA, AMDGPU and Metal ones never showed this error.
That part of the code is sequential and runs on the same main thread/task. I was not able to reproduce it locally. Would you have any pointers on how to investigate this?
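One way to try reproducing this locally might be a stress loop along the following lines. This is only a sketch: `stress_views` and `make_array` are hypothetical names, and it assumes the failure is sensitive to allocation and GC timing, which is a guess. It runs on a plain CPU `Array` as written; passing a constructor for the GPU array type (for example one that builds a `oneArray`) would be needed to target the failing backend.

```julia
# Hypothetical stress loop; `stress_views` and `make_array` are illustrative names,
# not part of any package.
function stress_views(make_array; iterations = 10_000, n = 1024, blocks = 256)
    for i in 1:iterations
        dst = make_array(n)
        # Same pattern as in the failing kernel: two disjoint views, back to back
        p1 = @view dst[1:blocks]
        p2 = @view dst[blocks + 1:end]
        # Sanity check: the two views together cover the whole array
        @assert length(p1) + length(p2) == length(dst)
        # Assumption: periodic incremental GC may make a refcounting issue more likely to surface
        if i % 100 == 0
            GC.gc(false)
        end
    end
    return iterations
end

# CPU stand-in; replace the constructor with a GPU one to exercise the backend
stress_views(n -> zeros(Float32, n))
```

If this never fails on the GPU backend, that would at least suggest the trigger lies in something surrounding the views rather than in the view creation itself.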
Related to https://github.com/JuliaGPU/oneAPI.jl/issues/439 / https://github.com/JuliaGPU/oneAPI.jl/pull/459?
No, eviction is a Level Zero-specific operation, while the error here is triggered by GPUArrays' refcounting mechanism.
We are also getting this "random" error:
https://buildkite.com/julialang/komamri-dot-jl/builds/1645#019858dc-4857-4493-a4e2-ab059f2f8249/512-751.
In our case it also occurs on Julia 1.10, when taking consecutive views of the same array:
...
displacement_x!(@view(ux[idx, :]), m.action, @view(x[idx]), @view(y[idx]), @view(z[idx]), t_unit)
displacement_y!(@view(uy[idx, :]), m.action, @view(x[idx]), @view(y[idx]), @view(z[idx]), t_unit) # <--- ERROR
displacement_z!(@view(uz[idx, :]), m.action, @view(x[idx]), @view(y[idx]), @view(z[idx]), t_unit)
...
For now, we simply re-run the tests when they fail.