OffsetArrays.jl
OffsetArrays.jl copied to clipboard
Problem with sum over OffsetArray
It's me again. That guy who keeps finding bugs when using sum. In Polyester, in KernelAbstractions, and now in OffsetArrays. Sigh...
Here is a case which slows down a loop by 25x(!) when using a sum over an OffsetArray. Expanding the sum into a for loop restores the normal performance when using Array.
## Set up
# Cartesian step
@inline δ(i, I::CartesianIndex{m}) where{m} = CartesianIndex(ntuple(j -> j==i ? 1 : 0, m))
# Finite difference
@inline ∂(a,I::CartesianIndex{m},u::AbstractArray{T,n}) where {T,n,m} = @inbounds u[I+δ(a,I),a]-u[I,a]
# Divergence using sum function
@fastmath @inline div_sum(I::CartesianIndex{m},u) where {m} = sum(@inbounds ∂(i,I,u) for i ∈ 1:m)
# Divergence expanded using for loop
@fastmath @inline function div_expanded(I::CartesianIndex{m},u) where {m}
init=zero(eltype(u))
for i in 1:m
init += @inbounds ∂(i,I,u)
end
return init
end
# loop over the range R
loop!(div,u,f,R) = @inbounds @simd for I ∈ R
div[I] = f(I,u)
end
## Test case
# Benchmark with Arrays
N = (2^10+2,2^10+2) # size of the array including halo
R = CartesianIndices(map(n->n-2,N)).+CartesianIndex(map(n->1,N)) # range inside the halo
div = zeros(Float32,N);
u = rand(Float32,N...,length(N));
using BenchmarkTools
@btime loop!($div,$u,$div_sum,$R) # 188.200 μs (0 allocations: 0 bytes)
@btime loop!($div,$u,$div_expanded,$R) # 193.400 μs (0 allocations: 0 bytes)
# Benchmark with OA
using OffsetArrays: Origin
divOA = Origin(0)(div);
uOA = Origin((zeros(Int,length(N))...,1))(u);
ROA = CartesianIndices(map(n->n-2,N))
@btime loop!($divOA,$uOA,$div_expanded,$ROA) # 200.000 μs (0 allocations: 0 bytes)
@btime loop!($divOA,$uOA,$div_sum,$ROA) # 4.960 ms (0 allocations: 0 bytes) WTH!?!
Reduced to
julia> @btime mapfoldl(identity, +, (∂(1, I, $uOA) for I in CartesianIndices((1:10, 1:1))));
54.481 ns (0 allocations: 0 bytes)
julia> @btime mapfoldl(identity, +, (∂(1, I, $u) for I in CartesianIndices((1:10, 1:1))));
19.071 ns (0 allocations: 0 bytes)
It seems the iteration is slow
julia> @btime (u -> iterate(∂(1, I, u) for I in CartesianIndices((1:10, 1:1))))($uOA);
12.965 ns (0 allocations: 0 bytes)
julia> @btime (u -> iterate(∂(1, I, u) for I in CartesianIndices((1:10, 1:1))))($u);
3.365 ns (0 allocations: 0 bytes)
Oddly,
julia> @btime ∂(1, CartesianIndex(1,1), $uOA);
3.381 ns (0 allocations: 0 bytes)
julia> @btime (I -> ∂(1, I, $uOA))(CartesianIndex(1,1));
7.636 ns (0 allocations: 0 bytes)