Unexpectedly many allocations for in-place operations
Firstly, thank you. I'm really enjoying the SciML ecosystem.
I'm noticing an unexpectedly high number of allocations when applying operators in place. Here's a minimal example:
using DiffEqOperators, LinearAlgebra, BenchmarkTools
d2x = CenteredDifference{1}(2, 2, 1.0, 5)
bcx = Neumann0BC(1.0, 1)
op = d2x * bcx
u = zeros(5,5)
u[3,3] = 1.0
@btime du = op * u # 99.501 μs (1155 allocations: 95.17 KiB)
du = zeros(5,5)
@btime mul!(du, op, u) # 97.665 μs (1154 allocations: 94.89 KiB)
I was expecting a bigger reduction in allocations during the in-place application.
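For comparison, here is a minimal sketch of the kind of reduction I had in mind. It uses a dense random matrix `A` as a hypothetical stand-in for `op` (so it says nothing about what DiffEqOperators does internally), and the helper names `measure_out` and `measure_in!` are made up for illustration:

```julia
using LinearAlgebra

# Hypothetical stand-in for `op`: a plain dense matrix.
A = rand(5, 5)
u = zeros(5, 5); u[3, 3] = 1.0
du = zeros(5, 5)

# Measure inside functions so that global-variable overhead is not counted.
measure_out(A, u) = @allocated A * u                 # allocates a new result array
measure_in!(du, A, u) = @allocated mul!(du, A, u)    # writes into the existing `du`

# Warm up once so compilation is excluded from the measurement.
measure_out(A, u); measure_in!(du, A, u)

alloc_out = measure_out(A, u)
alloc_in = measure_in!(du, A, u)
println("out-of-place: ", alloc_out, " bytes, in-place: ", alloc_in, " bytes")
```

With a plain matrix the in-place call should allocate noticeably less than the out-of-place one, which is the behaviour I expected from the operator as well.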
Yes, that looks bad.
I can confirm this problem is still there. It's very strange that it takes such a long time for such a small (5×5) matrix. Surprisingly, the difference is dramatic when you apply the operator to only one column. I used a 100×100 matrix:
julia> @btime du = op * u[:,1];
733.333 ns (5 allocations: 1.95 KiB)
julia> @btime du = op * u;
23.092 ms (481702 allocations: 31.38 MiB)
One allocation can be removed by writing into a view:
julia> dux = @view du[:,1]
julia> @btime mul!(dux, op, u[:,1])
721.519 ns (4 allocations: 1.08 KiB)
Then I define
function f1!(du, u)
    m, n = size(u)
    @assert (m, n) == size(du)
    for i = 1:n
        du[:, i] = op * u[:, i]
    end
end
This is much faster than applying the operator to the whole matrix directly.
julia> @benchmark f1!(du, u)
BenchmarkTools.Trial:
memory estimate: 195.31 KiB
allocs estimate: 500
--------------
minimum time: 63.400 μs (0.00% GC)
median time: 137.200 μs (0.00% GC)
mean time: 1.051 ms (87.11% GC)
maximum time: 680.927 ms (99.98% GC)
--------------
samples: 5230
evals/sample: 1
julia> @benchmark mul!(du, op, u)
BenchmarkTools.Trial:
memory estimate: 31.30 MiB
allocs estimate: 481700
--------------
minimum time: 22.251 ms (0.00% GC)
median time: 28.158 ms (0.00% GC)
mean time: 33.626 ms (17.35% GC)
maximum time: 70.771 ms (0.00% GC)
--------------
samples: 149
evals/sample: 1
But it does not seem possible to use mul!(du[:,i], op, u[:,i]) inside the function, so I have to define something like:
function f2!(du, u)
    m, n = size(u)
    @assert (m, n) == size(du)
    dui = zeros(m)
    for i = 1:n
        mul!(dui, op, u[:, i])
        @. du[:, i] = dui
    end
end
This results in some further improvement:
julia> @benchmark f2!(du, u)
BenchmarkTools.Trial:
memory estimate: 107.12 KiB
allocs estimate: 301
--------------
minimum time: 48.500 μs (0.00% GC)
median time: 80.600 μs (0.00% GC)
mean time: 440.562 μs (81.42% GC)
maximum time: 421.137 ms (99.97% GC)
--------------
samples: 10000
evals/sample: 1
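A note on why mul!(du[:,i], op, u[:,i]) does not do what one might hope: in Julia, a slice such as du[:,i] used as a function argument creates a copy, so the result is written into a temporary and du is left untouched. A view avoids this. The sketch below shows the difference using a dense second-difference matrix `A` as a hypothetical stand-in for `op`; the helper name `apply_columns!` is made up for illustration, and I have not checked whether `op` accepts a view as its input argument in the same way:

```julia
using LinearAlgebra

# Hypothetical stand-in for `op`: a dense second-difference matrix.
n = 5
A = Matrix(SymTridiagonal(fill(-2.0, n), fill(1.0, n - 1)))
u = zeros(n, n); u[3, 3] = 1.0

# Slicing copies: the product lands in a temporary and du_copy stays zero.
du_copy = zeros(n, n)
mul!(du_copy[:, 3], A, u[:, 3])

# Views alias the parent array: the product lands in du itself,
# without the temporary buffer and the extra copy used in f2!.
function apply_columns!(du, A, u)
    @assert size(du) == size(u)
    for i in axes(u, 2)
        mul!(view(du, :, i), A, view(u, :, i))
    end
    return du
end

du_view = apply_columns!(zeros(n, n), A, u)
```

If the operator supports it, the same pattern would replace both the dui buffer and the broadcast assignment in f2!.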
It may be a threading performance issue. We may need to use @turbo or whatnot.
Hi Chris, did you notice the difference in allocations between operating on the whole matrix and operating on only one column?