DiffEqOperators.jl Performance issues with nonlinear

nonlinear_diffusion! appears to be quite slow and unexpectedly allocates. MWE (including simple reference implementation):

using BenchmarkTools
using DiffEqOperators

const nknots = 100
# grid edges
x = LinRange(0.0,10.0,nknots+1) |> Vector
# grid cell centers
x_c = (x[1:end-1] .+ x[2:end]) ./ 2
# state variable
u = sin.(2π.*x_c)
# uniform grid spacing pretending to be non-uniform ;)
dx = diff(x_c)
# boundary padded dx
dx_pad = [dx[1], dx..., dx[end]]
# boundary padded u
q = [0.0,u...,0.0]
# conductivity (diffusion function)
# note: really this should have length(x) because it is defined on the
# boundaries of the grid cells, not on the centers? but nonlinear_diffusion!
# seems to want it to match the length of q.
k = 0.5.*ones(length(q))
# output
du = similar(u)
@benchmark nonlinear_diffusion!($du,1,1,2,$k,$q,$dx_pad,$nknots)

"""
nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)

Second order Laplacian with non-linear diffusion function, k.
"""
function nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)
    @inbounds let x₁ = (@view x[1:end-2]),
        x₂ = (@view x[2:end-1]),
        x₃ = (@view x[3:end]),
        k₁ = (@view k[1:end-1]),
        k₂ = (@view k[2:end]),
        Δx₁ = (@view Δx[1:end-1]),
        Δx₂ = (@view Δx[2:end]);
        @. ∂y = (k₂*(x₃ - x₂)/Δx₂ - k₁*(x₂-x₁)/Δx₁)/Δk
    end
end

du2 = similar(u)
k_ = k[1:end-1] # nonlineardiffusion! expects k to be on the boundaries
dk = diff(x)
@benchmark nonlineardiffusion!($du2, $q, $dx_pad, $k_,$ dk)

@assert all(du .≈ du2)

Benchmark result from nonlinear_diffusion!

BechmarkTools.Trial: 6922 samples with 1 evaluations.
 Range (min … max):  556.640 μs …  14.328 ms  ┊ GC (min … max): 0.00% … 93.33%
 Time  (median):     644.482 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   716.439 μs ± 607.819 μs  ┊ GC (mean ± σ):  5.61% ±  6.28%

      ▇ █▁                                                       
  ▁▂▇▃█▇██▆▇▄▆▅▄▅▅▄▃▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  557 μs           Histogram: frequency by time         1.21 ms <

 Memory estimate: 273.14 KiB, allocs estimate: 3766.

..and from reference implementation:

BechmarkTools.Trial: 10000 samples with 140 evaluations.
 Range (min … max):  705.121 ns …   4.145 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     730.871 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   789.835 ns ± 164.178 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇▁█▅▄▁    ▂▁▂▁▁▁▁     ▁                                       ▁
  █████████████████████▇████████▇██▇█▇▇█▇▇█▇█▆▆▇▅▆▆▆▆█▇▇▆▅▅▄▄▄▅ █
  705 ns        Histogram: log(frequency) by time       1.37 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

edit: forgot to interpolate the arguments on the reference impl run; but it doesn't seem to make much of a difference

Jul 05 '21 14:07 bgroenks96

@mjsheikh could you take a look?

Jul 05 '21 20:07 ChrisRackauckas

I checked, so the allocations aren't for the output but rather the internally used CenteredDifference. The reference implementation doesn't allocate since here you need first-order derivatives and can directly implement those using FD but for higher orders, the C.D. structures would be required. But yes I agree this is slow. @ChrisRackauckas I wonder if there is any nicer way for computing : (CenteredDifference{N}(first_differential_order,approx_order,dx,nknots)*q) .* (CenteredDifference{N}(second_differential_order,approx_order,dx,nknots)*p)

Jul 06 '21 02:07 mjsheikh

Is there a way to provide CenteredDifference with pre-allocated arrays?

Jul 06 '21 08:07 bgroenks96

Yes it's just the mul! operation.

Jul 06 '21 10:07 ChrisRackauckas

Performance issues with nonlinear_diffusion!