ParallelStencil.jl icon indicating copy to clipboard operation
ParallelStencil.jl copied to clipboard

How to implement custom finite differencing operators

Open TakeTwiceDailey opened this issue 2 years ago • 8 comments

I am interested in using this package to multithread some hyperbolic PDEs starting from the example of the 1D acoustic wave equation. The first step is to figure out how to implement my own finite difference operators (I use summation by parts operators that are not diagonal for example)

The documentation states that "Custom macros to extend the finite differences submodules or for other stencil-based numerical methods can be readily plugged in"

It is not obvious to me how to do this. Say I have a non-diagonal but sparse matrix D that I want to apply to the current state vector, how can I write a macro to do this?

TakeTwiceDailey avatar May 09 '23 20:05 TakeTwiceDailey

@TakeTwiceDailey : thank you for your interest and sorry for the delay!

You need to express your computations with stencils, meaning you need to define computations on a cell-level. Have a look at the FiniteDifferences1D sub module: https://github.com/omlins/ParallelStencil.jl/blob/main/src/FiniteDifferences.jl#L1-L69

You can create macros like these or you can create your whole custom submodule, which you load instead or in addition to the FiniteDifferences submodule.

BTW: Please ask questions in the julia discourse; issues are for problems in the package or feature requests, see: https://github.com/omlins/ParallelStencil.jl#questions-comments-and-discussions

omlins avatar May 15 '23 18:05 omlins

@omlins : Thanks for the reply. I have joined the Julia Slack, so I will be sure to use it in the future.

I just don't understand how to do a cell-level program without being inefficient. The finite differencing operators I use only use points within the domain, they don't bleed into "ghost points" like it seems the wave miniapp does, so the macro needs to be aware of where the bounds are.

Take my first attempt as an example:

macro     d_x(A::Symbol)  
    esc(:( 
        if $ix == 1
            (-$A[$ix,$iy  ] + $A[$ix+1  ,$iy ]) 
        elseif $ix == size($A,1)
            ( $A[$ix,$iy  ] - $A[$ix-1  ,$iy ])
        else
            ($A[$ix+1,$iy  ] - $A[$ix-1  ,$iy ])/2 
        end
        )) 
end

This works, but I fear the constant index value checking will be inefficient. Is there some best practices with writing operators like this to work well with ParallelStencil.jl?

TakeTwiceDailey avatar May 15 '23 19:05 TakeTwiceDailey

@TakeTwiceDailey : You don't need to include the boundary handling in these macros. Just write them for the inner points. Then for the boundaries you can write separate kernels as you find in the following example:

https://github.com/omlins/ParallelStencil.jl/blob/main/miniapps/Stokes2D.jl#L115-L116

omlins avatar May 16 '23 18:05 omlins

I can't even figure out how to properly select the interior points.

Following this example:

import ParallelStencil.INDICES
const ix, iy = INDICES[1], INDICES[2]
ixi, iyi = :($ix+1), :($iy+1)
macro inn(A::Symbol)  esc(:( $A[$ixi ,$iyi ] )) end

I have absolutely no idea how or why ixi, iyi = :($ix+1), :($iy+1) manages to select A[2:end-1,2:end-1], as I would have expected A[2:end, 2:end]. Doing instead:

import ParallelStencil.INDICES
const ix, iy = INDICES[1], INDICES[2]
ixi, iyi = :($ix+2), :($iy+2)
macro inn(A::Symbol)  esc(:( $A[$ixi ,$iyi ] )) end

Acts as I expect, as it apparently selects A[3:end,3:end]. Ultimately I want to be able to select A[n:end-n,n:end-n] or any n (so I can write an arbitrarily accurate finite differencing stencil in the interior without selecting points outside the domain), but no amount of fiddling has gotten me close to this without writing explicit if statements which I would think I should avoid.

TakeTwiceDailey avatar May 16 '23 21:05 TakeTwiceDailey

In addition to defining the shift of the range start as you did, you also need to define the @within macro - also in analogy with the example: https://github.com/omlins/ParallelStencil.jl/blob/main/src/FiniteDifferences.jl#L61-L67

Or rather for 2D, it will be more similar to this: https://github.com/omlins/ParallelStencil.jl/blob/main/src/FiniteDifferences.jl#L176-L184

=> Replace -2 with -4 in your @within macro.

omlins avatar May 17 '23 17:05 omlins

Thanks for your help @omlins

Here is what I have settled on (for those who may want to do that same thing as me), please let me know if you think it can be done better somehow. I disobeyed your advice to code the boundary regions into separate kernels, as I really don't want to have to write separate equations of motion just for those regions.

T = Float64
const ix, iy = INDICES[1], INDICES[2]
const b = 7

ixi, iyi = :($ix+b), :($iy+b)

macro inn(A::Symbol)  esc(:( $A[$ixi ,$iyi ] )) end

const a1 = T(-1/60.)
const a2 = T(9/60.)
const a3 = T(-45/60.)
const a4 = T(45/60.)
const a5 = T(-9/60.)
const a6 = T(1/60.)

macro     D_x(A::Symbol,ql::Symbol,qr::Symbol)  
    esc(:(
    n=size($A,1);i=$ix;j=$iy;
    if i in 7:n-6
        (a1*$A[i-3,j]+a2*$A[i-2,j]+a3*$A[i-1,j]+a4*$A[i+1,j]+a5*$A[i+2,j]+a6*$A[i+3,j]) 
    elseif  i<b
         ($ql[i,1]*$A[ 1 ,j] + $ql[i,2]*$A[ 2 ,j] + $ql[i,3]*$A[ 3 ,j]
        + $ql[i,4]*$A[ 4 ,j] + $ql[i,5]*$A[ 5 ,j] + $ql[i,6]*$A[ 6 ,j]
        + $ql[i,7]*$A[ 7 ,j] + $ql[i,8]*$A[ 8 ,j] + $ql[i,9]*$A[ 9 ,j])
    else 
        k=i-(n-b+1);
         ($qr[k,1]*$A[n-8,j] + $qr[k,2]*$A[n-7,j] + $qr[k,3]*$A[n-6,j]
        + $qr[k,4]*$A[n-5,j] + $qr[k,5]*$A[n-4,j] + $qr[k,6]*$A[n-3,j]
        + $qr[k,7]*$A[n-2,j] + $qr[k,8]*$A[n-1,j] + $qr[k,9]*$A[ n ,j])
    end
    )) 
end

macro     d_y(A::Symbol)  
    esc(:(
        (  -0.5*$A[$ix,$iyi-1] + 0.5*$A[$ix,$iyi+1] )
        )) 
end

macro within(macroname::String, A::Symbol)
    if     macroname == "@all"    esc( :($ix<=size($A,1)     && $iy<=size($A,2)     ) )
    elseif macroname == "@inn"    esc( :($ix<=size($A,1)-2*b && $iy<=size($A,2)-2*b ) )
    elseif macroname == "@D_x"    esc( :($ix<=size($A,1)     && $iy<=size($A,2)     ) )
    elseif macroname == "@d_y"    esc( :($ix<=size($A,1)     && $iy<=size($A,2)-2*b ) )
    else error("unkown macroname: $macroname. If you want to add your own assignment macros, 
        overwrite the macro 'within(macroname::String, A::Symbol)'; 
        to still use the exising macro within as well call 
        ParallelStencil.FiniteDifferences{1|2|3}D.@within(macroname, A) at the end.")
    end
end

I was worried earlier in this thread that index value checking would be slow, but as I have come to find, this simply doesn't matter. This package just seems to be completely memory bandwidth limited (as discussed in the docs). I benchmarked both @D_x and @d_y here, which should be very different computationally, but they take the same amount of time to evaluate regardless of resolution.

At least on the GPU. If I change to the CPU, I get some weird behaviour where it seems the qr matrix gets randomly garbled and it gives incorrect results for the @D_x operator. The ql region is fine though. Does this have something to do with shared memory between threads?

TakeTwiceDailey avatar May 23 '23 14:05 TakeTwiceDailey

Just two share my two cents, I usually find that it's better (or in the worst case, equally performant) to use closures than macros inside @parallel kernels -and this results in better-looking code. You could instead do something like:


Base.@propagate_inbounds @inline function D_x_fun(A, ql, qr)  
    n=size(A,1)
    if i in 7:n-6
        (a1*A[i-3,j]+a2*A[i-2,j]+a3*A[i-1,j]+a4*A[i+1,j]+a5*A[i+2,j]+a6*A[i+3,j]) 
    elseif  i<b
         (ql[i,1]*A[ 1 ,j] + ql[i,2]*A[ 2 ,j] + ql[i,3]*A[ 3 ,j]
        + ql[i,4]*A[ 4 ,j] + ql[i,5]*A[ 5 ,j] + ql[i,6]*A[ 6 ,j]
        + ql[i,7]*A[ 7 ,j] + ql[i,8]*A[ 8 ,j] + ql[i,9]*A[ 9 ,j])
    else 
        k=i-(n-b+1);
         (qr[k,1]*A[n-8,j] + qr[k,2]*A[n-7,j] + qr[k,3]*A[n-6,j]
        + qr[k,4]*A[n-5,j] + qr[k,5]*A[n-4,j] + qr[k,6]*A[n-3,j]
        + qr[k,7]*A[n-2,j] + qr[k,8]*A[n-1,j] + qr[k,9]*A[ n ,j])
    end
end

@parallel_indices (i,j) function foo(A, ql, qr)
    D_x() = (i, j) ->  D_x_fun(A, ql, qr)  # define closure

    @inbounds D_x() # plus  whatever extra code here....
    return nothing
end

albert-de-montserrat avatar May 24 '23 05:05 albert-de-montserrat

macro within(macroname::String, A::Symbol) if macroname == "@all" esc( :($ix<=size($A,1) && $iy<=size($A,2) ) ) elseif macroname == "@inn" esc( :($ix<=size($A,1)-2b && $iy<=size($A,2)-2b ) ) elseif macroname == "@D_x" esc( :($ix<=size($A,1) && $iy<=size($A,2) ) ) elseif macroname == "@d_y" esc( :($ix<=size($A,1) && $iy<=size($A,2)-2*b ) )

@TakeTwiceDailey You only need to have in here the macros that are used at the left side of equal signs, which i think here would be just the first two.

Does this have something to do with shared memory between threads?

It should not. Does the problem occur also if you use only one thread?

omlins avatar May 31 '23 12:05 omlins